Published on Oct. 15, 2025

Data Preparation & Privacy Toolkit Intern

NTT DATA

About the job

Make an impact with NTT DATA
Join a company that is pushing the boundaries of what is possible. We are renowned for our technical excellence and leading innovations, and for making a difference to our clients and society. Our workplace embraces diversity and inclusion – it’s a place where you can grow, belong and thrive.

Topic:

Design and implement a Data Preparation & Privacy Toolkit for use at the SPE boundary. The toolkit will:

  • Ingest and export FHIR resources using an open-source server (e.g., HAPI FHIR).
  • Apply de-identification and anonymization techniques on structured and unstructured fields (using tools such as Presidio and ARX).
  • Generate synthetic datasets for non-sensitive pipeline testing (using SDV).
  • Document repeatable, Infrastructure-as-Code style steps suitable for DS4L’s Product Phase.


The goal is to provide DS4L with a standardized, safe, and reproducible method to prepare health data before it enters any connector or dataspace workflow.

Your role will consist of:

  • Setting up and testing a local HAPI FHIR server with sample resources.
  • Implementing PII detection/redaction in free-text fields (Presidio).
  • Applying tabular anonymization and risk metrics (ARX).
  • Exploring synthetic data generation for testing and demo scenarios (SDV).
  • Creating a policy pack with anonymization presets and PII detection rules.
  • Writing a runbook and evidence report: how to install, operate, audit, and apply the toolkit; results of de-identification tests; recommended configurations.


During the internship, a senior engineer and the team leader will guide you as you define the approach and milestones. You will also be able to take part in all team meetings, which will immerse you in a professional environment and expose you to the day-to-day challenges of the work.


Prerequisites:

  • Proficiency in Python (pandas, scripting).
  • Basic Docker/Linux/Git skills.
  • Understanding of data handling and security best practices.
  • Ability to read and work with FHIR resources.
  • Strong interest in privacy engineering and reproducibility.


Good to have:

  • Knowledge of data anonymization frameworks (e.g., ARX).
  • Experience with synthetic data generation (SDV or similar).
  • Familiarity with Kubernetes or Infrastructure-as-Code approaches.
  • Basic statistics/ML to interpret utility vs. privacy trade-offs.
  • Awareness of OMOP CDM or health data standards beyond FHIR.


Soft skills:

  • Autonomy
  • Organization / Planning
  • Collaboration

Workplace type:

On-site Working

About NTT DATA
NTT DATA is a $30+ billion trusted global innovator of business and technology services. We serve 75% of the Fortune Global 100 and are committed to helping clients innovate, optimize and transform for long-term success. We invest over $3.6 billion each year in R&D to help organizations and society move confidently and sustainably into the digital future. As a Global Top Employer, we have diverse experts in more than 50 countries and a robust partner ecosystem of established and start-up companies. Our services include business and technology consulting, data and artificial intelligence, industry solutions, as well as the development, implementation and management of applications, infrastructure, and connectivity. We are also one of the leading providers of digital and AI infrastructure in the world. NTT DATA is part of NTT Group and headquartered in Tokyo.

Equal Opportunity Employer
NTT DATA is proud to be an Equal Opportunity Employer with a global culture that embraces diversity. We are committed to providing an environment free of unfair discrimination and harassment. We do not discriminate based on age, race, colour, gender, sexual orientation, religion, nationality, disability, pregnancy, marital status, veteran status, or any other protected category. Join our growing global team and accelerate your career with us. Apply today.

Third parties fraudulently posing as NTT DATA recruiters

NTT DATA recruiters will never ask job seekers or candidates for payment or banking information during the recruitment process, for any reason. Please remain vigilant of third parties who may attempt to impersonate NTT DATA recruiters—whether in writing or by phone—in order to deceptively obtain personal data or money from you. All email communications from an NTT DATA recruiter will come from an @nttdata.com email address. If you suspect any fraudulent activity, please contact us.

Required skills

Data Analysis, Linux, Python, Kubernetes, Docker, Git, Machine Learning, Statistics, Data Handling, Security Best Practices, FHIR, Data Anonymization, Synthetic Data Generation, Infrastructure-as-Code, HAPI FHIR, Presidio, ARX, SDV, Privacy Engineering, Health Data Standards, OMOP CDM