Protect health data and propel clinical research

Expertly blending risk assessment and synthetic data to meet privacy standards, enhance compliance, and fuel clinical innovation.

Synthetic data as a Privacy Enhancing Technology (PET)

Accelerate Innovation. Share data confidently and cost-effectively inside your organization or across jurisdictions.


Synthetic data helps overcome privacy hurdles while enabling innovation, which is why it is considered a privacy-enhancing technology (PET).


Our methodology produces synthetic data that are not considered personal information, making them suitable for secondary purposes such as sharing health data externally, analysis, software testing, and training. Our team of data scientists uses AI and machine learning to create synthetic datasets from real-world data (RWD), randomized controlled trials (RCT), and other health data sources. Comprehensive privacy and utility metrics provide objective assessments of risk and data quality.

With the help of machine learning or deep learning models, synthetic data can be generated from real data so that it keeps the same patterns and statistical properties, yet it is not real patient data.
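To make the idea concrete, here is a minimal sketch. It is not the method behind Aetion Generate: it fits a simple two-variable Gaussian model to a toy dataset and samples new records from it. The column meanings (age and systolic blood pressure), the values, and every function name are invented for illustration; real generators use far richer models.

```python
import math
import random
import statistics

def fit_bivariate_gaussian(rows):
    """Estimate means, standard deviations, and correlation of two numeric columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "rho": cov / (sx * sy)}

def sample_synthetic(params, n, seed=0):
    """Draw n synthetic (x, y) records from the fitted model."""
    rng = random.Random(seed)
    rho = params["rho"]
    resid = math.sqrt(max(0.0, 1.0 - rho * rho))
    records = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = params["mx"] + params["sx"] * z1
        y = params["my"] + params["sy"] * (rho * z1 + resid * z2)
        records.append((x, y))
    return records

def make_toy_real_data(n=500, seed=1):
    """Stand-in for real data: invented (age, systolic BP) pairs, not patient records."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.gauss(60, 10)
        rows.append((age, 100 + 0.5 * age + rng.gauss(0, 8)))
    return rows

real = make_toy_real_data()
model = fit_bivariate_gaussian(real)
synthetic = sample_synthetic(model, n=5000)
refit = fit_bivariate_gaussian(synthetic)
print(f"correlation: real={model['rho']:.2f}, synthetic={refit['rho']:.2f}")
```

The synthetic records are drawn from the fitted model rather than copied from the input, so the means and the correlation carry over while no input row is reproduced.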

The generative AI technology previously known as Replica Synthesis is now Aetion® Generate.

Protect, share, reuse, amplify, and augment your sensitive data.

Advancing clinical research

Overcome low recruitment in clinical trials. Use bias correction or cohort amplification to supplement underrepresented data and produce evidence. Low patient recruitment (e.g., in rare diseases, pediatric populations, and underrepresented populations) leads to small or incomplete datasets, which can compromise the efficacy of clinical trials. Our methodologies and underlying technology enable different types of health data enhancement, such as amplification, augmentation, and simulation of virtual patients.


Synthesize clinical trial data to boost sample size and statistical power.
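As a rough illustration of why sample size matters for statistical power, the sketch below uses the standard normal approximation for a two-sided, two-arm comparison of means. The function name `approx_power` and the example effect size are assumptions made for this illustration. It shows only the textbook relation between per-arm sample size and power; how much power synthetic records contribute depends on the generation method and is not modeled here.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-arm z-test for a standardized effect size."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * (n_per_arm / 2) ** 0.5
    # Probability of rejecting in the direction of the true effect;
    # the opposite tail is negligible and ignored here.
    return z.cdf(noncentrality - z_crit)

# Power rises with the number of participants per arm (effect size 0.5 assumed).
for n in (20, 40, 64, 100):
    print(n, round(approx_power(0.5, n), 2))
```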


Augment clinical trial data with synthetic data to predict the treatment effect estimated in an incomplete trial.

Imbalance / bias correction

Boost the size of underrepresented populations and subgroups with synthetic data to correct for imbalance or biased data.
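A small sketch of the bookkeeping behind this kind of correction: given the subgroup label of each record, it counts how many synthetic records each subgroup would need to match the largest one. The function name and the subgroup labels are hypothetical, and equalizing every subgroup is only one possible target; the right target depends on the analysis.

```python
from collections import Counter

def records_needed_to_balance(group_labels):
    """Return, per subgroup, how many records must be added to match the largest subgroup."""
    counts = Counter(group_labels)
    target = max(counts.values())
    return {group: target - count for group, count in counts.items()}

# Invented cohort: subgroup A dominates, B and C are underrepresented.
labels = ["A"] * 80 + ["B"] * 15 + ["C"] * 5
print(records_needed_to_balance(labels))  # {'A': 0, 'B': 65, 'C': 75}
```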

Privacy and risk assessment

Reusing health data, whether structured or unstructured, from clinical trials or other sources, demands stringent re-identification risk management. Our approach? A powerful blend of the latest risk assessment strategies, spearheaded by our proprietary Cardinal Methodology and cutting-edge technologies. Generate empowers data providers to meet critical data privacy requirements.

Our data science experts conduct thorough risk assessments, offer robust mitigation strategies, and provide detailed documentation to back every decision. With us, your data’s integrity is uncompromised, ensuring you stay ahead in the landscape of data privacy.

Privacy and risk assessment process, resulting in the delivery of a de-identified dataset and privacy risk report.
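For readers new to re-identification risk, the sketch below shows one common textbook measure. It is not the Cardinal Methodology. Records that share the same quasi-identifier values form an equivalence class, and each record's risk is taken as one divided by the size of its class. The field names, the values, and the function name are invented for illustration.

```python
from collections import Counter

def reidentification_risk(records, quasi_identifiers):
    """Return (max_risk, average_risk) based on equivalence class sizes."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    # A record in a class of size k is assigned a risk of 1/k.
    per_record = [1 / class_sizes[k] for k in keys]
    return max(per_record), sum(per_record) / len(per_record)

# Invented toy records; not real patient data.
records = [
    {"age_band": "30-39", "sex": "F", "region": "North"},
    {"age_band": "30-39", "sex": "F", "region": "North"},
    {"age_band": "30-39", "sex": "F", "region": "North"},
    {"age_band": "40-49", "sex": "M", "region": "South"},
    {"age_band": "40-49", "sex": "M", "region": "South"},
    {"age_band": "70-79", "sex": "F", "region": "East"},
]
max_risk, avg_risk = reidentification_risk(records, ["age_band", "sex", "region"])
print(max_risk, round(avg_risk, 2))
```

In this toy data the single record in the 70-79 band is unique on its quasi-identifiers, so it drives the maximum risk; generalizing or suppressing such values is the kind of mitigation a risk assessment would weigh.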

Related Resources

Synthetic Data Generation for Implementing the HIPAA Expert Determination Method

In this blog post, we challenge the efficacy of synthetic data generation (SDG) as a tool to meet the requirements of HIPAA’s expert determination method.

Generating and Evaluating Synthetic Clinical Trial Data in a Pharmaceutical Company

We provide an overview of the generation of synthetic clinical trial data at Novartis, focusing on use cases, and the utility and privacy outcomes of evaluating these datasets.

A review of the new ISO Standard on Data De-identification: ISO/IEC 27559

Khaled El Emam’s webinar outlines the ISO standard, assesses its strengths and weaknesses, and suggests how to adapt and implement it.