Alex Watson

How to Generate Synthetic Data: Tools and Techniques to Create Interchangeable Datasets
Synthetic data is algorithmically generated data that mirrors the statistical properties of the dataset it’s based on. Learn how to make high-quality synthetic data.
Read more...
2025: The Year Synthetic Data Goes Mainstream
How synthetic data is transforming enterprise AI in 2025 by addressing privacy, fine-tuning, and scaling challenges.
Read more...
Building a Robust RAG Evaluation Pipeline with Synthetic Data 🚀
Building an end-to-end evaluation pipeline for RAG systems using synthetic data generation.
Read more...
Gretel Unlocks PII Detection with Synthetic Financial Document Dataset
Gretel releases a new synthetic financial document dataset to empower AI developers in building customized and highly performant sensitive data detection systems.
Read more...
Synthesizing Private Patient Data with Gretel: A Step-by-Step Guide
Create privacy-safe synthetic patient data with Gretel, ensuring compliance, secure sharing, and actionable insights for AI and machine learning in healthcare.
Read more...
Addressing Concerns of Model Collapse from Synthetic Data in AI
How thoughtful, high-quality synthetic data generation, rather than 'indiscriminate' use, can prevent model collapse.
Read more...
Conditional Text Generation by Fine-Tuning Gretel GPT
Augment machine learning datasets with synthetically generated text and labels using an open-source implementation of GPT-3.
Read more...
Fine-tune an MPT-7B LLM with Gretel GPT
Learn how to fine-tune and prompt mpt-7b to generate responses matching popular Twitter personalities with Gretel GPT.
Read more...
How to Create Synthetic Data at High Quality for Fine-Tuning LLMs
Gretel Navigator’s synthetic data generation outperformed OpenAI's GPT-4 by 25.6%, surpassed Llama3-70b by 48.1%, and exceeded human expert-curated data by 73.6%.
Read more...
Teaching AI to Think: A New Approach with Synthetic Data and Reflection
Gretel's synthetic GSM8k dataset shows an 84% improvement on AI reasoning tasks vs. synthetic data generated without the Reflection technique.
Read more...
Privacy-preserving AI development with Azure & Gretel
Leveraging Gretel's privacy-preserving synthetic data generation platform to fine-tune Azure OpenAI Service models in the financial domain.
Read more...
Synthetic Data and the Data-centric Machine Learning Life Cycle
Gretel's synthetic data platform overcomes challenges across the data-centric machine learning life cycle to enable AI and ML solutions.
Read more...
GSM-Symbolic: Analyzing LLM Limitations in Mathematical Reasoning and Potential Solutions
What The Recent Paper on LLM Reasoning Got Right—And What It Missed.
Read more...
Generate textbook-quality synthetic data for training LLMs and SLMs
How to use Gretel Navigator for generating diverse, high-quality training data to create better language models.
Read more...
Differential Privacy and Synthetic Text Generation with Gretel: Making Data Available at Scale (Part 1)
How differential privacy can generate provably private synthetic text data for a variety of enterprise AI applications.
Read more...
Prompting Llama-2 at Scale with Gretel
Discover how to efficiently use Gretel's platform for prompting Llama-2 on large datasets, whether you're completing answers, generating synthetic text, or labeling.
Read more...
How to Safely Query Enterprise Data with Langchain Agents + SQL + OpenAI + Gretel
How combining agent-based methods, LLMs, and synthetic data enables natural language queries for databases and data warehouses, sans SQL.
Read more...
Predicting Patient Stay Durations in the ER with Safe Synthetic Data
Here's how a hospital uses Gretel to help forecast staffing and resource needs for their emergency care unit, and to identify emerging trends in outbreaks.
Read more...
Unlocking Adapted LLMs on Enterprise Data
Gretel GPT supports new, state-of-the-art LLMs, and makes it easier for you to trust the privacy and accuracy of LLMs for enterprise use-cases.
Read more...
Scale Synthetic Data to Millions of Rows with ACTGAN
Discover how Gretel ACTGAN can help businesses generate synthetic data at scale with improved accuracy, faster training, and reduced memory requirements.
Read more...
Anonymize tabular data to meet GDPR privacy requirements
Learn how to anonymize tabular data to meet GDPR standards using Gretel's synthetic data APIs.
Read more...
Conditional data generation in 4 lines of code
Augment or balance your ML datasets in minutes with state-of-the-art generative models.
Read more...
Generate synthetic data in 3 lines of code
Learn the simplest way to generate synthetic data without setting up your own infrastructure and GPUs.
Read more...
Workshop: Generating Synthetic Data for Healthcare & Life Sciences
How to enable faster access to data for medical research with statistically accurate, equitable and private synthetic datasets.
Read more...
Using generative, differentially-private models to build privacy-enhancing, synthetic datasets from real data.
We’re going to train and build our synthetic dataset from a real-time public feed of e-bike ride-share data called GBFS (General Bikeshare Feed Specification).
Read more...
Practical Privacy with Synthetic Data
Implementing a practical attack to measure unintended memorization in synthetic data models.
Read more...
Gretel.ai Raises $12 Million in Series A to Safely Share, Build with Data
We are pleased to share that Gretel raised $12M in Series A funding. We're picking up strong momentum in our mission to help developers create safe data.
Read more...
Create artificial data with Gretel Synthetics and Google Colaboratory
Use Gretel Synthetics and Colaboratory’s free GPUs to train a model to automatically generate fake, anonymized data with differential privacy guarantees.
Read more...
Automate Detecting Sensitive Personally Identifiable Information (PII)
Use Gretel.ai's APIs to continuously detect and protect sensitive data including credit cards, credentials, names, and addresses.
Read more...
Creating synthetic time series data
A step-by-step guide to creating high quality synthetic time-series datasets with Python.
Read more...
Create a Location Generator GAN
How to train a FastCUT GAN on public location data from a few cities to predict realistic e-bike locations across the world.
Read more...
Create high quality synthetic data in your cloud with Gretel.ai and Python
Create differentially private, synthetic versions of datasets and meet compliance requirements to keep sensitive data within your approved environment.
Read more...
Install TensorFlow and PyTorch with CUDA, cuDNN, and GPU Support in 3 Easy Steps
Set up a cutting-edge environment for deep learning with TensorFlow 2.10, PyTorch, Docker, and GPU support.
Read more...
Gretel.README
At Gretel, we realized that we can apply machine learning, synthetic data, and formal reasoning to offer provable privacy guarantees for data.
Read more...
Improving massively imbalanced datasets in machine learning with synthetic data
Use synthetic data to improve model accuracy for fraud, cyber security, or any classification task with an extremely limited minority class.
Read more...
Data Is More Valuable When It Can Be Shared
Today, we are thrilled to announce the general availability of Gretel's privacy engineering APIs and services.
Read more...
What is Privacy Engineering?
In this post, we will dive into what privacy engineering is, why it’s important, and some of the core use cases we are seeing that are enabled by privacy.
Read more...
Reducing AI bias with Synthetic data
Generate artificial records to balance biased datasets and improve overall model accuracy.
Read more...
How To Create Differentially Private Synthetic Data
A practical guide to creating differentially private, synthetic data with Python and TensorFlow.
Read more...
Gretel.ai + Illumina - Using AI to create safe, synthetic datasets for genomics
Promising evidence that state-of-the-art synthetic data models can produce artificial versions of even highly dimensional and complex genomic and phenotypic data.
Read more...
What's new in Beta2
Beta2 for Gretel.ai is all about delivering privacy engineering as a service through clean, simple APIs.
Read more...
Simplifying Our APIs
Five new features that will make synthesizing data easier for busy developers and data scientists.
Read more...
Deep dive on generating synthetic data for Healthcare
Take a deep dive on training Gretel’s open-source, synthetic data library to generate electronic health records that protect individual privacy (PII).
Read more...
How to use Weights & Biases with Gretel.ai
How to use Weights & Biases’ ML hyperparameter sweeps tool to optimize the accuracy of your synthetic data.
Read more...
Walkthrough: Create Synthetic Data from any DataFrame or CSV
Train an AI model to create an anonymized version of your dataset using Python, Pandas, and gretel-synthetics.
Read more...
Synthetic Time Series Data Creation for Finance
How we generated high-quality synthetic time-series data for one of the largest financial institutions in the world.
Read more...