Machine Learning

How to Generate Synthetic Data: Tools and Techniques to Create Interchangeable Datasets
Synthetic data is algorithmically generated data that mirrors the statistical properties of the dataset it’s based on. Learn how to make high-quality synthetic data.
Read more...
Compare Synthetic and Real Data on ML Models with the new Gretel Synthetic Data Utility Report
Use Gretel Evaluate classification and regression tasks to validate synthetic data utility.
Read more...
Conditional Text Generation by Fine-Tuning Gretel GPT
Augment machine learning datasets with synthetically generated text and labels using an open-source implementation of GPT-3.
Read more...
Synthetic Data and the Data-centric Machine Learning Life Cycle
Gretel's synthetic data platform overcomes challenges across the data-centric machine learning life cycle to enable AI and ML solutions.
Read more...
Red Teaming Synthetic Data Models
How we implemented a practical attack on a synthetic data model to validate its ability to protect sensitive information under different parameter settings.
Read more...
Machine Learning Accuracy Using Synthetic Data
Can synthetic data really be used in machine learning? We explore the utility of synthetic data created from popular datasets and tested on popular ML algorithms.
Read more...
Test Data Generation: Uses, Benefits, and Tips
Test data generation is the process of creating new data that replicates an original dataset. Here’s how developers and data engineers use it.
Read more...
Transforms and Synthetics on Relational Databases
A walkthrough of our new multi-table transform and multi-table synthetics notebooks, which can be used independently or simultaneously.
Read more...
What Is Data Simulation?
Data simulation is the process of using large quantities of data to predict events and validate models. Get the full data simulation definition.
Read more...
Teaching large language models to zip their lips with RLPF
Gretel introduces Reinforcement Learning from Privacy Feedback (RLPF), a novel approach to reduce the likelihood of a language model leaking private information.
Read more...
AWS + Gretel Synthetic Data Accelerator Program for Generative AI
How our new Synthetic Data Accelerator Program with AWS will help enterprises scale responsible AI systems fast.
Read more...
Introducing Gretel MLOps
Use Gretel's synthetic data platform to replace, augment, or balance training datasets within MLOps pipelines like Vertex AI, Azure ML, and Amazon SageMaker.
Read more...
Gretel announces partnership with Microsoft Azure and joins Microsoft for Startups Pegasus Program
Gretel’s privacy-first generative AI is now available to all Azure users as well as select enterprises through the Microsoft for Startups Pegasus Program.
Read more...
Optimize the Llama-2 Model with Gretel’s Text SQS
How Gretel's data quality analysis tools for evaluating generated text can help you optimize the performance of LLMs, like the Llama-2 model.
Read more...
Prompting Llama-2 at Scale with Gretel
Discover how to efficiently use Gretel's platform for prompting Llama-2 on large datasets, whether you're completing answers, generating synthetic text, or labeling.
Read more...
How to Safely Query Enterprise Data with LangChain Agents + SQL + OpenAI + Gretel
How combining agent-based methods, LLMs, and synthetic data enables natural language queries for databases and data warehouses, sans SQL.
Read more...
Gretel GPT Sentiment Swap
Let’s fine-tune and prompt a large language model to swap the sentiment of product reviews!
Read more...
Comprehensive Data Cleaning for AI and ML
Learn to prepare tabular data for AI and ML with an end-to-end data cleaning workflow.
Read more...
Predicting Patient Stay Durations in the ER with Safe Synthetic Data
Here's how a hospital uses Gretel to help forecast staffing and resource needs for their emergency care unit, and to identify emerging trends in outbreaks.
Read more...
Unlocking Adapted LLMs on Enterprise Data
Gretel GPT supports new, state-of-the-art LLMs and makes it easier for you to trust the privacy and accuracy of LLMs for enterprise use cases.
Read more...
Scale Synthetic Data to Millions of Rows with ACTGAN
Discover how Gretel ACTGAN can help businesses generate synthetic data at scale with improved accuracy, faster training, and reduced memory requirements.
Read more...
Augmenting ML Datasets with Gretel and Vertex AI
How to utilize Gretel to create high-quality synthetic tabular data that you can use as training data for a classification model in Vertex AI.
Read more...
Synthetic Image Models for Smart Agriculture
Learn how synthetic image models can address data drift and improve a computer vision model's accuracy in unexpected conditions.
Read more...
Downstream ML classification with Gretel ACTGAN and PyCaret
Learn about downstream machine learning tasks and synthetic data with Gretel’s new ACTGAN model and the PyCaret library.
Read more...
Generate synthetic Taylor Swift-like lyrics using Gretel GPT
Celebrate Taylor Swift's The Eras Tour with lyrics generated with Gretel GPT, our synthetic language model.
Read more...
Community Insights: Overcoming Medical Class Imbalance with Synthetic Data
An interview with one of Gretel's users on why medical practitioners turn to synthetic data when overcoming challenges with clinical data.
Read more...
Generate synthetic data in 3 lines of code
Learn the simplest way to generate synthetic data without setting up your own infrastructure and GPUs.
Read more...
Using generative, differentially-private models to build privacy-enhancing, synthetic datasets from real data.
We’re going to train a model and build our synthetic dataset from a real-time public feed of e-bike ride-share data called GBFS (General Bikeshare Feed Specification).
Read more...
ML Models: Understanding the Fundamentals
Machine learning models can be trained to recognize patterns in datasets. By utilizing algorithms, they can learn to make decisions based on these patterns.
Read more...
Common misconceptions about differential privacy
This article clarifies some common misconceptions about differential privacy and what it guarantees.
Read more...
Create Synthetic Time-series Data with DoppelGANger and PyTorch
Generate synthetic time series data with Gretel.ai’s open-source PyTorch implementation of DoppelGANger.
Read more...
Diffusion models for document synthesis
Explore state-of-the-art image synthetics for business documents using diffusion models.
Read more...
How to safely work with another company's data
Data sharing is central to modern business but entails risks. Synthetic data can enable data sharing while reducing the risk of privacy-compromising linkage attacks.
Read more...
Exploring NLP Part 1: Why Should a Privacy Engineering Company Care About NLP?
There is a lot of hype around NLP. In this post, we explore some of the criticisms and how you can use this technology responsibly.
Read more...
Exploring NLP Part 2: A New Way to Measure the Quality of Synthetic Text
By merging breakthrough research on text metrics with new types of embeddings, we produce a reliable metric that is highly correlated with human ratings.
Read more...
Create artificial data with Gretel Synthetics and Google Colaboratory
Use Gretel Synthetics and Colaboratory’s free GPUs to train a model to automatically generate fake, anonymized data with differential privacy guarantees.
Read more...
Q&A Series: Solving Privacy Problems with Synthetic Data
Answers to some questions about synthetic data that audience members submitted during Gretel's talk at The Rise of Privacy Tech’s Data Privacy Week 2022 conference.
Read more...
How we accidentally discovered personal data in a popular Kaggle dataset
Learn about new features in Gretel, and how those features enabled us to discover personally identifiable information (PII) in a popular Kaggle dataset.
Read more...
Create a Location Generator GAN
How to train a FastCUT GAN on public location data from a few cities to predict realistic e-bike locations across the world.
Read more...
Create high-quality synthetic data in your cloud with Gretel.ai and Python
Create differentially private, synthetic versions of datasets and meet compliance requirements to keep sensitive data within your approved environment.
Read more...
Install TensorFlow and PyTorch with CUDA, cuDNN, and GPU Support in 3 Easy Steps
Set up a cutting-edge environment for deep learning with TensorFlow 2.10, PyTorch, Docker, and GPU support.
Read more...
Improving massively imbalanced datasets in machine learning with synthetic data
Use synthetic data to improve model accuracy for fraud, cyber security, or any classification task with an extremely limited minority class.
Read more...
What is Model Soup?
A brief exploration of model soup, the new ensembling technique that takes the average weights of multiple models to improve overall performance.
Read more...
Optuna Your Model Hyperparameters
We explore the popular open-source package Optuna to demonstrate how you can optimize your model hyperparameters and build the best synthetic model possible.
Read more...
Reducing AI bias with synthetic data
Generate artificial records to balance biased datasets and improve overall model accuracy.
Read more...
Gretel Synthetics: Introducing v0.10.0
Explore how to create a batch interface with the latest version of Gretel Synthetics on Google Colaboratory.
Read more...
Innovating With FastText and Table Headers
Look at how FastText word embeddings can help to quickly understand new datasets, and build more consistent labels for your own data.
Read more...
What We’re Reading: Trends & Takeaways from the NeurIPS 2021 Conference
The Gretel research team's favorite trends and takeaways from NeurIPS 2021, the 35th Annual Conference on Neural Information Processing Systems.
Read more...
Evaluating Data Sampling Methods with a Synthetic Quality Score
An evaluation of the effect of sampling procedures on the quality of synthetic tabular data using Gretel.ai's Synthetic Quality Score (SQS).
Read more...
How to use Weights & Biases with Gretel.ai
How to use Weights & Biases’ ML hyperparameter sweeps tool to optimize the accuracy of your synthetic data.
Read more...
Advanced Data Privacy: Gretel Privacy Filters and ML Accuracy
A look at how using Gretel’s Privacy Filters to immunize synthetic datasets against adversarial attacks can impact machine learning accuracy.
Read more...
Synthetic Time Series Data Creation for Finance
How we generated high-quality synthetic time-series data for one of the largest financial institutions in the world.
Read more...