
Data science

Copyright © 2022 Gretel.ai

How to Generate Synthetic Data: Tools and Techniques to Create Interchangeable Datasets

Synthetic data is algorithmically generated data that mirrors the statistical properties of the dataset it’s based on. Learn how to make high-quality synthetic data.

Red Teaming Synthetic Data Models

How we implemented a practical attack on a synthetic data model to validate its ability to protect sensitive information under different parameter settings.

Machine Learning Accuracy Using Synthetic Data

Can synthetic data really be used in machine learning? We explore the utility of synthetic data created from popular datasets and tested on popular ML algorithms.

What Is Data Simulation?

Data simulation is the process of using large quantities of data to predict events and validate models. Get the full data simulation definition.

Optimize the Llama-2 Model with Gretel’s Text SQS

How Gretel's data quality analysis tools for evaluating generated text can help you optimize the performance of LLMs like the Llama-2 model.

How to Safely Query Enterprise Data with Langchain Agents + SQL + OpenAI + Gretel

How combining agent-based methods, LLMs, and synthetic data enables natural language queries for databases and data warehouses, sans SQL.
Copyright (©) 2023 Gretel

Gretel GPT Sentiment Swap

Let’s fine-tune and prompt a large language model to swap the sentiment of product reviews!

Comprehensive Data Cleaning for AI and ML

Learn to prepare tabular data for AI and ML with an end-to-end data cleaning workflow.

Community Insights: Overcoming Medical Class Imbalance with Synthetic Data

An interview with one of Gretel's users on why medical practitioners turn to synthetic data when overcoming challenges with clinical data.

Generate synthetic data in 3 lines of code

Learn the simplest way to generate synthetic data without setting up your own infrastructure and GPUs.

Using generative, differentially-private models to build privacy-enhancing, synthetic datasets from real data.

We’re going to train and build our synthetic dataset from a real-time public feed of e-bike ride-share data published in the GBFS (General Bikeshare Feed Specification) format.

Create Synthetic Time-series Data with DoppelGANger and PyTorch

Generate synthetic time series data with Gretel.ai’s open-source PyTorch implementation of DoppelGANger.

How to safely work with another company's data

Data sharing is central to modern business but entails risks. Synthetic data can enable data sharing while reducing the risk of privacy-compromising linkage attacks.
Copyright (c) 2021 Gretel

Exploring NLP Part 1: Why Should a Privacy Engineering Company Care About NLP?

There is a lot of hype around NLP. In this post, we explore some of the criticisms and how you can use this technology responsibly.

A guide to load (almost) anything into a DataFrame

Pandas provides many options for reading data into a DataFrame. Here's our short guide to the ones we found most useful.

How accurate is my synthetic data?

Gretel’s new synthetic report is here, featuring a high-level score and metrics to help you assess the quality of your synthetic data.

Exploring NLP Part 2: A New Way to Measure the Quality of Synthetic Text

By merging breakthrough research on text metrics with new types of embeddings, we produce a reliable metric that is highly correlated with human ratings.

Q&A Series: Solving Privacy Problems with Synthetic Data

Answers to some questions about synthetic data that audience members submitted during Gretel's talk at The Rise of Privacy Tech’s Data Privacy Week 2022 conference.
Gretel Workflow

How we accidentally discovered personal data in a popular Kaggle dataset

Learn about new features in Gretel, and how those features enabled us to discover personally identifiable information (PII) in a popular Kaggle dataset.

README.V2

We founded Gretel on our belief that data shouldn’t be scary.

Creating synthetic time series data

A step-by-step guide to creating high-quality synthetic time-series datasets with Python.
Credit: sylv1rob1 via ShutterStock

Create a Location Generator GAN

How to train a FastCUT GAN on public location data from a few cities to predict realistic e-bike locations across the world.

Create high quality synthetic data in your cloud with Gretel.ai and Python

Create differentially private, synthetic versions of datasets and meet compliance requirements to keep sensitive data within your approved environment.

Veterans Day Reflections: Open source software and evacuation operations, a remarkable combination.

Quickly and safely aggregate geolocation data for location density analysis using a hexagonal grid system.

Improving massively imbalanced datasets in machine learning with synthetic data

Use synthetic data to improve model accuracy for fraud, cyber security, or any classification task with an extremely limited minority class.

What is Model Soup?

A brief exploration of model soup, the new ensembling technique that averages the weights of multiple models to improve overall performance.
Source: Kubkoo, via iStockPhoto

Reducing AI bias with Synthetic data

Generate artificial records to balance biased datasets and improve overall model accuracy.

The Evolution of Gretel's Developer Stack for Synthetic Data

Some of our newest product and technology initiatives that will ensure the Gretel platform continues to grow and evolve with the needs of modern data consumers.

How To Create Differentially Private Synthetic Data

A practical guide to creating differentially private, synthetic data with Python and TensorFlow.
Source: enjoynz, via iStockPhoto

Innovating With FastText and Table Headers

See how FastText word embeddings can help you quickly understand new datasets and build more consistent labels for your own data.
Copyright © 2022 Gretel Labs. All rights reserved.

What We’re Reading: Trends & Takeaways from the NeurIPS 2021 Conference

The Gretel research team's favorite trends and takeaways from NeurIPS, the 35th Annual Conference on Neural Information Processing Systems.
(c) Gretel.ai

What's new in Beta2

Beta2 for Gretel.ai is all about delivering privacy engineering as a service through clean, simple APIs.

Evaluating Data Sampling Methods with a Synthetic Quality Score

An evaluation of the effect of sampling procedures on the quality of synthetic tabular data using Gretel.ai's Synthetic Quality Score (SQS).

Measure the Quality of any Synthetic Dataset with Gretel Evaluate

Assessing the efficacy and quality of synthetic data with the Gretel Evaluate API.

Deep dive on generating synthetic data for Healthcare

Take a deep dive into training Gretel’s open-source synthetic data library to generate electronic health records that protect individual privacy and personally identifiable information (PII).