Data science

Gretel Workflow

How we accidentally discovered personal data in a popular Kaggle dataset

Learn about new features in Gretel, and how those features enabled us to discover personally identifiable information (PII) in a popular Kaggle dataset.
Read more...

Q&A Series: Solving Privacy Problems with Synthetic Data

Answers to some questions about synthetic data that audience members submitted during Gretel's talk at The Rise of Privacy Tech’s Data Privacy Week 2022 conference.
Read more...
Copyright (c) 2021 Gretel

Exploring NLP Part 2: A New Way to Measure the Quality of Synthetic Text

By merging breakthrough research on text metrics with new types of embeddings, we produce a reliable metric that is highly correlated with human ratings.
Read more...

How accurate is my synthetic data?

Gretel’s new synthetic report is here, featuring a high-level score and metrics to help you assess the quality of your synthetic data.
Read more...

A guide to load (almost) anything into a DataFrame

Pandas provides many options for reading data into a DataFrame; here's our short guide to the ones we found most useful.
Read more...

Exploring NLP Part 1: Why Should a Privacy Engineering Company Care About NLP?

There is a lot of hype around NLP. In this post, we explore some of the criticisms and how you can use this technology responsibly.
Read more...

How to safely work with another company's data

Data sharing is central to modern business but entails risks. Synthetic data can enable data sharing while reducing the risk of privacy-compromising linkage attacks.
Read more...

Create Synthetic Time-series Data with DoppelGANger and PyTorch

Generate synthetic time series data with Gretel.ai’s open-source PyTorch implementation of DoppelGANger.
Read more...

Using generative, differentially-private models to build privacy-enhancing, synthetic datasets from real data.

We’re going to train a model and build our synthetic dataset from a real-time public feed of e-bike ride-share data called the GBFS (General Bikeshare Feed Specification).
Read more...

Generate synthetic data in 3 lines of code

Learn the simplest way to generate synthetic data without setting up your own infrastructure and GPUs.
Read more...