Data science
How we accidentally discovered personal data in a popular Kaggle dataset
Learn about new features in Gretel and how they enabled us to discover personally identifiable information (PII) in a popular Kaggle dataset.
Q&A Series: Solving Privacy Problems with Synthetic Data
Answers to some questions about synthetic data that audience members submitted during Gretel's talk at The Rise of Privacy Tech’s Data Privacy Week 2022 conference.
Exploring NLP Part 2: A New Way to Measure the Quality of Synthetic Text
By merging breakthrough research on text metrics with new types of embeddings, we produce a reliable metric that is highly correlated with human ratings.
How accurate is my synthetic data?
Gretel’s new synthetic report is here, featuring a high-level score and metrics to help you assess the quality of your synthetic data.
A guide to load (almost) anything into a DataFrame
Pandas provides many options for reading data into a DataFrame. Here's our short guide to the ones we found most useful.
Exploring NLP Part 1: Why Should a Privacy Engineering Company Care About NLP?
There is a lot of hype around NLP. In this post, we explore some of the criticisms and how you can use this technology responsibly.
How to safely work with another company's data
Data sharing is central to modern business but entails risks. Synthetic data can enable data sharing while reducing the risk of privacy-compromising linkage attacks.
Create Synthetic Time-series Data with DoppelGANger and PyTorch
Generate synthetic time series data with Gretel.ai’s open-source PyTorch implementation of DoppelGANger.
Using generative, differentially private models to build privacy-enhancing synthetic datasets from real data
We’re going to train and build our synthetic dataset from a real-time public feed of e-bike ride-share data published in GBFS (General Bikeshare Feed Specification) format.
Generate synthetic data in 3 lines of code
Learn the simplest way to generate synthetic data without setting up your own infrastructure or GPUs.