Create artificial data with Gretel Synthetics and Google Colaboratory
In this post, we’ll use Gretel Synthetics and Google Colaboratory’s free GPUs to train a machine learning model that automatically generates fake, anonymized data with differential privacy guarantees.
Today we’ll walk through some of the new features in version 0.6.0 of gretel_synthetics, Gretel’s open-source synthetic data library, including:
- Google SentencePiece support for unsupervised tokenization, with configurable vocabulary size and character coverage.
- smart_open support for loading datasets from AWS, GCP, and Azure.
- The ability to launch directly into Colaboratory.
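SentencePiece’s character coverage setting controls roughly what share of the training text’s character occurrences the tokenizer must represent; the rarest characters beyond that share are treated as unknown. The sketch below illustrates that idea using only the Python standard library. It is our own simplified approximation, not SentencePiece’s or gretel_synthetics’ implementation, and the function names `covered_characters` and `map_rare_to_unk` are hypothetical.

```python
from collections import Counter


def covered_characters(text: str, coverage: float = 0.9995) -> set[str]:
    """Return the most frequent characters that together account for at
    least `coverage` of all character occurrences in `text`.

    Hypothetical helper: a simplified illustration of the idea behind
    SentencePiece's character coverage, not its actual code.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    counts = Counter(text)
    total = sum(counts.values())
    kept: set[str] = set()
    accumulated = 0
    # Walk characters from most to least frequent until the target share is reached.
    for char, n in counts.most_common():
        if accumulated / total >= coverage:
            break
        kept.add(char)
        accumulated += n
    return kept


def map_rare_to_unk(text: str, kept: set[str], unk: str = "<unk>") -> str:
    """Replace every character outside the covered set with an unknown token
    (hypothetical helper)."""
    return "".join(ch if ch in kept else unk for ch in text)


if __name__ == "__main__":
    sample = "aaaaaaaaab"  # 9 occurrences of "a", 1 of "b"
    kept = covered_characters(sample, coverage=0.9)
    print(sorted(kept))
    print(map_rare_to_unk(sample, kept))
```

With a coverage of 0.9, the single "b" falls outside the covered set and is mapped to the unknown token; with a coverage of 1.0, every character in the sample is kept. Lower coverage values keep the vocabulary compact for datasets with large or noisy character sets, at the cost of losing rare characters in the generated output.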
Check out the walk-through screencast below, or click the Colab link to get started creating your own synthetic dataset!
For a closer look at anonymizing precise location data, see our previous deep dive on anonymizing scooter ride-share data, which describes how we discovered privacy concerns in public ride-share feeds and partnered with Uber to fix them.