Introduction
Welcome to Gretel! We're excited that you are interested in learning more about the benefits of synthetic data. In this learning pathway, we have gathered our best resources on fundamental concepts, which you can navigate at your own pace. By the end of this course, you will have learned how to use Gretel's Toolkit to generate synthetic data that's as good as, or even better than, your original data.
Getting Started
“Synthetic data is artificially annotated information that is generated by computer algorithms or simulations, commonly used as an alternative to real-world data.”
In this section, you will learn what synthetic data is, why it's necessary, and how you can use it to address a wide range of use cases.
This blog post is an overview of the entire Synthetic Data space. If you are new to Synthetic Data, this is a must-read.
This blog post discusses how Gretel ensures that your Synthetic Data maintains the same statistical properties as the original dataset. It details our Synthetic Data Quality Score and how you can use it.
One of the biggest bottlenecks to innovation that developers and data scientists face today is getting access to data, or creating the data needed to test an idea or build a new feature. This blog post discusses Gretel’s commitment to helping developers make their data safe to share, thereby enabling faster innovation.
Privacy engineering is the systematic application of engineering concepts to protect sensitive information. This blog post discusses the importance of privacy engineering and the value it can bring to your organization.
Originally titled “Why Nonprofits Should Care About Synthetic Data”, this blog post applies to more than nonprofit organizations. It discusses the benefits of Synthetic Data for data privacy and for making the most of limited datasets.
Setup
In this section you’ll create a Gretel account, walk through initial setup of your account, and learn how to use Gretel in your personal development environment.
An API Key is necessary to use Gretel’s products. Use this portion of the docs to learn how to generate your API Key.
Environment Setup
Gretel’s products can be used in many ways. From a no-code solution to running on-premises, the links below detail how to get started using Gretel in your preferred environment.
The Gretel Cloud Console offers a no-code way to use Gretel’s products.
Gretel has an open-source Python SDK available. For examples demonstrating the SDK, check out the docs/notebooks directory in Gretel Blueprints.
If the command line is your home, Gretel has a CLI tool that you can download and run. Each of the Gretel products has examples demonstrating how to use the CLI to create synthetic data, classify, transform, and more.
The default option for running Gretel workloads is to run them in Gretel’s cloud. If you would like to run the workload on your own hardware, follow the above instructions to set up Docker with a GPU to execute your workload.
The Gretel Toolkit
Gretel's Toolkit comprises three primary products: Gretel Synthetics, Gretel Classify, and Gretel Transform. In this section, you'll learn about Gretel's products, what each can do, and which product is best suited to solve the unique problems you encounter.
Don’t know which model to use? This decision tree will help you determine which product you need.
Learn how to create and modify a synthetic data model configuration before model training to support different data types and privacy protections.
Define a policy to discover and label sensitive data including personally identifiable information, credentials, and even custom regular expressions inside text, logs, and other structured data.
Learn how to define a policy to label and transform a dataset, with support for advanced options including custom regular expression search, date shifting, and fake entity replacements.
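To make the classify and transform ideas above more concrete, here is a minimal sketch in plain Python. It does not use Gretel's SDK or policy format; the POLICY dictionary, the function names, and the placeholder replacement value are all hypothetical and chosen only for illustration. It labels text with custom regular expressions, then replaces email addresses with a fake value and shifts dates by a fixed number of days.

```python
import re
from datetime import date, timedelta

# Hypothetical labeling policy (not Gretel's format): label name -> regex.
POLICY = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def label(text):
    """Return (label, matched_text) pairs for every policy match in text."""
    found = []
    for name, pattern in POLICY.items():
        for match in pattern.finditer(text):
            found.append((name, match.group()))
    return found


def shift_date(iso_date, days):
    """Shift an ISO-formatted date (YYYY-MM-DD) by a number of days."""
    return (date.fromisoformat(iso_date) + timedelta(days=days)).isoformat()


def transform(text, days=30):
    """Replace emails with a fake placeholder and shift every date."""
    text = POLICY["email"].sub("user@example.com", text)
    # Assumes every match is a valid calendar date; invalid ones would raise.
    text = POLICY["date"].sub(lambda m: shift_date(m.group(), days), text)
    return text


record = "alice@corp.io signed up on 2023-01-15"
print(label(record))
print(transform(record))
```

In a real policy, you would likely want a different fake value per entity and a consistent shift per record, so that relationships within the data are preserved. This sketch keeps both fixed for simplicity.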
Next Steps
Are you excited to use Gretel for your data engineering needs? We sure hope so. Now that you have completed this course, we recommend checking out these helpful links:
Now that you’ve explored Gretel, you can walk through our notebook to generate your first set of synthetic data. This notebook is a great starting point for those new to Synthetic Data.
Looking for sample code? Check out our Gretel Blueprints repository on GitHub, which contains examples demonstrating many of the common (and uncommon) use cases of Gretel products. This repository is well maintained and constantly updated, so be sure to Star it to get updates when we release new blueprints.
These resources merely scratch the surface of the capabilities of Gretel. Be sure to check out the Documentation and Blog for everything else.
Gretel Community
Interested in getting the latest news about Synthetic Data? Want to chat with others who are using Synthetic Data in their workloads?
For any other questions, comments, or concerns, please join our Community Discord and let us know.
Additional Resources
Here are some of our favorite resources regarding Synthetic Data: