We just streamlined Gretel’s Python SDK

Discover the streamlined Gretel Python SDK. Start building with synthetic data in just 3 lines of code 🚀

Updated October 17, 2023

At Gretel, we strive to build tools that our users love. Our goal is to make Gretel the simplest and most enjoyable way for developers to generate high-quality synthetic data. This is why we are beyond excited to introduce the new high-level interface for our Python SDK, which we designed to be as simple, flexible, and intuitive as possible for our users.

With nearly 900k downloads, our Python SDK is one of the most popular ways to build with Gretel. With the new interface, our users will save time (and brain power!) by writing 3x less code than was previously required. Their workflows will also be dramatically simplified through more thoughtful logging, helper methods for assessing synthetic data quality, and a streamlined way to dynamically customize model configurations. And since the interface is built on top of the lower-level SDK, no changes to any existing code are necessary.

Sounds awesome, right? Let’s dive in!

A first look at our SDK’s new interface

The high-level interface is implemented in our SDK as the new Gretel object, which serves as a one-stop shop for interacting with Gretel’s APIs, models, and the associated artifacts.

With the new interface, training a state-of-the-art deep generative model from scratch only takes a few lines of code:

from gretel_client import Gretel

gretel = Gretel(api_key="prompt")
trained = gretel.submit_train("tabular-actgan", data_source="data.csv")

Behind the scenes, Gretel spins up the necessary compute resources, loads and configures the model, and trains (in this case) our tabular ACTGAN model on the data in the input CSV file.

Gretel’s Synthetic Data Quality Report is automatically generated and fetched for you to assess the quality of the model without leaving your notebook:

# view quality scores
print(trained.report)

# display full report in your notebook
trained.report.display_in_notebook()

On-demand synthetic data generation using any of your previously trained models is simple:

generated = gretel.submit_generate(trained.model_id, num_records=100)

Once again, Gretel will spin up the necessary compute resources to generate the data, so you don’t need to worry about managing cloud infrastructure or GPUs to run your model!

Upon completion of the generation job, your synthetic data will be automatically fetched and stored as a Pandas DataFrame, making it straightforward to integrate into your data pipeline:

print(generated.synthetic_data.head())
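
To show how the steps above fit together, here is a minimal sketch that wraps training and generation in one helper. It uses only the calls and attributes that appear in this post (submit_train, submit_generate, model_id, synthetic_data). The helper name train_and_generate and the FakeGretel class are hypothetical: the fake is an offline stand-in, not part of gretel_client, and it returns a plain list where the real interface returns a Pandas DataFrame.

```python
from dataclasses import dataclass
from typing import Any


def train_and_generate(client: Any, base_config: str, data_source: str, num_records: int) -> Any:
    """Train a model, then generate records from it, using the calls shown in this post."""
    trained = client.submit_train(base_config, data_source=data_source)
    generated = client.submit_generate(trained.model_id, num_records=num_records)
    return generated.synthetic_data


# --- Offline stand-ins (hypothetical, NOT part of gretel_client) ---
@dataclass
class _FakeTrained:
    model_id: str


@dataclass
class _FakeGenerated:
    synthetic_data: list


class FakeGretel:
    """Mimics the two method calls used above so the sketch runs without an API key."""

    def __init__(self) -> None:
        self.calls = []  # records each call so the flow can be inspected

    def submit_train(self, base_config, *, data_source):
        self.calls.append(("train", base_config, data_source))
        return _FakeTrained(model_id="fake-model-id")

    def submit_generate(self, model_id, *, num_records):
        self.calls.append(("generate", model_id, num_records))
        # The real interface returns a DataFrame; a list of dicts stands in here.
        return _FakeGenerated(synthetic_data=[{"row": i} for i in range(num_records)])


rows = train_and_generate(FakeGretel(), "tabular-actgan", "data.csv", num_records=3)
print(len(rows))
```

To run this against Gretel itself, you would pass the Gretel object created earlier in place of FakeGretel(); because the helper only relies on the method names shown in this post, nothing else should need to change, though that has not been verified here.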

Ready to build with Gretel? Start here 👇

The best way to get started with the Gretel SDK and its new high-level interface is to work through the Gretel SDK Blueprints, a series of Colab notebooks designed to provide a solid foundation for building your own projects with Gretel.

Gretel 101 Blueprint – Learn the basics of the Gretel SDK by training a deep generative model in the Gretel Cloud.
Gretel Advanced Tabular Blueprint – Use the Gretel SDK to easily customize model configurations and conditionally generate synthetic data.
Gretel Text Generation Blueprint – Leverage the Gretel SDK to fine-tune and prompt a multi-billion parameter language model.

If you are already building with Gretel, you can start using the new interface right away by updating to the most recent version of our Python SDK.
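
Before or after upgrading, it can be handy to confirm which SDK version your environment actually has. The sketch below uses only the Python standard library. It assumes the SDK is installed under the distribution name gretel-client (inferred from the gretel_client import used earlier, so adjust it if your setup differs), and the helper name installed_version is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "gretel-client" is an assumed distribution name; change it if yours differs.
version = installed_version("gretel-client")
if version is None:
    print("gretel-client is not installed in this environment")
else:
    print(f"gretel-client {version} is installed")
```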