Simplifying Our APIs

Five new features that will make synthesizing data easier for busy developers and data scientists.
Copyright (c) 2022 Gretel.ai

We know that developers and data scientists are busy. At Gretel, we’re always looking for ways to make synthetic data simpler and more accessible for our users. We have heard your feedback and are excited to launch five new features that we hope will make your day a little easier. Cheers!

(1) Simplified authentication

A pain point for anyone using Jupyter notebooks is that each time the notebook is restarted, you are prompted to re-enter your Gretel API key. This requires you to jump over to the console, and then copy+paste the API key into the notebook cell. 

Now, you can have Gretel cache your API key to the file system directly from the SDK. Additionally, use the “validate” option to check the key right away and raise an error if it is invalid.

Old version:

from gretel_client import ClientConfig, configure_session
from getpass import getpass

configure_session(ClientConfig(api_key=getpass(prompt="Enter Gretel API key"), endpoint="https://api.gretel.cloud"))

New version:

from gretel_client import configure_session

configure_session(api_key="prompt", validate=True, cache="yes")
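
To make the caching behavior concrete, here is a minimal sketch of the general prompt-once-then-cache pattern. It is not Gretel's implementation: the helper `load_or_prompt_api_key`, the cache path you pass in, and the JSON layout are all hypothetical and chosen only for illustration.

```python
import json
import os
from getpass import getpass
from pathlib import Path
from typing import Callable


def load_or_prompt_api_key(
    cache_path: Path,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return a cached API key, prompting (and caching) only if none exists.

    Hypothetical helper for illustration; not part of gretel_client.
    """
    if cache_path.exists():
        # Cache hit: reuse the stored key without prompting.
        return json.loads(cache_path.read_text())["api_key"]

    # Cache miss: ask once, then persist the key for the next restart.
    api_key = prompt("Enter Gretel API key: ")
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps({"api_key": api_key}))
    os.chmod(cache_path, 0o600)  # restrict the file to the current user
    return api_key
```

With a helper like this, the first call prompts and writes the file, and every later call (including after a notebook restart) reads the key back without prompting.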

(2) Project creation

By default, create_project() creates a new project every time it runs. So if your blueprint notebook creates a project called ‘synthetic-data’ and you run it 5 times, you will end up with 5 separate projects in the console.

Use create_or_get_unique_project() to reuse a Gretel project if it already exists, or create one if it does not. Using the example above, if you run the notebook 5 times, you’ll have 5 synthetic models created inside a single project.

Old version:

from gretel_client.projects import create_project

project = create_project(display_name="synthetic-data")

New version:

from gretel_client.projects import create_or_get_unique_project

project = create_or_get_unique_project(display_name="synthetic-data")
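
The difference between the two calls is essentially a get-or-create lookup. The following sketch models that behavior with an in-memory store. `FakeProjectStore` and its methods are hypothetical stand-ins, not the Gretel SDK, and real projects live in Gretel Cloud instead of a local dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeProjectStore:
    """Hypothetical in-memory stand-in for a project backend."""

    projects: List[str] = field(default_factory=list)
    by_name: Dict[str, int] = field(default_factory=dict)

    def create(self, name: str) -> int:
        # Always creates a new project, even if the name is already in use.
        self.projects.append(name)
        return len(self.projects) - 1

    def create_or_get(self, name: str) -> int:
        # Reuses the existing project for this name, creating it only once.
        if name not in self.by_name:
            self.by_name[name] = self.create(name)
        return self.by_name[name]


# Simulate running the same notebook five times with each approach.
always_new = FakeProjectStore()
for _ in range(5):
    always_new.create("synthetic-data")

reused = FakeProjectStore()
for _ in range(5):
    reused.create_or_get("synthetic-data")

print(len(always_new.projects), len(reused.projects))  # prints: 5 1
```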

(3) Model names

Gretel Synthetics’ automatically generated model names are adorable (see `fluffy-fabulous-dog` or `enormous-handsome-hedgehog`), but sometimes you may want to name a model something more descriptive. You can now do that with the `model.name` attribute. Set it to anything you'd like, for example a name that reminds you of the settings used for the model.

model = project.create_model_obj(model_config=config)
model.name = "my awesome model"
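
One way to make a name memorable is to derive it from the settings themselves. The sketch below builds such a name. `describe_model` and the example settings (`epochs`, `learning_rate`) are illustrative assumptions, not part of the SDK.

```python
def describe_model(prefix: str, **settings: object) -> str:
    """Build a readable model name from keyword settings (hypothetical helper)."""
    # Sort keys so the same settings always produce the same name.
    parts = [f"{key}={settings[key]}" for key in sorted(settings)]
    return "-".join([prefix, *parts])


name = describe_model("synthetics", epochs=50, learning_rate=0.001)
print(name)  # synthetics-epochs=50-learning_rate=0.001
```

The resulting string can then be assigned to `model.name` like any other name.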

(4) Model configurations

Gretel’s model configurations are portable across the console, CLI, and SDKs and are based on YAML, a human-friendly markup language. This makes for easy editing, but it can be cumbersome to load into a Jupyter notebook. Using read_model_config() makes this a one-liner, and it supports both default configurations and configurations loaded from a file.

Old version:

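import yaml
from smart_open import open  # assumption: smart_open's open() reads URLs; the built-in open() cannot
with open("https://raw.githubusercontent.com/gretelai/gretel-blueprints/main/config_templates/gretel/synthetics/default.yml", "r") as stream:
    config = yaml.safe_load(stream)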

New version: 

from gretel_client.projects.models import read_model_config
config = read_model_config("synthetics/default")
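
Once loaded, the configuration can be treated as an ordinary nested Python dictionary, so individual settings can be changed before a model is created. The sketch below uses a hand-written dictionary as a stand-in for a loaded config. The nested layout (`models`, then `synthetics`, then `params`) and the `epochs` key are assumptions based on the public blueprint templates and may differ for your configuration. `with_param` is a hypothetical helper.

```python
import copy

# Hand-written stand-in for a loaded config; a real template has more keys.
config = {
    "schema_version": "1.0",
    "models": [{"synthetics": {"params": {"epochs": 100}}}],
}


def with_param(config: dict, name: str, value: object) -> dict:
    """Return a copy of config with one synthetics param overridden."""
    updated = copy.deepcopy(config)  # leave the original untouched
    updated["models"][0]["synthetics"]["params"][name] = value
    return updated


quick_config = with_param(config, "epochs", 50)
print(quick_config["models"][0]["synthetics"]["params"]["epochs"])  # 50
```

Copying before editing keeps the default configuration available for later runs in the same notebook.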

(5) Parquet support

In addition to CSV, JSON, and JSON-L formats, Gretel’s SDK and CLI now support Apache Parquet as an input type. Parquet is a favorite data format for developers and data scientists that allows for very efficient compression and querying of large datasets. Just point your model.data_source to a local or remotely accessible Parquet file.

New version:

model = project.create_model_obj(model_config=config)
model.data_source = 'training_data.parquet'
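
Before pointing model.data_source at a local file, it can be handy to confirm that the file really is Parquet. Unencrypted Parquet files start and end with the four-byte marker `PAR1`, which the sketch below checks using only the standard library. The helper name `looks_like_parquet` is hypothetical, and this is a quick sanity check, not full validation of the file contents.

```python
import os
from pathlib import Path

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path: str) -> bool:
    """Return True if the file starts and ends with the Parquet magic bytes."""
    file_path = Path(path)
    # The file must at least hold the leading and trailing 4-byte markers.
    if not file_path.is_file() or file_path.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with file_path.open("rb") as handle:
        head = handle.read(len(PARQUET_MAGIC))
        handle.seek(-len(PARQUET_MAGIC), os.SEEK_END)  # jump to the last 4 bytes
        tail = handle.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

A call such as `looks_like_parquet("training_data.parquet")` returns False for a CSV that was saved with the wrong extension, which is cheaper to discover locally than after starting a training job.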

If you have other feedback on how we can make synthesizing data easier for you, drop us an email at hi@gretel.ai or visit our community Slack and start a conversation with our developers and data scientists. Thanks for using Gretel!