Simplifying Our APIs
Five new features that will make synthesizing data easier for busy developers and data scientists.
As a developer or data scientist, you are busy, and we know it. At Gretel, we’re always looking for ways to make synthetic data simpler and more accessible for our users. We have heard your feedback and are excited to launch five new features that we hope will make your day a little easier. Cheers!
(1) Simplified authentication
A pain point for anyone using Jupyter notebooks is that each time the notebook is restarted, you are prompted to re-enter your Gretel API key. This requires you to jump over to the console, and then copy+paste the API key into the notebook cell.
Now, you can have Gretel cache your API key to the file system directly from the SDK. Additionally, you can use the “validate” option to test the key when the session is configured and raise an error if it is invalid.
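As a rough sketch of what this might look like in a notebook cell: the snippet below assumes the SDK exposes `configure_session` in the `gretel_client` package and that it accepts `api_key`, `cache`, and `validate` arguments with the values shown. Those names and values are assumptions to check against your installed version. `gretel_session_options` and `connect` are hypothetical helpers introduced only for this example.

```python
def gretel_session_options(validate: bool = True) -> dict:
    """Keyword arguments for the session call (hypothetical helper)."""
    return {
        "api_key": "prompt",   # assumed: ask for the key once if none is cached
        "cache": "yes",        # assumed: persist the key to the local file system
        "validate": validate,  # assumed: test the key and raise if it is invalid
    }


def connect() -> None:
    """Configure a Gretel session using the options above."""
    # Imported lazily so the sketch can be loaded without the SDK installed.
    from gretel_client import configure_session

    configure_session(**gretel_session_options())


print(gretel_session_options())
```

Calling `connect()` at the top of a notebook should then prompt for the key only when nothing has been cached yet.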
(2) Project creation
By default, the create_project() command creates a new project each time that it is run. So, if you are running your blueprint notebook where you create a project called ‘synthetic-data’, and run the notebook 5 times, you will have 5 new projects created in the console interface.
Use create_or_get_unique_project() to reuse a Gretel project if it already exists, or create one if it does not. Using the example above, if you run the notebook 5 times, you’ll have 5 synthetic models created inside a single project.
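The difference between the two behaviors can be illustrated with a small in-memory model. This is a toy stand-in written for this post's example, not the SDK's implementation: `ToyConsole` and `run_notebook` are hypothetical, and only the method names mirror the functions described above.

```python
import itertools


class ToyConsole:
    """In-memory stand-in for the Gretel console; not the real SDK."""

    def __init__(self):
        self.projects = []              # every project ever created
        self._ids = itertools.count(1)  # simple unique id generator

    def create_project(self, name):
        # Always makes a new project, even if the name is already in use.
        project = {"id": next(self._ids), "name": name, "models": []}
        self.projects.append(project)
        return project

    def create_or_get_unique_project(self, name):
        # Reuse the first project with this name; create it only if missing.
        for project in self.projects:
            if project["name"] == name:
                return project
        return self.create_project(name)


def run_notebook(get_project):
    """Simulate one notebook run: obtain a project, then add one model to it."""
    project = get_project("synthetic-data")
    project["models"].append(f"model-{len(project['models']) + 1}")


always_new = ToyConsole()
reused = ToyConsole()
for _ in range(5):
    run_notebook(always_new.create_project)
    run_notebook(reused.create_or_get_unique_project)

print(len(always_new.projects))            # 5 projects, one model each
print(len(reused.projects))                # 1 project
print(len(reused.projects[0]["models"]))   # 5 models in that project
```

In a real notebook, the equivalent step would presumably be a single call such as `create_or_get_unique_project(name="synthetic-data")`; the keyword name here is an assumption.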
(3) Model names
Gretel Synthetics' automatically generated model names are adorable (see `fluffy-fabulous-dog` or `enormous-handsome-hedgehog`), but sometimes you may want to name a model something more descriptive. You can now do that manually with the `model.name` attribute. Simply set the model name to anything you'd like, for example a name that records the settings used for your model.
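One possible naming convention is sketched below. The `descriptive_name` helper, the dataset name, and the settings are hypothetical, and `SimpleNamespace` stands in for a real model object so the example runs without the SDK; only the `model.name` assignment comes from the feature described above.

```python
from types import SimpleNamespace


def descriptive_name(dataset: str, **settings) -> str:
    """Build a readable name from a dataset label and settings (hypothetical helper)."""
    parts = [dataset]
    for key, value in sorted(settings.items()):  # sorted for a stable order
        parts.append(f"{key}-{value}")
    return "-".join(parts)


# Stand-in for a Gretel model object; only the `name` attribute matters here.
model = SimpleNamespace(name="fluffy-fabulous-dog")
model.name = descriptive_name("adult", epochs=50, batch=64)
print(model.name)  # adult-batch-64-epochs-50
```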
(4) Model configurations
Gretel’s model configurations are portable across the console, CLI, and SDKs and are based on YAML, a human-friendly data serialization format. This makes for easy editing, but loading a configuration into a Jupyter notebook can be cumbersome. Using read_model_config() makes this a one-liner, and it supports both default configurations and configurations loaded from a file.
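A minimal sketch of choosing between the two sources follows. It assumes `read_model_config` can be imported from `gretel_client.projects.models` and accepts either a template name or a file path as a single string; the import path and the `"synthetics/default"` template name are assumptions, and `pick_config` and `load_config` are hypothetical helpers.

```python
from pathlib import Path

# Assumed name of a default configuration template.
DEFAULT_TEMPLATE = "synthetics/default"


def pick_config(local_path=None) -> str:
    """Return the local file path if it exists, else the default template name."""
    if local_path and Path(local_path).is_file():
        return str(local_path)
    return DEFAULT_TEMPLATE


def load_config(local_path=None):
    """Load a model configuration from a file or from the default template."""
    # Imported lazily so the sketch can be loaded without the SDK installed.
    from gretel_client.projects.models import read_model_config

    return read_model_config(pick_config(local_path))


print(pick_config())                    # synthetics/default
print(pick_config("no/such/file.yml"))  # synthetics/default
```

Falling back to the default template when the file is missing is a choice made for this sketch; in a real project you may prefer to raise an error instead.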
(5) Parquet support
In addition to CSV, JSON, and JSON-L formats, Gretel’s SDK and CLI now support Apache Parquet as an input type. Parquet is a favorite data format for developers and data scientists that allows for very efficient compression and querying of large datasets. Just point your model.data_source to a local or remotely accessible Parquet file.
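As a small illustration, the sketch below points a model's `data_source` at a remote Parquet file after a quick extension check. The check, the `checked_source` helper, the extension list, and the example URL are all assumptions made for this example, not something the SDK is known to require; `SimpleNamespace` stands in for a real model object.

```python
from pathlib import PurePosixPath
from types import SimpleNamespace
from urllib.parse import urlparse

# Assumed extensions for the formats named above; the SDK may detect formats differently.
SUPPORTED_SUFFIXES = {".csv", ".json", ".jsonl", ".parquet"}


def checked_source(source: str) -> str:
    """Return `source` unchanged if its extension looks supported, else raise ValueError."""
    path = urlparse(source).path  # strips any query string from a URL
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported data source type: {suffix or 'none'}")
    return source


# Stand-in for a Gretel model object; only `data_source` matters here.
model = SimpleNamespace(data_source=None)
model.data_source = checked_source("https://example.com/data/transactions.parquet")
print(model.data_source)
```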
If you have other feedback on how we can make synthesizing data easier for you, drop us an email at firstname.lastname@example.org or visit our community Slack and start a conversation with our developers and data scientists. Thanks for using Gretel!