Navigator Fine Tuning is Now Generally Available
Introduction
Gretel is thrilled to announce the General Availability (GA) of Navigator Fine Tuning. This latest addition to our suite of synthetic data solutions allows users to inject their business and domain-specific knowledge into Navigator, our privacy-preserving compound AI system, by training it on a relevant real-world dataset. One of the standout features of Navigator Fine Tuning is its support for multiple tabular data modalities within a single model, including numerical, categorical, free text, and sequential (time-series) data.
During our open preview period, we saw hundreds of models trained with thousands of minutes of API runtime from our developer community and enterprise customers. This showcases the clear interest in creating synthetic versions of complex enterprise datasets such as patient health records, sales data, and financial transaction histories. During this period, we incorporated these learnings to improve our model and ensure performant results across a variety of edge cases and industry domains, delivering a seamless, scalable experience post-GA. We feel so confident in these capabilities that Navigator Fine Tuning is now the default model offered when selecting the "start from scratch" blueprint on our console dashboard.
In the sections below we'll provide a short summary of Navigator Fine Tuning's capabilities, as well as a few examples of how simple it is to get started with different use cases.
Navigator Fine Tuning Overview
For those new to Navigator Fine Tuning, below are some of the key features that make it such a popular option for domain-specific synthetic data generation:
- Strong privacy guarantees: Navigator Fine Tuning performs well at protecting sensitive entities from your training dataset, even when evaluated against our new privacy metrics.
- Flexibility: With a single model, users can now tackle complex tabular datasets that have a range of field types, such as time-series, categorical, numerical, and free text.
- High quality: Under the hood, Navigator Fine Tuning leverages the power of a pretrained transformer-based model, providing best-in-class data fidelity, as measured by our Synthetic Data Quality Report.
- Simplicity: Configuring Navigator Fine Tuning is easy. The default configuration serves as an excellent starting point and generalizes to a wide range of datasets.
- Console and SDK: Navigator Fine Tuning is accessible directly from our Console via its Blueprint, or with just a few lines of code using our SDK.
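For reference, here is a sketch of what a Navigator Fine Tuning model configuration can look like in YAML. The key names shown are assumptions based on the default blueprint; consult the docs for the authoritative version:

```yaml
# Assumed shape of the navigator-ft default blueprint (verify against the docs).
schema_version: "1.0"
name: navigator-ft-example
models:
  - navigator_ft:
      data_source: __tmp__
      # Optional: keep related rows grouped and ordered, e.g. for time-series.
      group_training_examples_by: null
      order_training_examples_by: null
```

For most datasets the defaults work as-is; the grouping and ordering fields only matter for sequential data.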
🏎️ Take NavFT for a Test Drive
Below we step through four examples that showcase the flexibility and simplicity of applying Navigator Fine Tuning to a diverse range of tabular data modalities. Feel free to follow along and code with us: you can copy the code directly or use this Colab Notebook!
To get started, you will need a free Gretel API key. 🔑 Next, install the Gretel Python 🐍 client:
We start with some boilerplate code that initializes a Gretel session and defines the base path to our example data:
Example 1: Fine tune on numerical data
For our first example, we'll train Navigator Fine Tuning on a dataset for modeling wine quality based on physicochemical tests. 🍷 This is a purely numerical dataset, containing floating-point and integer values.
Sample data (url):
Here's how to train a model using the default configuration:
As the image below shows, Navigator Fine Tuning performed excellently on both our Synthetic Data Quality Report and privacy scores!
‼️ Side note: Our privacy score runs on all datasets, regardless of whether we expect particular fields to be private. It simulates attacks to see how vulnerable the data are, whether the dataset is about wine or health records. To learn more about our evaluation tools and safety measures, read the docs on our synthetic data quality scoring system.
Example 2: Fine tune on categorical data
Next, let's see how we can fine tune the model on a census income dataset, which primarily consists of categorical integers and strings.
Sample data (url):
We only need to change the path to the data source. We can use the same default configuration:
Our synthetic data quality and privacy metrics again highlight an excellent performance by Navigator Fine Tuning:
Example 3: Fine tune on free text data
As an LLM-based model, Navigator Fine Tuning also naturally supports free text data. This example uses a dataset for evaluating the performance of intent classification systems in the presence of "out-of-scope" queries.
Sample data (url)
Again, we can use the same default training configuration. We just need to change the path to the data source:
To compare the synthetic text with the real data, we can use Gretel's Synthetic Text Data Quality Report. The figure below shows that the structure of the synthetic and real text are nicely aligned.
Example 4: Fine tune on a mix of tabular modalities
Finally, using the same configuration as the above examples, Navigator Fine Tuning can seamlessly support tables that contain a mix of numerical, categorical, and free text data. In this example, we use a synthetic financial transaction dataset, which we generated from scratch with Gretel Navigator.
Sample data (url)
Conclusion
We're excited to add fine-tuning capabilities to our privacy-preserving compound AI system, Navigator, and expand ways for developers to safely design tailor-made data solutions for their AI projects. We hope these workflows give you a sense of what's possible, and spark ideas for how you can use Navigator Fine Tuning in your own AI development.
If you have any questions about Gretel or how to use Navigator Fine Tuning, join our Discord community for instant access to expert advice from the Gretel team! If you're interested in developing your synthesizing skills further, Gretel University is also now live: your ultimate hub for mastering synthetic data, where you'll find insightful videos and our curated list of expert resources.
Go forth and synthesize.