CHANGELOG: Beta2

Here's what we learned about privacy engineering from 50+ companies and hundreds of developers.

It's needed. Everywhere.

Gretel's mission is to enable developers, researchers, and scientists to quickly create safe versions of data that can be used in pre-production environments and machine learning workloads, and shared across teams and organizations.

We categorize this work as privacy engineering. Current solutions have high barriers to entry, offer limited automation, or hide behind "contact us" messaging. We can do better, and we have to do better. Gretel was founded by engineers and will always serve engineers, so we set out to create products that users can get started with in minutes and scale massively.

To validate this problem, we launched a free public beta almost six months ago. What we learned and validated was amazing. What we figured out we need to do next was even better.

In the coming months, the Gretel team will be heads-down, integrating your feedback, reducing friction, and increasing simplicity.

Let's dive in.

What we learned

Since launching the beta, more than 500 users have signed up. These users have run hundreds of synthetic data and transform workloads using our tutorials and blueprints for use cases like anonymizing and synthesizing data. After aggregating telemetry and user feedback, we've learned quite a bit about our users, the industry, and core use cases:

Our Users

  • 50% identify as Software and Data engineers
  • 40% identify as Data Scientists and Machine Learning Researchers
  • 10% identify as Product, Marketing, and Executives

Industries

  • 60% of users work in technology-focused industries
  • 20% in healthcare and finance
  • 20% in education, retail, and government

Use Cases

  • Generate synthetic data for ML workloads and data sharing. This includes synthesizing datasets that maintain their correlations / distributions, highly dimensional data, and time-series data.
  • Quickly anonymize and synthesize production data for pre-production purposes. Take sensitive production data and quickly create a version that can be used in development and testing environments.
  • Fix data imbalances. Utilize synthetics to boost samples of underrepresented classes in a dataset. Help make ML and AI workloads more fair and balanced.
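As a rough illustration of the data imbalance use case, the sketch below counts how many additional records each class would need to match the largest class. The function name `synthetic_records_needed` and the sample labels are hypothetical and are not part of any Gretel package; generating the synthetic records themselves is out of scope here.

```python
from collections import Counter


def synthetic_records_needed(labels):
    """Return, per class, how many extra records would match the largest class.

    Hypothetical helper for illustration only; it sizes the gap and does not
    generate any synthetic data.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


# Made-up example: a heavily imbalanced binary label column.
labels = ["ok"] * 950 + ["fraud"] * 50
needed = synthetic_records_needed(labels)
print(needed)
```

Balancing every class up to the majority count is only one possible target; a smaller boost may be enough depending on the model and the dataset.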

90% of our users write code, which is exactly who we want to build for. Our current beta is oriented around using a set of Gretel SDKs and Python notebooks to execute workloads. This gave us the flexibility to quickly build blueprints that solve a wide array of problems, but it also comes with challenges, which we discovered when diving in with our users:

  • With 90% of our users being builders and coders, requiring Python can easily exclude engineers that build with other languages.
  • Configuring systems to leverage GPU acceleration with Python can be tedious. Gretel should commit to helping you do this or make it automatic.
  • Larger, more complex workloads require infrastructure to scale. SDKs can't do this for you. We want to provide both the functionality and scale to our users.

Where we are going

With a ton of learning and data in our pocket, we have decided to take this feedback and release an updated version of the Gretel Beta, what we call Beta2, with these goals:

  • The Gretel Console will provide end-to-end functionality for privacy engineering as a service.  This will let anyone be a privacy engineer, not just developers. Automation and scale can be achieved by developing with our REST APIs and client libraries.
  • Focus on being purely API driven vs SDK driven.  The core workloads in our blueprints will be available to run via REST APIs hosted in Gretel Cloud.  Working with Gretel should be agnostic of language and framework. If you can make HTTP calls, you can do privacy engineering.
  • Gretel Cloud will be the control plane for running privacy engineering workloads. Users will have the flexibility to let Gretel autoscale CPU and GPU resources for them or link with Gretel Cloud to run workloads in their own environment.
  • Double down on open source.  Currently we provide open sourced features through our gretel-synthetics and gretel-client packages. As we build, we recognize more functionality we would like to make available.  To support this, we will be orienting around a single Open Source Python SDK that bundles all our open source functionality together.  This SDK will always be central to the services we offer. It will also always be free and open source. We call this Gretel Core and will provide more details as we get ready to transition our open source footprint.
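To make the "if you can make HTTP calls" goal above concrete, here is a minimal sketch of what submitting a workload over REST could look like. Everything specific in it is an assumption made for illustration: the host `api.example.invalid`, the `/workloads` path, the payload fields, and the `Authorization` header are placeholders, not Gretel's actual API. The request is only constructed, never sent.

```python
import json
import urllib.request

# Hypothetical placeholder endpoint; NOT Gretel's real API.
BASE_URL = "https://api.example.invalid/v1"


def build_workload_request(api_key, workload, dataset_uri):
    """Build (but do not send) a POST request describing a workload.

    The path, payload fields, and auth header are assumptions for this sketch.
    """
    payload = json.dumps({"workload": workload, "dataset": dataset_uri})
    return urllib.request.Request(
        url=f"{BASE_URL}/workloads",
        data=payload.encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_workload_request(
    "my-api-key", "synthetics", "s3://my-bucket/data.csv"
)
print(request.get_method(), request.full_url)
```

The same request shape could be produced from any language with an HTTP client, which is the point of moving from an SDK-driven to an API-driven design.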

As the components and functionality roll out, we'll be communicating more directly about the specific feature and packaging changes that users can expect. We anticipate having Beta2 available in the second half of this year. Like our current Beta, it will be free for all users, and your feedback is vital to building the best product we can.