The Gretel Blog
Learn more about Privacy Engineering from Gretel experts – engineers, data scientists and our AI research team.
Search results for "privacy"
Test Data Generation: Uses, Benefits, and Tips
Test data generation is the process of creating new data that replicates an original dataset. Here’s how developers and data engineers use it.
Create Synthetic Time-series Data with DoppelGANger and PyTorch
Generate synthetic time series data with Gretel.ai’s open-source PyTorch implementation of DoppelGANger.
Red Teaming Synthetic Data Models
How we implemented a practical attack on a synthetic data model to validate its ability to protect sensitive information under different parameter settings.
Conditional Text Generation by Fine Tuning Gretel GPT
Augment machine learning datasets with synthetically generated text and labels using an open-source implementation of GPT-3.
Diffusion models for document synthesis
Explore state-of-the-art image synthetics for business documents using diffusion models.
What is Model Soup?
A brief exploration of model soup, the new ensembling technique that takes the average weights of multiple models to improve overall performance.
Transforms and Synthetics on Relational Databases
A walkthrough of our new multi-table transform and multi-table synthetics notebooks, which can be used independently or simultaneously.
ML Models: Understanding the Fundamentals
Machine learning models can be trained to recognize patterns in datasets. By utilizing algorithms, they can learn to make decisions based on these patterns.
Transforms and Multi-Table Relational Databases
How to de-identify a relational database for demo or pre-production testing environments while keeping the referential integrity of primary and foreign keys intact.
What is Data Anonymization?
Everything you need to know about anonymizing data and the techniques for mitigating privacy risks.
Simplifying Our APIs
Five new features that will make synthesizing data easier for busy developers and data scientists.
How to Generate Synthetic Data: Tools and Techniques to Create Interchangeable Datasets
Synthetic data is algorithmically generated data that mirrors the statistical properties of the dataset it’s based on. Learn how to make high-quality synthetic data.
What is Synthetic Data?
Synthetic data is artificially annotated information that is generated by computer algorithms or simulations, commonly used as an alternative to real-world data.
Q&A Series: Solving Privacy Problems with Synthetic Data
Answers to some questions about synthetic data that audience members submitted during Gretel's talk at The Rise of Privacy Tech’s Data Privacy Week 2022 conference.
Create a Location Generator GAN
How to train a FastCUT GAN on public location data from a few cities to predict realistic e-bike locations across the world.
How to use Weights & Biases with Gretel.ai
How to use Weights & Biases’ ML hyperparameter sweeps tool to optimize the accuracy of your synthetic data.
Data Is More Valuable When It Can Be Shared
Today, we are thrilled to announce the general availability of Gretel's privacy engineering APIs and services.
What We’re Reading: Trends & Takeaways from the NeurIPS 2021 Conference
The Gretel research team's favorite trends and takeaways from NeurIPS 2021, the 35th Annual Conference on Neural Information Processing Systems.
Creating Synthetic Time Series Data for Global Financial Institutions – a POC Deep Dive
How we generated high-quality synthetic time-series data for one of the largest financial institutions in the world.
Advanced Data Privacy: Gretel Privacy Filters and ML Accuracy
A look at how using Gretel’s Privacy Filters to immunize synthetic datasets against adversarial attacks can impact machine learning accuracy.
Why Nonprofits Should Care About Synthetic Data
How synthetic data can help nonprofits improve their business operations and their impact on the people they serve.
Gretel.ai + Illumina - Using AI to create safe, synthetic datasets for genomics
Promising evidence that state-of-the-art synthetic data models can produce artificial versions of even high-dimensional and complex genomic and phenotypic data.
Optuna Your Model Hyperparameters
We explore the popular open-source package Optuna to demonstrate how you can optimize your model hyperparameters and build the best synthetic model possible.
Common misconceptions about differential privacy
This article clarifies some common misconceptions about differential privacy and what it guarantees.
Veterans Day Reflections: Open source software and evacuation operations, a remarkable combination.
Quickly and safely aggregate geolocation data for location density analysis using a hexagonal grid system.
Got text? Use Named Entity Recognition (NER) to label PII in your data
Use Gretel’s NLP setting to label PII, including people’s names and geographic locations, in free text.
Workshop: Generating Synthetic Data for Healthcare & Life Sciences
How to enable faster access to data for medical research with statistically accurate, equitable and private synthetic datasets.
Why privacy by design matters more than ever
Today we announced that Gretel raised $50 million in funding to help us advance our mission to bring “privacy by design” to all developers.
Exploring NLP Part 2: A New Way to Measure the Quality of Synthetic Text
By merging breakthrough research on text metrics with new types of embeddings, we produce a reliable metric that is highly correlated with human ratings.
Exploring NLP Part 1: Why Should a Privacy Engineering Company Care About NLP?
There is a lot of hype around NLP. In this post, we explore some of the criticisms and how you can use this technology responsibly.
Introducing Gretel's Privacy Filters
Create synthetic data that’s safer than ever. Our simple configuration file settings enable you to secure both your data and model from adversarial attacks.
Instrumenting Kubernetes in AWS with Terraform and FluentBit
In this blog, we will use Fluent Bit to collect logs from AWS EKS cluster applications.
Build a synthetic data pipeline using Gretel and Apache Airflow
In this blog post, we build an ETL pipeline that generates synthetic data from a PostgreSQL database using Gretel’s Synthetic Data APIs and Apache Airflow.
What's new in Beta2
Beta2 for Gretel.ai is all about delivering privacy engineering as a service through clean, simple APIs.
What is Privacy Engineering?
In this post, we will dive into what privacy engineering is, why it’s important, and some of the core use cases we are seeing that are enabled by privacy.
A guide to load (almost) anything into a DataFrame
Pandas provides many options for reading data into a DataFrame. Here's our short guide to the ones we found most useful.
Synthetic Data Configuration Templates
Our new configuration templates will help you pick some of the right parameters needed to train your synthetic data models.
Practical Privacy with Synthetic Data
Implementing a practical attack to measure unintended memorization in synthetic data models.
Introducing the Gretel Bartender
A game-changing AI that will disrupt the cocktail industry and spin the world on its head.
Anonymize Data with S3 Object Lambda
Anonymize data at access time with Gretel and Amazon S3 Object Lambda.
How accurate is my synthetic data?
Gretel’s new synthetic report is here, featuring a high-level score and metrics to help you assess the quality of your synthetic data.
Gretel Smart-Seeding is auto-complete for your data
Smart-seeding lets you train a synthetic data model to auto-complete partial records and text.
Machine Learning Accuracy Using Synthetic Data
Can synthetic data really be used in machine learning? We explore the utility of synthetic data created from popular datasets and tested on popular ML algorithms.
Here's what we learned about privacy engineering from 50+ companies and hundreds of developers.
Creating synthetic time series data
A step-by-step guide to creating high quality synthetic time-series datasets with Python.
Walkthrough: Create Synthetic Data from any DataFrame or CSV
Train an AI model to create an anonymized version of your dataset using Python, Pandas, and gretel-synthetics.
Recognizing Data Privacy Day by Protecting Your Privacy
What if we could ensure that personal data was protected, benefiting not just the individual but also giving developers faster, worry-free access to data?
Install TensorFlow with CUDA, cuDNN, and GPU Support in 4 Easy Steps
Set up a cutting-edge environment for deep learning with TensorFlow 2.4 and GPU support.
Automate Detecting Sensitive Personally Identifiable Information (PII)
Use Gretel.ai's APIs to continuously detect and protect sensitive data including credit cards, credentials, names, and addresses.
Automatically Reducing AI Bias With Synthetic Data
Create a fair, balanced, privacy-preserving version of the 1994 US Census dataset using gretel-synthetics.
How To Create Differentially Private Synthetic Data
A practical guide to creating differentially private, synthetic data with Python and TensorFlow.
Gretel.ai Raises $12 Million in Series A to Safely Share, Build with Data
We are pleased to share that Gretel raised $12M in Series A funding. We're picking up strong momentum in our mission to help developers create safe data.
Load NER data into Elasticsearch
Create a simple workflow to perform Named Entity Recognition (NER) on sample data using Gretel and load the records into Elasticsearch.
November 2020 - What’s new in Gretel
We are releasing new features that make working with data easier by helping you deep dive into records, use blueprints to auto-anonymize data, and more.
Auto-anonymize production datasets for development
In this post, we walk through building a data pipeline that will automatically transform datasets so they can be safely used in development environments.
Introducing Gretel Blueprints
We are launching Gretel Blueprints, making it easy to anonymize and balance datasets with just a few clicks.
Gretel's New Synthetic Performance Report
Gretel's Premium SDK now includes detailed reporting that shows you how accurate your synthetic data's statistical distributions and correlations are.
Create high quality synthetic data in your cloud with Gretel.ai and Python
Create differentially private, synthetic versions of datasets and meet compliance requirements to keep sensitive data within your approved environment.
How to use Gretel’s new entity stream
We recently launched our new entity stream view in Gretel Cloud. See how you can view record streams from tagged entities in your data projects.
Gretel Synthetics Frequently Asked Questions (FAQs)
Build differentially private synthetic datasets in Python.
NEW: Integrating with Gretel SDKs just got easier!
Learn how we are improving our product by adding new features that make connecting to Gretel easier, faster and more streamlined.
Improving massively imbalanced datasets in machine learning with synthetic data
Use synthetic data to improve model accuracy for fraud, cyber security, or any classification task with an extremely limited minority class.
Reducing AI bias with Synthetic data
Generate artificial records to balance biased datasets and improve overall model accuracy.
Gretel Synthetics: Introducing v0.10.0
Explore how to create a batch interface with the latest version of Gretel Synthetics on Google Colaboratory.
Automated Data Exposure Detection with Gretel Outpost
Gretel Outpost is a free integration architecture that automates the steps that a security team would take in assessing the risk or exposure to data.
Contact Tracing: Deep Dive & Simulation
We decided to examine the privacy-preserving capabilities of the Contact Tracing proposal, how it would be implemented, and what privacy concerns exist.
Create artificial data with Gretel Synthetics and Google Colaboratory
Use Gretel Synthetics and Colaboratory’s free GPUs to train a model to automatically generate fake, anonymized data with differential privacy guarantees.
Fast data cataloging of streaming data for fun and privacy
Learn more about how Gretel's REST APIs automatically build a metastore that makes it easy to understand what is inside of your data.
Using generative, differentially-private models to build privacy-enhancing, synthetic datasets from real data.
We’re going to train and build our synthetic dataset from a real-time public feed of e-bike ride-share data called GBFS (General Bikeshare Feed Specification).
At Gretel, we realized that we can apply machine learning, synthetic data, and formal reasoning to offer provable privacy guarantees for data.
Deep dive on generating synthetic data for Healthcare
Take a deep dive into training Gretel’s open-source synthetic data library to generate electronic health records that protect individual privacy (PII).
Innovating With FastText and Table Headers
Look at how FastText word embeddings can help to quickly understand new datasets, and build more consistent labels for your own data.