Automated Data Exposure Detection with Gretel Outpost

Gretel Outpost is a free integration architecture that automates the steps a security team would take to assess the risk of data exposure.

On August 25th, Gretel will launch its free Public Beta, and we are super excited to enable developers to create safe data with our cloud service and developer SDKs. As we build, we constantly find new use cases for our technology.

Previously, while testing our own service, we discovered unintentional PII in a popular Kaggle dataset. Since then, we’ve continued to see data leaks in the news and have had several discussions with enterprise security and privacy teams who reached out to see how our automated data labeling could minimize the risk of accidental data exposure.

After some great conversations and some weekend coding sessions, we are happy to announce Gretel Outpost.

Outpost is a free integration architecture that ingests alerts from common security tools, executes data collection surveys, and sends the collected data to Gretel to quickly identify exposed PII and other sensitive data. In a sense, Outpost automates the steps a security team would take to assess the risk of data exposure associated with a security alert.


Outpost has two components that work hand-in-hand:

  • Consumers: These modules receive alerts from security tools and parse them to create a survey request message. We currently support Shodan and will be adding modules for GitHub, BitBucket, GitLab, and Amazon S3, plus custom webhooks to support any system that can send HTTP requests!
  • Surveys: These modules receive survey requests and, depending on the request type, attempt to gather data at a more granular level and extract record-level information. For each successful survey, a new Gretel Project is created and samples of the data are sent to it, so security teams can easily give data owners granular access to the discovery during investigations. Gretel automatically labels sensitive data types, including names, addresses, and access keys, using Gretel’s Natural Language Processing (NLP) and provides a high-level report of what it finds.
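
To make the consumer and survey split concrete, here is a minimal Python sketch of how the two pieces might fit together. It is not Outpost's actual code: the `SurveyRequest` message, the `shodan_consumer` function, and the survey registry are hypothetical names chosen for illustration. The only outside facts it relies on are that Shodan banners carry `ip_str`, `port`, and `product` fields and that ElasticSearch listens on port 9200 by default.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SurveyRequest:
    """Hypothetical message a consumer hands to a survey module."""
    survey_type: str   # which survey should handle it, e.g. "elasticsearch"
    host: str
    port: int
    source: str        # name of the consumer that produced the request
    raw_alert: dict = field(default_factory=dict)


def shodan_consumer(alert: dict) -> Optional[SurveyRequest]:
    """Parse a Shodan-style banner into a survey request, or None if unsupported."""
    host = alert.get("ip_str")
    port = alert.get("port")
    product = (alert.get("product") or "").lower()
    if not host or port is None:
        return None
    # Assumption: treat the default port or an "elastic" product as a match.
    if "elastic" in product or port == 9200:
        return SurveyRequest("elasticsearch", host, int(port), "shodan", alert)
    return None


# Registry mapping a request type to the survey function that handles it.
SURVEYS: Dict[str, Callable[[SurveyRequest], List[dict]]] = {}


def register_survey(survey_type: str):
    """Decorator that adds a survey function to the registry."""
    def decorator(func):
        SURVEYS[survey_type] = func
        return func
    return decorator


@register_survey("elasticsearch")
def elasticsearch_survey(request: SurveyRequest) -> List[dict]:
    # Placeholder: a real survey would connect to the host and sample records.
    return [{"surveyed": f"{request.host}:{request.port}"}]


def dispatch(request: SurveyRequest) -> List[dict]:
    """Route a survey request to the survey module registered for its type."""
    survey = SURVEYS.get(request.survey_type)
    if survey is None:
        raise ValueError(f"no survey registered for {request.survey_type!r}")
    return survey(request)
```

Keeping the request message separate from the survey code means a new consumer only needs to emit a `SurveyRequest`, and a new survey only needs to register itself for a request type.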

The diagram below illustrates possible deployment modes for Outpost using the Shodan consumer:

Example Deployment with Gretel Outpost


Example Flow

One example deployment would use alerts from Shodan (a search engine that scans the Internet) to identify publicly accessible ElasticSearch systems that reside within your organization’s public IP ranges. First, you would configure Shodan to monitor IP ranges and/or domains that belong to your organization. Second, you would start Outpost with the Shodan consumer enabled.

Whenever Shodan yields an alert that indicates a possibly exposed ElasticSearch system, a survey request is generated and the ElasticSearch survey module attempts to extract sample documents and send them to a new Gretel Project. Outpost samples documents from every index it can reach, which maximizes coverage of the data that is exposed.
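
As a rough illustration of the sampling step, the sketch below lists the indices on an open ElasticSearch node and pulls a few documents from each one. It is an assumption about how such a survey could work, not the Outpost implementation: the function names and the `per_index` parameter are hypothetical. It uses only the standard library plus two ElasticSearch REST endpoints, `_cat/indices?format=json` and `<index>/_search?size=N`.

```python
import json
from typing import Any, Callable, Dict, List
from urllib.request import urlopen


def http_get_json(url: str, timeout: float = 5.0) -> Any:
    """Fetch a URL and decode its JSON body (standard library only)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def sample_elasticsearch(
    base_url: str,
    per_index: int = 10,
    get_json: Callable[[str], Any] = http_get_json,
) -> Dict[str, List[dict]]:
    """Return up to `per_index` sample documents from every index we can query."""
    samples: Dict[str, List[dict]] = {}
    # Each entry returned by _cat/indices has an "index" key holding the name.
    indices = get_json(f"{base_url}/_cat/indices?format=json")
    for entry in indices:
        name = entry["index"]
        try:
            result = get_json(f"{base_url}/{name}/_search?size={per_index}")
        except (OSError, ValueError):
            # Skip indices that refuse the query and keep surveying the rest.
            continue
        hits = result.get("hits", {}).get("hits", [])
        samples[name] = [hit.get("_source", {}) for hit in hits]
    return samples
```

Passing the HTTP getter in as a parameter keeps the sampling logic easy to exercise against canned responses, without touching a live system. Only run a survey like this against hosts your organization owns.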

The raw Shodan alerts are also consumed, processed for data labeling, and made available in their own Gretel Project. This provides a quick, safe way to view everything Shodan knows about your infrastructure.

Survey results show up as new projects in your project list, each with a short description of the survey that created it:

Gretel Project Listing with Outpost Survey Results


Now you may invite others to look at the data or use our SDKs to anonymize and share the findings with others. The records that are sent to Gretel are encrypted in a memory-only stream and we only retain the most recent 50,000 records to enable raw data analysis.

Metadata about the records, however, is kept for as long as you keep the project active and can accumulate metrics well beyond the cached 50,000 records. One report that is particularly useful when analyzing sampled data from Outpost is the entity report, which shows high-level statistics on the types of data we saw across the entire ingest stream.

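
The difference between the capped record cache and the ever-growing metadata can be modeled in a few lines of Python. This is only a toy sketch of the behavior described above, not how Gretel stores data: `ProjectStream`, `ingest`, and `entity_report` are hypothetical names, and the entity labels are assumed to arrive already computed.

```python
from collections import Counter, deque
from typing import Deque, Dict, Iterable


class ProjectStream:
    """Toy model: keep only the newest records, but count entities forever."""

    def __init__(self, max_records: int = 50_000):
        # A deque with maxlen drops the oldest record once the cache is full.
        self.recent: Deque[dict] = deque(maxlen=max_records)
        self.total_records = 0
        self.entity_counts: Counter = Counter()

    def ingest(self, record: dict, entities: Iterable[str]) -> None:
        """Cache the record and fold its entity labels into the running totals."""
        self.recent.append(record)
        self.total_records += 1
        self.entity_counts.update(entities)

    def entity_report(self) -> Dict[str, int]:
        """Entity types seen across the whole stream, most frequent first."""
        return dict(self.entity_counts.most_common())
```

Even after older records fall out of the cache, the counts in `entity_report()` still reflect every record that was ever ingested.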

How to get started with Outpost

Outpost is a free community project; however, due to the sensitive nature of its capabilities, we have chosen not to open source it at this time. We welcome requests for access from any member of a security or privacy organization who needs to mitigate exposure risk. We will add you to our GitHub team, which provides access to the code and deployment guides.

Outpost is available today and will be usable with our public beta that launches August 25th!

Please contact us if you are interested in joining the GitHub team to get access to Outpost.