Why Nonprofits Should Care About Synthetic Data
In November, I was on a panel at the NetHope Summit, alongside Medic, discussing why synthetic data is important for nonprofits and how they can use it. I am extremely grateful that DataKind invited Gretel to take part. The use cases span industries such as healthcare, education, economic development, and many more. Much of the opportunity hinges on two things: the technology itself, which is maturing quickly, and how well it addresses the pain points nonprofits face.
Synthetic data is recognized as one of the most important and exciting up-and-coming technology innovations. It will enable for-profit companies to share data, particularly when using differentially private synthetic data, by removing the friction of privacy-related permission issues and allowing for more rapid technological development in pre-production pipelines. Synthetic data can also be used to bolster datasets that are too small or severely imbalanced. This will help organizations build more sophisticated machine learning models, which are typically data-hungry. But these benefits are not limited to the for-profit sector; they can also greatly impact the public sector.
The panel was the perfect opportunity to showcase the power of synthetic data and explain why nonprofits should start thinking about it, too. Every pain point that for-profit organizations experience is amplified for nonprofits. Here are three main challenges and how synthetic data can address them:
(1) Protecting Data Privacy
Many nonprofits and NGOs work with vulnerable populations, where privacy risks are heightened. These populations have faced technological abuse in the form of surveillance systems, recidivism algorithms, and government welfare systems. Nonprofit organizations understand that this kind of misuse can happen to the populations they serve, and so they must be very protective of whatever data they collect.
Differentially private synthetic data is one way nonprofits can mitigate this risk, unlocking several innovations and practices previously seen as too risky, such as collaborating on the fight against human trafficking (a use case shared by an audience member during our panel!). Although this doesn't necessarily prevent the data from being used imperfectly in model development, it does protect information about vulnerable populations from leaking and being used for other nefarious ends.
(2) Collecting & Leveraging Limited Data Sets
But even if data can be shared to improve analyses, insights, and model building, the public sector simply isn't in the business of collecting data. The Law Family Commission on Civil Society, a research group dedicated to strengthening civic community bonds, published a report in October 2021 describing how the public sector lags behind the private sector because of poor data practices, such as storing data in "unreadable formats" like PDFs.
Collecting data is expensive and hard, and doing it right is even harder. Defining the granularity, the cadence, and the types of information collected is a full-time job unto itself. Few nonprofits have the budget or staff bandwidth to commit to robust internal data collection and management systems. Activities closer to the nonprofit's core mission will almost always take priority over these difficult tasks, especially when the return on investment is hard to calculate.
Here's where synthetic data can shine. If a nonprofit organization has collected just enough data (the exact amount varies, but it can be less than a full data collection effort), it may be able to train a synthetic data model to augment that relatively small sample. Given a sufficient source of real data, even a modest sample can be used to generate an effectively unlimited number of synthetic records that follow the same statistical patterns. With an expanded dataset, and thus a more comprehensive understanding of their strategic initiatives and how each one is performing, nonprofits can make more informed choices about where to allocate scarce resources to maximize their results.
A real-world example I discussed on the panel was a model DataKind built in collaboration with Microcred. The model would have helped Microcred make better lending decisions, so that its lending could become more efficient and inclusive. However, Microcred was only collecting data on people who were granted a loan. As a result, the model could only be applied to those who were granted loans, not to those who were denied. This limited the model's efficacy and impact. Collecting data on applicants who were denied loans would take months or even years. If Microcred uses synthetic data in the future, it could collect data over a shorter time span and build out the rest of its dataset with better privacy guarantees.
(3) Sharing Data
One of the biggest benefits unlocked with privacy-protected data is the ability to safely share datasets. Turning siloed, private information into anonymized artifacts that teams can collaborate on can spark novel discussions and discoveries that lead to better results for the vulnerable populations being served.
Some civil servants in the UK have already begun advocating for synthetic data to improve their government's use of data. For instance, in a recent civil service competition hosted by several public institutions, synthetic data was voted one of the best of more than 200 ideas proposed. It was put forward as a tool that could be used in the public sector for, among other things, "...improving detection of benefit and tax fraud through richer data exchange between the Department for Work and Pensions, HM Revenue and Customs and UK Visas and Immigration." By giving public sector groups the power to collaborate with data, new insights can be found, gaps can be closed, and better systems can be built for the populations served.
The nonprofit sector is critical to the functioning of any healthy society. It helps ensure that vulnerable populations with fewer means are cared for, and it improves quality of life for millions around the world. To keep providing these essential services to individuals and being a force for good on a broader scale, nonprofits must use data effectively. By using synthetic data tools like Gretel, they can break through the privacy bottleneck and unlock innovations that previously seemed out of reach.
The key now is to ensure that nonprofits everywhere know these highly effective, inexpensive (and in some cases free) tools are available to them today. I hope you’ll join me in spreading that good news!
You can get started with synthetic data for free.