Diffusion models for document synthesis

Explore state-of-the-art image synthesis for business documents using diffusion models.
Copyright © 2022 Gretel.ai

Here at Gretel, we spend a lot of time thinking about how to better generate synthetic data. Usually, that data is tabular or text: most data in the business world lives in a relational database, an Excel file, a CSV, or a body of text. However, there is an increasing demand for synthetic data in forms other than tables and text.

Move over GANs, Diffusion Models are here

One particularly interesting form of data is images. Compared with the long history of machine learning, generative image applications are quite new. In the past decade we have seen a wide variety of methods for generating images, including autoencoders and Generative Adversarial Networks (GANs).

In the years since their inception, GANs have held the title of best generative models for images, as measured by diversity and quality. However, even though they have produced amazing results, a few problems remain. One of the main issues is often referred to as “mode collapse” or “low mode coverage,” wherein the model has trouble generating a wide variety of images: the outputs may be high quality but low in diversity.

Recently, Diffusion Models have arrived as a potential replacement for GANs since they are able to generate high-quality images with great diversity. If you want to dive in and learn more about Diffusion Models, here are some in-depth posts:

https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
https://www.assemblyai.com/blog/diffusion-models-for-machine-learning-introduction/

Diffusion Models have some similarities to and some differences from GANs. In a GAN, two neural networks compete against each other, hence the term “adversarial.” In Diffusion Models, there is only one neural network involved, and it is trained on diffused data: a “forward” noising process is run on the data, taking it from clean to noisy, like so:

Forward noising process applied to the Gretel logo
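For readers who prefer code, here is a minimal sketch of that forward process in PyTorch. The linear noise schedule, image shapes, and variable names are illustrative assumptions rather than the exact setup we used:

```python
# Minimal sketch of the "forward" noising process used by DDPM-style
# diffusion models. The schedule and shapes below are illustrative only.
import torch

T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative product over steps

def add_noise(x0: torch.Tensor, t: torch.Tensor):
    """Diffuse a clean image batch x0 to timestep t in one closed-form step."""
    noise = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)            # broadcast over (B, C, H, W)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return x_t, noise

# Example: noise a batch of 8 grayscale "document" images to random timesteps.
x0 = torch.rand(8, 1, 64, 64) * 2 - 1                  # images scaled to [-1, 1]
t = torch.randint(0, T, (8,))
x_t, noise = add_noise(x0, t)
```

The further along the schedule we sample, the closer x_t gets to pure Gaussian noise.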

Then, the neural network (typically a U-Net) is trained to reverse the forward noising process. This approach has a number of really nice properties, primarily that it performs well on a wide range of data types. One of the most recent successes of Diffusion Models is DALL·E 2 from OpenAI, which combines Diffusion Models with text-based models to create fantastic images from natural-language prompts.
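As a rough illustration of what that training step looks like, here is a hedged sketch that reuses the add_noise helper and schedule from the snippet above and borrows UNet2DModel from the Hugging Face diffusers library as a stand-in noise predictor. The hyperparameters are placeholders, not the configuration behind our results:

```python
# Sketch of the reverse-process training objective: the U-Net learns to
# predict the noise that was added, and the loss is a simple MSE.
import torch
import torch.nn.functional as F
from diffusers import UNet2DModel

model = UNet2DModel(sample_size=64, in_channels=1, out_channels=1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def training_step(x0: torch.Tensor) -> torch.Tensor:
    t = torch.randint(0, T, (x0.shape[0],))   # random timestep per image
    x_t, noise = add_noise(x0, t)             # forward process from the sketch above
    pred = model(x_t, t).sample               # U-Net predicts the added noise
    loss = F.mse_loss(pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```

In practice this step is repeated over many epochs of the (possibly very small) image dataset, and sampling new images amounts to running the learned denoiser backwards from pure noise.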

This same underlying technology can be used for important business applications as well. For example, here at Gretel we recently explored how well Diffusion Models perform at document synthesis.

Document Synthesis

When companies transition to digital, they often catalog physical documents as digital scans. This results in images of documents that are stored in the cloud or on internal systems. Oftentimes, we want to analyze these images to make decisions or train downstream machine learning systems. For example, a doctor may want to scan and upload their handwritten notes to include with other patients’ electronic health records for further analysis. However, because of the private nature of these documents, or the expense of collecting and scanning them, we often need synthetic images of documents for our downstream tasks.

Enter Document Synthesis as a potential solution. Here, we can synthesize realistic images of documents using either GANs or Diffusion Models. Since we may have only a few original images to work with, our methods need to make efficient use of the data we have collected.

Here we use a small dataset of 300 receipt images as our testbed. The goal is to generate realistic-looking receipts that could be useful to an organization.

GANs

We first explored whether “small data” GANs would be able to synthesize receipts, and we had moderate success. As you can see here, the model was able to learn the basic shape of the documents and the vague squiggles of text. However, these are likely not usable for any business purpose, since the text remains illegible.

Receipts generated from the GAN model

Diffusion Models

However, when we transition to using a Diffusion Model, we immediately see an increase in quality, and this model could readily be scaled up to improve performance further. Notice how the model has not only captured the shapes, lighting, and variations of the documents, but has also been able to write coherent text.

Receipts generated from the Diffusion Model

Conclusion

On the applied science team here at Gretel, we explored whether Diffusion Models could be used to generate documents. We found the capabilities of these models to be really promising! If this is exciting to you, feel free to reach out at hi@gretel.ai or join and share your ideas in our Slack community.