Synthesize 2023: A Computer’s Vision for the Future of Computer Vision
Video description
A brushstroke of code on a palette of pixels: synthesizing images
Transcription
Speaker 1 (00:02):
With over 60% of our brain dedicated to vision-related tasks, our eyes are the gateway to our reality. A vast majority of our lived experience is shaped by the images we perceive. These images play a fundamental role in our understanding of reality, a reality captured on canvas and through cameras, a photograph with the power to educate and inform.
(00:24)
As it turns out, most of what I know about the world is through these captured stills. The image becomes a teacher to the world beyond our immediate surroundings. And now with a brushstroke of code on a palette of pixels, synthetic images offer the ability to peer into a new frontier of possibility. You can also just use it to make really fun pictures of yourself.
(00:50)
The field of image synthetics has a long and rich history with many results over the past decades. It may feel to us like things have happened all of a sudden, but the reality is one of steady progress in algorithmic insights and systems development.
(01:06)
The field started with unconditional image generation for digits, where noise would be transformed into images. Much work was done to improve the quality, the compression, and the controllability of these generated assets. GANs dominated the field for years with many important improvements and insights. Recent work in the past year or two has really brought a convergence of the quality of generated images, their compositionality, and the ability to control generation.
(01:37)
Much of the work has been focused on new benchmarks and then improving performance on those benchmarks. This has resulted in the breadth of capabilities that we see here, and these capabilities are often showcased through stunning artworks.
(01:50)
Today I would like to showcase the utility of image synthetics and see what happens if we go beyond the art to industry applications of generative images. I have two use cases that I'd like to highlight today based on the image synthetic work we've been doing internally.
(02:06)
The first is a smart agriculture example that we have on our blog, and the second is a brief look at conditional image generation for a physical security example.
(02:17)
So to set the stage, let's imagine that we have a robotic tractor. This tractor has a full actuation pipeline complete with recognition and decision-making capabilities. The recognition portion is powered by a deep neural network called a ResNet. It has extremely high reliability. It can tell the difference between undesirable weeds and a budding corn plant. This ResNet performs the image segmentation and classification to determine what is in the image and where it is in that image.
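As a rough sketch of what that recognition step might look like in code, here is a minimal example of running a ResNet-backed segmentation model over a field image, assuming a PyTorch/torchvision setup. The crop/weed fine-tuning and the image path are hypothetical, not the production system described in the talk.

```python
# Minimal sketch: a ResNet-backed segmentation model over field imagery.
# Assumes PyTorch/torchvision; the crop/weed fine-tuning and image path are hypothetical.
import torch
from torchvision import transforms
from torchvision.models.segmentation import fcn_resnet50
from PIL import Image

model = fcn_resnet50(weights="DEFAULT")  # in practice, fine-tuned on crop/weed masks
model.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("field_row.jpg").convert("RGB")  # hypothetical field image
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    logits = model(batch)["out"]   # shape: (1, num_classes, H, W)
pred_mask = logits.argmax(dim=1)   # per-pixel class, e.g. corn vs. weed vs. soil
```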
(02:53)
However, the world is ever-changing, and in this case, we may experience some domain shift.
(03:00)
For example, what if we have an unseasonably cold day accompanied by a snowstorm? This is unexpected and was not at all part of the original training pipeline. So naturally performance drops, and the drop brings the entire operation to a halt.
(03:16)
In traditional pipelines, we would then collect a bunch of snowy field data, send it to our team of contractors for labeling, retrain the system, and redeploy. This is quite expensive and potentially results in the loss of our entire crop.
(03:31)
However, with synthetic data, we can reduce the time between discovery of the domain shift and a newly deployed system. To do so, we would use an image synthetics model that is fine-tuned on the same classes we already have for the recognition system, and then we simply ask for the domain shift that we need. I want to point out to you that these images are not real. These are synthetic versions of the crops that I've asked to be snowy. In fact, every image you've seen so far in this presentation has been synthetically generated.
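To make the "ask for the domain shift" idea concrete, here is an illustrative sketch that uses an off-the-shelf image-to-image diffusion pipeline (via Hugging Face diffusers) as a stand-in for a fine-tuned image synthetics model. The model name, prompt, and file paths are assumptions, not the system used in the talk.

```python
# Illustrative sketch only: img2img diffusion as a stand-in for a fine-tuned
# image synthetics model. Model name, prompt, and paths are assumptions.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

source = Image.open("corn_row_summer.jpg").convert("RGB")  # hypothetical source image

# Ask for the domain shift we need: the same scene, but under snow.
snowy = pipe(
    prompt="a row of young corn plants in a field covered in fresh snow",
    image=source,
    strength=0.6,        # how far the generation is allowed to move from the source
    guidance_scale=7.5,
).images[0]

snowy.save("corn_row_snowy_synthetic.jpg")
```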
(04:04)
And so after the generation process is done, we can take the original recognition system, retrain it on the newly generated images, and then quickly deploy the ResNet to our smart agricultural system.
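The retraining step itself can be as simple as a short fine-tuning loop over the synthetic images, sketched below under the assumption of a PyTorch model and a labeled synthetic dataset; both are placeholders for whatever the deployed system actually uses.

```python
# Sketch of the retraining step: fine-tune the existing recognition model on the
# newly generated snowy images, reusing the labels we already had.
# The dataset, learning rate, and epoch count are assumptions for illustration.
import torch
from torch import nn
from torch.utils.data import DataLoader

def finetune_on_synthetic(model: nn.Module, synthetic_data: DataLoader, epochs: int = 3):
    opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    # CrossEntropyLoss works for image-level labels or per-pixel segmentation masks.
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in synthetic_data:  # synthetic snowy crops + original labels
            loss = loss_fn(model(images), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```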
(04:17)
Our original system had high accuracy, but it dropped significantly when the snow came. Now, with the new synthetic images, we are able to recover the original accuracy in the new domain.
(04:30)
When we tested this in-house, we were able to go from the 15 to 20% drop you see, to recovering the original system accuracy by plugging the hole with synthetic data. So here we have our synthetic friend who's here to plug the hole when we experience domain shift.
(04:48)
The second use case that we'll explore today is not one of domain shift but of service augmentation. By using a synthetic image generation system in pseudo-realtime alongside your existing product offering, you can provide additional services that were previously impossible, or at least unavailable, to your customers.
(05:06)
So to make this concrete, when traveling on an airplane we are required to go through security. There are a few parts to that process: we walk through a metal detector, and our luggage is passed through a scanner. The scanner uses different types of x-ray technology to generate a transparent image of the innards of the bag. This is used to detect potential contraband.
(05:29)
So for example, if we run this synthetic bag through the scanner, we'll see shoes, glasses, maybe some hygiene supplies. A different bag through the scanner generates a different image based on the contents of the bag.
(05:43)
One difficulty with this process is that items may not be recognizable from a single view. We might want to look at them from another angle or a different perspective. So there are a few options here. We could install a second sensor to get a new angle. We could run the bag through multiple times, rotating it each time. These are fine solutions, but we can do better with synthetic image generation.
(06:07)
So in this case, we can train the synthetic image model on multiple views and use it to conditionally generate what another view would look like from the first view.
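One way to set up that conditional training, sketched here under the assumption of paired (first view, second view) scans and some image-to-image network, is a simple supervised reconstruction loop; the actual multi-view model behind these results is not shown.

```python
# Sketch of paired cross-view training, assuming a dataset of (view_1, view_2) scan
# pairs and an image-to-image network (left abstract). An illustration of the idea,
# not the production model discussed in the talk.
import torch
from torch import nn
from torch.utils.data import DataLoader

def train_cross_view(model: nn.Module, pairs: DataLoader, epochs: int = 10):
    """Teach `model` to predict the second scanner view from the first."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.L1Loss()  # pixel-wise reconstruction; GAN/perceptual terms could be added
    for _ in range(epochs):
        for view_1, view_2 in pairs:      # paired x-ray views of the same bag
            pred_view_2 = model(view_1)   # conditionally generate the unseen angle
            loss = loss_fn(pred_view_2, view_2)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```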
(06:19)
In these six images, we have one view on the left, which comes from the scanner, and a second view on the right. One row of three images here shows ground-truth second views, and the other row is synthetically generated.
(06:34)
So as a fun exercise, take a moment to see if you can tell which row is the ground truth and which is synthetically generated. It's helpful to try and look for objects in the first image that appear in the second.
(06:48)
So in this case, the bottom row is synthetically generated and the top row is ground truth. This is a very exciting development in service augmentation that can be used and deployed for real enterprise value.
(07:03)
There's a lot to be done, however, to ensure that our generations are correct. We have to take special care with a few things. Outside of initial data curation, we have to run validation on the generations to decide which images to use and which images not to use.
(07:19)
This is challenging to do in a low-resource environment with only a few dozen input images. But with recent work on zero-shot image classification, we have an avenue. The methods listed here get close to 80% zero-shot ImageNet classification performance, which is sufficient to use as a filter and another layer of control to help ensure high-quality outputs.
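As one example of how such a zero-shot filter could be wired up, here is a sketch using CLIP via Hugging Face transformers; the label text, threshold, and file path are assumptions chosen for illustration rather than the validation pipeline used in practice.

```python
# Sketch of a zero-shot filter over generated images, using CLIP as one example of the
# zero-shot methods mentioned. Labels, threshold, and file paths are assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a snowy corn field", "an unrecognizable or corrupted image"]

def keep_image(path: str, threshold: float = 0.8) -> bool:
    """Keep a generated image only if CLIP assigns it confidently to the target label."""
    image = Image.open(path).convert("RGB")
    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    return probs[0, 0].item() >= threshold  # index 0 is the target description
```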
(07:44)
It's also important to realize that even in the use cases we've shown, we understand that these models can be somewhat sensitive to the prompts used in generation.
(07:53)
Today, all of this requires professionals who are able to interpret the results of the intermediate generations. Additionally, as a consumer of the synthetic images, your enterprise will need a way to measure the quality of downstream systems, such as the classification accuracy used in our smart agriculture example.
(08:13)
So if you're interested in diving into image synthetics yourself, feel free to visit labs.gretel.ai/image where we have fun tools for you to play around with. We'd also love to see what you make, so please join us at gretelai.discord.
(08:31)
We love interacting with the synthetic data community and would love to see what you can build with these tools. The future of generative AI and synthetic data is extremely exciting, with new capabilities emerging every day in text, tabular, and images. We at Gretel are excited to enable you with privacy-preserving technologies to be the brushstroke of code on the palette of pixels.
(08:55)
Thank you, and I'll take any questions.