Synthesize 2023: Accelerating 3D Synthetic Data Generation for Perception
Video description
A deep dive on NVIDIA's 3D synthetic data generation platform Omniverse Replicator
Transcription
Speaker 1 (01:11):
Hello everyone. Thank you so much for joining me today at Synthesize. I am going to be speaking about Omniverse Replicator, which is NVIDIA's synthetic data generation platform. We generate synthetic data particularly for perception networks. You might be hearing a lot here about text, but let's talk about computer vision. Before I begin, I want to show you an example of how our synthetic data is being used in the wild. Here on the left you have a digital twin of an Amazon warehouse, and on the right you have the actual Amazon warehouse. In this case, we built a 3D world to represent this warehouse and then we utilized this world to generate synthetic data for the task of detecting the boxes. It turns out that they changed the tape to a more reflective tape, and then the computer vision algorithms were not working anymore.
(02:14)
So for that, we utilized Omniverse Replicator to generate synthetic data of this more reflective tape and we retrained the network, and this was feasible to do in minutes with the capabilities of Replicator and with their digital twin. You don't need a full digital twin to use simulation for synthetic data, but it is pretty awesome that this digital twin, whose purpose was setting up a 3D world and testing the warehouse logistics, also had the ability to enable AI for this use case. In today's agenda, we are going to first talk about the shortcomings of real data. Then, what is synthetic data? What are the core components of a synthetic data generation solution? Then we'll dive into Replicator, which is the API that we have to generate synthetic data, and then we'll give you resources so that you can get started.
(03:11)
A typical process for training a deep neural network for perception is quite simple. You collect data, you annotate it, you convert it to the right format and then train the DNN. And then we're ready. Is it as simple as it sounds? Well, not quite. You might be able to do this very simply for tasks like cat detection, because the internet loves cats, but that is not the case for tasks like manufacturing or logistics, where the datasets are quite limited. Placing the cameras to collect that data can be very difficult. And on top of that you need to annotate it, and the annotations get even more laborious when we are talking about semantic segmentation. In this case, each pixel has to be labeled one by one, and even though there are techniques to automate this process, it's not great. It still takes quite a bit of time and the costs start to add up, and things get even worse when we're talking about lidar point clouds.
(04:11)
So generating synthetic data for point clouds can be a great solution there. In the case of a lidar point cloud, if you're trying to label an actual point cloud for segmentation tasks, it takes about half a day to get it labeled. And what we found from actually training DNNs with that data is that it tends to be mislabeled too, because it's quite difficult to label each of the points from the lidar. So this is a big reason why synthetic data can be handy, because it can generate this kind of data. But additionally it is helpful in areas like long-tail anomalies. You might have a [inaudible 00:04:53] that's driving on the road, collecting data for you, and you might have terabytes of data, but you don't have car crashes, hopefully, or you don't have a deer in the road at dawn with glare. And these are the things that we need to generate, for instance.
(05:12)
Then there are the non-visual sensors: as I mentioned, lidar is very hard to label by hand, and you have the same problems for radar and all those kinds of sensors. In the case of occlusion, you might be running a tracking computer vision algorithm. If you lose sight of the item and it becomes visible again after a period of time, the labeling might have lost the item and it might have gotten a different label. And this is really bad for training the AI, as it will then no longer be good at tracking, especially occluded objects. And then there are indirect features: in simulated worlds we can get things like exactly at what speed the cars are moving with respect to each other, and store all of that information.
(06:09)
There are even more unique tasks that can really only be done with synthetic data. Here we're talking about pose estimation. On the left we have a robotic arm that was trained to detect the pose of that valve that you see in the background. For detecting pose, you are talking about the position of the object in space and how rotated it is. That data is impossible to get from the real world, as you might need to have some kind of tracker, or potentially there are ways of collecting that data by putting QR codes on the item you want to detect. However, then you're altering the real-world pictures or the real-world data that you would get anyway, which is not what you want. And on the right we have a robotic manipulator that is doing reinforcement learning. In that case, you need to actuate, send the data, get the data from the DNN back, and make a change in the simulated space.
(07:16)
And I sure hope you're not using your super expensive robots to run that kind of iterative loop, because that would be very expensive with a policy that starts out random. Then we have manually generated datasets that you might have collected one time, but even then it will be very challenging for tasks like assembly lines, which are constantly reconfigured. Imagine the robot that we saw before. Now, instead of having to pick up the valve, it has to pick up something else, or now you need to detect a different kind of defect. With simulation it is very easy to just regenerate the dataset for the new item. And in the case of robotics warehouses, those change constantly; they go from having a million boxes to zero, for instance, at different times of the year. And when we are talking about retail, the packaging changes constantly and there are new items being put on the line all the time. And that is much easier to incorporate into your already simulated environment than it would be to go and gather thousands and thousands of images every time there is a product change.
(08:36)
So to recap, the case for synthetic data: you have insufficient data, or you might have issues with the dataset that you are trying to get. For instance, if we're talking about diversity, say for digital humans, it is easier to parametrize the diversity of the dataset. We're also talking a lot about additional scenarios, for instance rare events that we really cannot capture in the real world. And lastly, it is simply expensive to bootstrap this, to get your real datasets to begin with, especially if you change your mind. For instance, in some projects I worked on as an Edge Solutions Architect, there were changes to the needs of the business, which then required changes to the tasks that we were performing in computer vision. And that led to us having to re-collect a dataset, which can be very costly if we're talking about real datasets.
(09:44)
So what is synthetic data? I've told you that synthetic data somehow solves all of these problems, but what is it exactly? Synthetic data is data that is generated by simulations or algorithms as an alternative to real data. In particular, today we're talking about simulations that generate the synthetic data. In the picture here we have an example of Isaac Replicator, which is the synthetic data generation tool for robotics use cases. Here we have our warehouse, which is actually accessible in Isaac Sim, where we have everything perfectly labeled, and we are showing you the synthetic labels for all of the assets in this digital twin of our warehouse. Here we have bounding boxes, segmentation, depth maps, and more. So synthetic data is able to provide you that kind of data immediately, once you have this digital twin or, to begin with, a digital asset. It is able to generate high diversity, because in this digital world you can control the lights, you can control the assets that are in it, and you can regenerate as needed.
(11:12)
And lastly, synthetic data is not meant to be used just by itself. Most of the time, for the projects that we do, we strongly recommend getting a very good real test set that you can use to test your DNNs that have been trained with synthetic data. And on top of that, sometimes what we recommend is that you go 80% of the way with synthetic data and then do the fine-tuning on real data. That still heavily reduces your costs while still validating the DNN trained on synthetic data, rather than going and directly collecting real data. Synthetic data also creates data humans cannot label. As I mentioned before, we can generate rare events, we can bootstrap, we can generate data for six degrees of freedom, and we can generate data for reinforcement learning algorithms, non-visual sensors, and much more.
(12:14)
So I tell you synthetic data is a solution to all of your problems. But is it perfect? Does it work immediately off the shelf? Well, the reality is that there is something called the domain gap. The domain gap comprises the appearance gap, which is the pixel-level differences between one frame and the other, as we can see here, and the content gap: in the real world you might have 10,000 kinds of trees, while in the synthetic world you're maybe going to have 10 or so. So that is the content versus appearance gap, and that is something to watch out for when using synthetic data. However, within Omniverse we overcome the appearance gap with high-fidelity 3D world simulation and physically based materials, that is, MDL. If you go and look into the Omniverse MDL library, you're going to find materials for metals, wood, and so on.
(13:22)
We have validated sensors; for instance, we tested that they behave similarly to the real world, and we give you granular control over things like camera lens simulation and camera lens distortion. Then we have multi-sensor support, so lidar, camera, and radar if you're using the autonomous vehicle simulator. Then we have the content gap. For the content gap, everything is built on the open standard USD, and we are building connectors so that you can bring in content from all kinds of 3D tools. Omniverse has connectors to Unreal Engine, Maya, 3ds Max, SketchUp, and so on. Because of those connectors, you now have access to so much more 3D data than before, and you can bridge that gap. Additionally, we provide a library that enables you to randomize and modify, which we will talk about in a bit.
(14:35)
And what we do as well is this thing called domain randomization. In domain randomization, we try to randomize the synthetic world as much as possible, such that the synthetic domain is large enough to encapsulate the real domain. The way that we do that is by modifying lights, modifying textures, modifying materials and so on to bridge that reality gap. Basically, what randomizing these environments achieves is that the DNN is forced to learn the characteristics that transfer from the synthetic domain to the real domain. A great example of this is randomizing the textures in the synthetic domain. This has a great impact on the performance of the DNN on synthetic versus real data, because DNNs, specifically convolutional neural networks, love learning the textures in a frame.
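To make that concrete, here is a minimal sketch of a texture randomizer written against the omni.replicator.core Python API. The texture paths and the ("class", "shape") semantic label are assumptions for illustration, and the exact randomizer signatures can vary between Replicator versions.

```python
import omni.replicator.core as rep

# Hypothetical texture files; in practice these would be real texture assets on your server.
TEXTURES = [
    "omniverse://localhost/Textures/checker.png",
    "omniverse://localhost/Textures/noise.png",
]

def randomize_textures():
    # Grab every prim tagged with the (hypothetical) semantic label ("class", "shape").
    shapes = rep.get.prims(semantics=[("class", "shape")])
    with shapes:
        # Swap in a random texture so the DNN cannot rely on texture cues.
        rep.randomizer.texture(textures=TEXTURES)
    return shapes.node

rep.randomizer.register(randomize_textures)

# Re-randomize the textures on every generated frame.
with rep.trigger.on_frame(num_frames=100):
    rep.randomizer.randomize_textures()
```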
(15:49)
So if you randomize the textures, you force it not to learn textures. This was actually a big problem even with normal data, which we didn't mention: it turned out that a lot of DNNs had just learned the textures and not the transferable characteristics that would have made them more generalizable to other domains. So let's talk about the synthetic data generation solution, how we approach generating synthetic data, and how we get to the point where we can do domain randomization and generate data. How do you go about generating synthetic data in a simulated environment? The first thing you do is create a 3D asset or ingest it from another tool. For instance, you find it in a marketplace that sells 3D assets, you incorporate it from an open source dataset and bring it in, or you do some artistry yourself to create it.
(17:00)
Then you generate a scene. Scene generation in this case can be complex, as when we're talking about a full-blown digital twin of our warehouse. But for instance, in that case that I showed you of the robot that was grabbing the valve back and forth, the only thing we needed there was just the valve. That's it. We then generated the dataset based on the valve using domain randomization techniques. Next, we need procedural generation. This is where you take that scene or that asset and then incorporate, for instance, multiple distractors or multiple assets in the scene that are moving in a certain way, or you change the light conditions, or you change the textures, and so on. That is what you would do with a Replicator script, and then you would send it off to generate in batch on a server, or you could do this on your workstation if it's a simple task. For instance, if I'm running on my laptop and I have a very simple scene, I can quickly generate 10,000 images in maybe under 10 minutes.
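For illustration, here is a rough sketch of what such a Replicator script can look like. The asset path, output directory, and randomization ranges below are invented for this example, and the exact API details may differ between Replicator versions.

```python
import omni.replicator.core as rep

# Hypothetical USD path for the single asset of interest.
VALVE_USD = "omniverse://localhost/Projects/valve.usd"

# Bring the asset into an otherwise empty stage and tag it for labeling.
valve = rep.create.from_usd(VALVE_USD, semantics=[("class", "valve")])
rep.create.light(light_type="Dome", intensity=1000)

camera = rep.create.camera(position=(0, 1.5, 2.0), look_at=(0, 0, 0))
render_product = rep.create.render_product(camera, (1024, 1024))

# Built-in writer that dumps RGB frames plus labels to disk.
writer = rep.WriterRegistry.get("BasicWriter")
writer.initialize(
    output_dir="/tmp/valve_dataset",  # hypothetical output location
    rgb=True,
    bounding_box_2d_tight=True,
    semantic_segmentation=True,
)
writer.attach([render_product])

# On every generated frame, jitter the object pose and the camera viewpoint.
with rep.trigger.on_frame(num_frames=10000):
    with valve:
        rep.modify.pose(
            position=rep.distribution.uniform((-0.5, 0.0, -0.5), (0.5, 0.5, 0.5)),
            rotation=rep.distribution.uniform((0, -180, 0), (0, 180, 0)),
        )
    with camera:
        rep.modify.pose(
            position=rep.distribution.uniform((-2, 1, 1), (2, 3, 3)),
            look_at=(0, 0, 0),
        )
```

Running a script like this headlessly, for example via rep.orchestrator.run(), is essentially what sending the job off to generate in batch on a server amounts to.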
(18:13)
But if I am an autonomous vehicle developer, then I would probably need to use a data center with access to multiple GPUs to generate 4K data for each camera and the lidars. And how would the synthetic data be used? It would go directly to training DNNs, or it could even be used for product design. For instance, you are building a VR headset and you are trying to identify where you need to put the sensors in the device to gather the right data. Or you are building something like a forklift and you want to know where you get the highest visibility so that you can train a good DNN. So for product design it's very useful. And then there is SIL and HIL (software-in-the-loop and hardware-in-the-loop) testing. This would be for an autonomous robot where you need to test the software that you're building in a simulated environment, and then test not only the software but also the hardware, not the vehicle itself but the chip itself, against the simulated environment that you're running.
(19:34)
And that is where synthetic data is very, very helpful. So Replicator is the synthetic data generation tool that we provide with Omniverse, but there are also domain-specific simulators that we have for autonomous vehicles and robotics, and then Replicator core, which enables you to extend it to your use cases. The first one that we are going to talk about is Isaac Replicator. Here within Omniverse you get access to assets like these for making a warehouse, and then you can procedurally generate data for that warehouse, and hopefully that will be useful to you to deploy a robot in a real warehouse. Then we have DRIVE Replicator, which is our autonomous vehicle simulator. Here you'll have assets that are more pertinent to cars, but also cities and potentially vegetation and animals.
(20:44)
And then you'll have certain maps that are from these cities and you'll be able to generate a scene, and then you procedurally generate data from that scene. Here you can see data from a city that you can then use to train DNNs or, most importantly in the case of autonomous vehicles, to test the edge cases. So you might still be testing on real data, but you need to have all of these edge cases to test. Lastly, we have custom Replicator, which is basically taking Replicator and modifying it for your end use case. Here we have an example of how you could bring in a CAD file, which is very common, for instance, in manufacturing. You bring it into Omniverse, you can add defects and scratches and so on, and procedurally generate the data to then train an AI to detect defects. So that is one example, but for instance, in the first example with the valve that was getting picked up, we just imported the CAD drawing and then moved it around the machine.
(21:56)
But basically that is the workflow for generating synthetic data, and the Replicator APIs live here. So after you've gotten your content and you've built your scene, or at least the baseline scene, you can then generate tons of data with the core Replicator APIs, procedurally generating and rendering lots of data for all of your perception use cases. Replicator leverages the key Omniverse platform capabilities. As I mentioned beforehand, it builds on an open standard, which is USD, and around that standard there is tons of content already, but you can also bring in content from other 3D platforms and utilize it within Omniverse. So this bridges your content needs. It is photoreal and physically accurate, and it's really fast, so you'll get the feedback loop from lights and sensors very quickly. It is highly scalable: for instance, if you want to deploy multiple containers in the cloud, you can, to generate something like a million frames quickly.
(23:10)
I think we had an experiment where we generated over a million frames for a warehouse in under half an hour. And it expands to multiple domains: given the flexibility of bringing in assets from multiple tools, we can now generate synthetic data for all of those applications. One thing to note here is that Omniverse is built on top of the core capabilities of NVIDIA, which are CUDA and RTX, built with USD for the assets. You can then extend it and modify it for your own use case, and of course toy detection is a key use case. Key features here are the ability to import from CAD, Houdini, Blender; physics and material definitions; domain randomization; annotations; custom writers, which I'll touch on in a bit; non-visual sensors; multi-GPU and multi-node support; and being cloud native. So let's talk about the core components of Replicator. Replicator has randomizers, which enable you to utilize the assets that you brought from other 3D tools and then randomize them within the environment, as you can see here. Here you see the assets moving around in a scene.
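As a rough sketch of what that can look like in practice, the snippet below uses a couple of the built-in randomizers to pull USD assets from a folder and scatter them over a surface on each frame. The folder path and the exact randomizer arguments are assumptions for illustration, not taken from the talk.

```python
import omni.replicator.core as rep

# Hypothetical folder of USD assets ingested from other 3D tools.
ASSET_DIR = "omniverse://localhost/Projects/props/"

# A simple ground plane to scatter the props onto.
floor = rep.create.plane(scale=10, visible=True)

def scatter_props():
    # Spawn a random subset of the ingested assets...
    props = rep.randomizer.instantiate(rep.utils.get_usd_files(ASSET_DIR), size=20)
    with props:
        # ...drop them at random spots on the floor and give them random headings.
        rep.randomizer.scatter_2d(floor, check_for_collisions=True)
        rep.modify.pose(rotation=rep.distribution.uniform((0, -180, 0), (0, 180, 0)))
    return props.node

rep.randomizer.register(scatter_props)

with rep.trigger.on_frame(num_frames=1000):
    rep.randomizer.scatter_props()
```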
(24:52)
Here we have again the toy example, but this is extremely useful because, given the assets that you have, you can get infinite frames for whatever task you want, and we have a wide set of randomizers that enable you to procedurally move things around in a scene in a way that looks realistic for your end use case. Below Replicator there is omni.syntheticdata, which gives you access to the RTX renderer and to its arbitrary output variables, which are what enable you to get the synthetic data, and it gives you access to the annotator OmniGraph nodes, which are how we annotate the synthetic data from the renderer itself, as you see here. For more high-level and easy-to-develop access, we have the bounding boxes, segmentations, and the typical annotated data in the Replicator annotators. That way the bounding boxes and the segmentation data are perfectly set up, and you can go ahead and get started with them, as you see here.
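For readers who want to poke at that layer directly, this is roughly how a single annotator can be attached to a render product and read back. The annotator name is one of the standard ones, but treat the calls and the shape of the returned data as a sketch rather than a guaranteed API.

```python
import omni.replicator.core as rep

camera = rep.create.camera(position=(0, 2, 4), look_at=(0, 0, 0))
render_product = rep.create.render_product(camera, (512, 512))

# Ask the registry for a ready-made annotator instead of wiring up the
# omni.syntheticdata / OmniGraph nodes by hand.
bbox_annot = rep.AnnotatorRegistry.get_annotator("bounding_box_2d_tight")
bbox_annot.attach([render_product])

# Step the orchestrator once so the renderer produces a frame, then read the labels.
rep.orchestrator.step()
data = bbox_annot.get_data()
print(data["data"])  # per-instance bounding boxes (layout assumed)
print(data["info"])  # metadata such as an id-to-semantic-label lookup (assumed)
```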
(26:19)
One cool thing about this is that you only click on the assets and tag them with a semantic label once, and then you have the labels forever. And lastly, we have the writers: you take the annotator output, pass it on to a custom writer, and produce the labels in whichever way you want. The reason this matters is that we understand everyone has different data requirements, so some people might need the data in KITTI, COCO, or some other format, and because of that, we make it so that you can come in, modify the writer, and write something that will be usable for your own use case. To summarize, we have built-in randomizers that do material, background, and lighting randomization; we have built-in annotators; and we have built-in writers for KITTI and others.
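To illustrate the custom-writer idea, here is a minimal sketch of a writer that saves the RGB frame and dumps tight 2D bounding boxes as JSON. The class name, file layout, and the exact keys of the annotator payload are assumptions; a real KITTI or COCO writer would follow those formats' specifications.

```python
import json
import numpy as np
from omni.replicator.core import Writer, WriterRegistry, BackendDispatch

class MyJsonWriter(Writer):
    def __init__(self, output_dir):
        # BackendDispatch handles writing to local disk or a Nucleus server.
        self._backend = BackendDispatch({"paths": {"out_dir": output_dir}})
        self._frame_id = 0
        # The annotators this writer asks the renderer to produce.
        self.annotators = ["rgb", "bounding_box_2d_tight"]

    def write(self, data):
        # 'data' is a dict keyed by annotator name for the current frame.
        self._backend.write_image(f"rgb_{self._frame_id:05d}.png", data["rgb"])

        # Keys below assumed to match the bounding box annotator's output layout.
        bboxes = data["bounding_box_2d_tight"]["data"]
        labels = data["bounding_box_2d_tight"]["info"]["idToLabels"]
        payload = json.dumps({"labels": labels, "boxes": np.asarray(bboxes).tolist()})
        self._backend.write_blob(f"boxes_{self._frame_id:05d}.json", payload.encode("utf-8"))
        self._frame_id += 1

# Register it so it can be fetched, initialized, and attached like the built-in writers.
WriterRegistry.register(MyJsonWriter)
```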
(27:25)
So the end-to-end process would be to utilize Omniverse Replicator to generate the synthetic data and then train the perception DNN. And for that, once you're done with Omniverse Replicator, NVIDIA provides pretrained networks that you can then train with that synthetic data, and it provides you with TAO, which is a training framework to train these DNNs. And then we have products for the end-to-end of this process. So once you have trained the network and you want to deploy it onto a video stream in a manufacturing line, you could come in and utilize NVIDIA DeepStream or Metropolis. And if you need to manage all of those computer vision networks, we have NVIDIA Fleet Command. You can hear all about how these pieces fit together at GTC, which is coming up in March of this year. With that, thank you so much for joining, and here are some resources that you have access to. We have documentation, tutorials, blog posts, and NVIDIA On-Demand videos for you to go ahead and get started.