Groundbreaking Deep Learning Takes the Stage at GTC
Jamie Beckett, NVIDIA
March 28, 2018

GPU TECHNOLOGY CONFERENCE – NVIDIA researchers aren’t magicians, but you might think so after seeing the work featured in Tuesday’s keynote address from NVIDIA founder and CEO Jensen Huang at the GPU Technology Conference in San Jose.
Huang spotlighted two deep learning discoveries that hold the potential to upend traditional computer graphics. Both could help game developers create richer experiences in less time and at lower cost. And one could accelerate autonomous vehicle development by making it easy to create training data for a wider variety of road conditions, landscapes and locations.

The two research projects are the latest examples of how NVIDIA is combining its expertise in deep learning with its long history in computer graphics to advance industries. The company’s 200-person NVIDIA Research team — spread across 11 worldwide locations — is focused on pushing the boundaries of technology in machine learning, computer vision, self-driving cars, robotics, graphics, computer architecture, programming systems, and other areas.


The two images here are clean versions of the same noisy picture. The image on the left was denoised by a neural network trained on corresponding pairs of clean and noisy images. Researchers denoised the image on the right using a model trained solely on noisy images.

Cleaning up Noisy Images
You may not know what a noisy image is, but you’ve probably taken one. You aim the camera at a dimly lit scene, and your picture turns out grainy, mottled with odd splotches of color or dotted with white spots known as fireflies.

Removing noise from images is difficult because the process itself can add artifacts or blurriness. Deep learning experiments have offered solutions, but have a major shortcoming: They require matched pairs of clean and noisy images to train the neural network.

That works as long as you have good pictures, but it can be hard, or even impossible, to get them. NVIDIA researchers in Finland and Sweden have created a solution they call Noise2Noise to get around this issue.


Ordinary AI denoising requires matched pairs of clean and noisy images. But it’s often impossible to get clean images for MRIs and some other medical scans. With Noise2Noise, no clean images are necessary.

Garbage in, Garbage out? Not Anymore
Getting clean images is a common problem for medical imaging tests like MRIs and for astronomical photos of distant stars or planets — situations in which there’s too little time and light to capture a clean image.

Time also poses a problem in computer graphics. Just the task of generating clean image data to train a denoiser can take days or weeks.

Noise2Noise seems impossible when you first hear about it. Instead of training the network on matched pairs of clean and noisy images, it trains the network on matched pairs of noisy images — and only noisy images. Yet Noise2Noise produces results equal to or nearly equal to what a network trained the old-fashioned way can achieve.

“What we’ve discovered is by setting up a network correctly, you can ask it to do something impossible,” said David Luebke, vice president of research. “It’s a really surprising result until you understand the whole thing.”
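
To make the idea concrete, here’s a minimal sketch of Noise2Noise-style training in PyTorch. It is illustrative only, not the researchers’ code: the `TinyDenoiser` network and the `loader` that yields two independently noisy observations of the same scene are assumptions for the example.

```python
# Minimal Noise2Noise-style training sketch (illustrative, not NVIDIA's code).
# Assumes `loader` yields pairs (noisy_a, noisy_b): two independently noisy
# observations of the same underlying clean image.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """A toy convolutional denoiser standing in for the real network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

model = TinyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for noisy_a, noisy_b in loader:  # `loader` is assumed to exist
    opt.zero_grad()
    # The key idea: the *target* is another noisy image, not a clean one.
    loss = loss_fn(model(noisy_a), noisy_b)
    loss.backward()
    opt.step()
```

The trick is that with zero-mean noise and an L2 loss, the network cannot predict the random noise in the target, so its best strategy is to output something close to the underlying clean image. Swap `noisy_b` for a clean reference and the same loop becomes conventional supervised denoising.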

Not Child’s Play
The second project Huang featured represents a whole new way of building virtual worlds. It uses deep learning to take much of the effort out of the cumbersome and costly tasks of 3D modeling for games and of capturing training data for self-driving cars.

The technique, called semantic manipulation, can be compared to Lego bricks, which kids can put together to build anything from jet planes to dragons.

In semantic manipulation, users start with a label map — what amounts to a blueprint with labels for each pixel in a scene. Switching out the labels on the map changes the image. It’s also possible to edit the style of objects, like choosing a different kind of car, tree or road. 


The NVIDIA researchers’ deep learning-powered image synthesis technique makes it possible to change the look of a street simply by changing the semantic label.
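
As a rough illustration, the sketch below edits a label map and hands it to a pretrained generator, in the spirit of the technique described above. The class IDs, the file name and the `generator` model are hypothetical placeholders, not NVIDIA’s actual code or API.

```python
# Hypothetical sketch of semantic manipulation: edit a per-pixel label map,
# then let a pretrained image-synthesis network render the new scene.
import numpy as np
import torch

TREE, BUILDING = 8, 2  # example class IDs; real datasets define their own
label_map = np.load("scene_labels.npy")  # H x W array of per-pixel class IDs

# The semantic edit: relabel every "tree" pixel as "building".
edited = label_map.copy()
edited[edited == TREE] = BUILDING

def to_one_hot(labels: np.ndarray, num_classes: int = 35) -> torch.Tensor:
    """Convert an H x W label map into a 1 x C x H x W one-hot tensor."""
    t = torch.from_numpy(labels).long()
    one_hot = torch.nn.functional.one_hot(t, num_classes)  # H x W x C
    return one_hot.permute(2, 0, 1).unsqueeze(0).float()   # 1 x C x H x W

# `generator` stands in for a pretrained label-to-image network.
image = generator(to_one_hot(edited))
```

Because the generator sees only the labels, changing a region’s class is all it takes to swap a tree for a building, or cobblestones for pavement.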

Tough Game
The research team’s method relies on generative adversarial networks (GANs), a deep learning technique often used to create training data when it’s scarce.

Although GANs typically struggle to generate photorealistic, high-resolution images, NVIDIA researchers were able to alter the GAN architecture in a way that made it possible.
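
For readers unfamiliar with GANs, here is a minimal, generic training step in PyTorch showing the adversarial game between a generator `G` and a discriminator `D`. This is the vanilla formulation for intuition only; the researchers’ networks are conditioned on label maps and restructured for high-resolution output, which this sketch does not capture.

```python
# One generic GAN training step (illustrative; not the researchers' setup).
# G maps random noise to images; D scores images as real or fake.
import torch
import torch.nn.functional as F

def gan_step(G, D, real_images, opt_g, opt_d, z_dim=128):
    """One adversarial update: D learns to spot fakes, G learns to fool D."""
    z = torch.randn(real_images.size(0), z_dim)
    fake_images = G(z)

    # Discriminator step: push real scores toward 1, fake scores toward 0.
    opt_d.zero_grad()
    real_logits = D(real_images)
    fake_logits = D(fake_images.detach())  # detach so G isn't updated here
    d_loss = (
        F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
        + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    )
    d_loss.backward()
    opt_d.step()

    # Generator step: make D label the fakes as real.
    opt_g.zero_grad()
    g_loss = F.binary_cross_entropy_with_logits(
        D(fake_images), torch.ones_like(fake_logits)
    )
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

Each step, `D` gets better at telling real images from `G`’s fakes, while `G` gets better at producing images that `D` scores as real — the competition that drives both networks to improve.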

Today, building virtual environments for computer games requires thousands of hours of artists’ time to create and change models, and can cost as much as $100 million per game. Rendering turns those models into the games you see on the screen.

Reducing the amount of labor involved would let game artists and studios create more complex games with more characters and story lines.

San Francisco to Barcelona: No Flight Required  
Obtaining data to train self-driving cars is equally cumbersome. It’s typically done by putting a fleet of cars equipped with sensors and cameras on the road. The data captured by the cars must then be labeled manually before it can be used to train autonomous vehicles.

The team’s method could make it possible to take data from, say, San Francisco and apply it to another hilly city like Barcelona. Or turn a cobblestone street into a paved one, or convert a tree-lined street into one lined with parked cars.

That could make it possible to train cars more effectively to handle many different situations. And it could lead to a graphics rendering engine that’s trained on real-world data and renders with generative models.