In 1985, Lucasfilm created the first photorealistic CG character to appear in a feature film - the stained-glass knight in Young Sherlock Holmes. Even though the character had just 10 seconds of screen time, the unprecedented appearance marked the beginning of the industry's quest to cross the Uncanny Valley. Since then, there have been a number of incremental milestones in realistic CG humans, from the digital background actors strolling on the deck in Titanic to the all-digital cast of the animated feature Final Fantasy: The Spirits Within.
Each year, as the technology grows more sophisticated, CG animators edge closer to solving a stubborn problem: the digital model that looks almost, but not exactly, human - a near miss that leaves viewers feeling uneasy.
For some time, Epic Games has been working with partners to advance the state of digital human realism, particularly as it pertains to the subtleties and intricacies of motion, not only resulting in realistic CG actors, but realistic CG actors that can perform in real time.
At last year's SIGGRAPH, attendees were able to "MeetMike," a real-time, interactive VR tech demo in which reporter Mike Seymour, outfitted with a facial-capture rig, interviewed various industry experts wearing VR headsets. As the real Mike performed, his facial image drove a lifelike avatar - the digital Mike - rendered at 90 fps in VR via Epic's Unreal Engine. The facial images were tracked and solved with Cubic Motion's technology, and the resulting data drove the facial rig from 3Lateral.
Previous to this, Epic, Cubic Motion, and 3Lateral advanced real-time animation/performance-capture techniques for Ninja Theory's video game Hellblade: Senua's Sacrifice, with the goal of shooting and editing CG scenes in real time - a term Epic Games' CTO Kim Libreri calls "real-time cinematography" as opposed to previsualization, since all the aspects of the scene are represented virtually on set (lighting, facial and body imagery, voice, camera, and VFX). For the title character, Ninja Theory teamed with 3Lateral and Cubic Motion to create a virtual double of actress Melina Juergens. The CG photoreal Senua - one of gaming's most believable characters - was then controlled by live performance capture and real-time animation within the game environment.
More recently, Epic again partnered with Cubic Motion and 3Lateral to continue to advance realistic human performance in real time with the introduction of Siren, the latest lifelike digital actor and the star of Epic's technology demo of the same name, which was shown on stage during GDC 2018.
"When we began working on 'Siren,' we knew from the beginning that it was going to push several boundaries," says Libreri, who had extensive experience in visual effects for feature films, including Star Wars: The Force Awakens, The Matrix trilogy,
War of the Worlds, and more, before he began focusing on real-time technology.
Andrew Harris, studio CG supervisor at Epic, likewise comes from a visual effects background, which enables him to view real time from a pre-rendered perspective as well. "In the industry, there's been an interesting shift from pre-rendered entertainment to the real-time medium because of the increase in fidelity in motion capture. It's an extremely exciting time right now," he says. "I'm one of a growing group of migrants from the VFX industry who is coming into the game space and trying to discover what's possible with these tools."
Harris sees "Siren" as an advancement of the work done for Senua's Sacrifice, albeit an evolutionary one. "In some sense, a lot of what we're exploring in real time right now is already ground that's been covered in visual effects work that's pre-rendered, when you're afforded 10 hours a frame," he says. "What's really groundbreaking is that we're trying to employ the same techniques in real time, so our render times are 42 milliseconds per frame, rather than hours per frame."
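The scale of that gap is easy to quantify. A quick back-of-the-envelope calculation, assuming the "10 hours a frame" figure Harris cites for offline rendering against the stated 42-millisecond real-time budget:

```python
# Rough comparison of offline vs. real-time render budgets.
# The 10-hour figure is Harris's offline example; 42 ms is the
# stated real-time target.
offline_seconds = 10 * 60 * 60      # 10 hours per frame, in seconds
realtime_seconds = 0.042            # 42 milliseconds per frame

fps = 1 / realtime_seconds          # frames delivered per second
speedup = offline_seconds / realtime_seconds

print(f"{fps:.1f} fps")             # → 23.8 fps
print(f"{speedup:,.0f}x faster")    # → 857,143x faster
```

Notably, a 42 ms budget works out to roughly 24 frames per second - the traditional frame rate of film.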
Also, Senua was an artistically created character, though it was based on scanned data, whereas Siren is intended to look precisely like the actress on whom the character is based. But with Siren and even Mike from "MeetMike," as well as some recent work with Andy Serkis, "we were as precise as possible with the capture data. Every detail of the asset needs to hold up in any lighting condition and from any angle," Harris says.
In High Fidelity
Epic began the "Siren" project a little over a year ago as a collaboration between Epic and Tencent, with the goal of creating a proof-of-concept demonstration to show the capabilities of Epic's Unreal Engine 4 and what the next generation of digital characters could look like.
Once again, the group turned to 3Lateral, which digitally scanned Chinese actress Bingjie Jiang, whose likeness would be reflected in the CG Siren model. Jiang was photographed extensively, with 3Lateral scanning her entire body as well as her face while performing various FACS poses in
order to isolate each muscle movement. "It's not that we captured just a single pose; she does a range of motions, and we captured a 3D model in all of those positions," Harris explains.
When 3Lateral is building the face, the rig actually blends in a new shape for every expression. "When the lip curls, we have a lip curl model that we're blending in," he adds. "Much of the complexity of the rig is in terms of managing those shapes and how they blend from one to the other."
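The blending Harris describes is, at its core, a weighted sum of shape deltas on top of a neutral mesh. A minimal sketch of that idea follows - the shape names, vertex count, and offset values here are hypothetical, and a production rig like 3Lateral's layers far more logic on top:

```python
import numpy as np

# A tiny illustrative face "mesh": 4 vertices, each with an xyz position.
NUM_VERTICES = 4
neutral = np.zeros((NUM_VERTICES, 3))  # neutral-pose vertex positions

# Each blendshape stores per-vertex offsets (deltas) from the neutral pose.
# These shapes and values are made up for illustration.
blendshapes = {
    "lip_curl": np.array([[0.0, 0.2, 0.0]] * NUM_VERTICES),
    "brow_raise": np.array([[0.0, 0.0, 0.1]] * NUM_VERTICES),
}

def evaluate_pose(weights):
    """Blend active shapes: pose = neutral + sum(weight_i * delta_i)."""
    pose = neutral.copy()
    for name, weight in weights.items():
        pose += weight * blendshapes[name]
    return pose

# A half-strength lip curl blended with a slight brow raise.
pose = evaluate_pose({"lip_curl": 0.5, "brow_raise": 0.2})
print(pose[0])  # first vertex offset: [0, 0.1, 0.02]
```

Managing how hundreds of such shapes combine - and correcting the cases where two shapes interfere - is where much of the rig's real complexity lives.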
All told, 3Lateral provided 3D and 4D scans, along with materials "that were as precise and as close to the true-world values as possible," Harris notes. "We create digital models that are accurate down to the pore level. We also come away with enough photographic imagery to create all of the textures that go into making the character look realistic."
Alas, with high-fidelity data comes large file sizes, "larger than we had ever dealt with before in-engine," Harris notes. "To bring Siren into the engine, we were working with an FBX file that was about 4GB."
While 3Lateral worked on the facial model, Epic and Tencent refined the body scans, optimizing the topology of the dense scans.
Following the scanning process, Epic received a model that was an extremely accurate representation of Jiang's body, though it required a little cleanup and a few adjustments. For this work, the artists sculpted the model using Pixologic's ZBrush and re-topologized it in Autodesk's Maya, before the rigging team got to work in Maya.
"This way we have an exact copy of the entire rig in-engine so we could have a one-to-one match between the animations and deformations in Maya and the animations and deformations we'd get in the engine," Harris notes.
Re-creating the subtle intricacies of movement can be the difference between a realistic digital re-creation and one that leaves audiences feeling unnerved. So, it was vital that Jiang's movements aligned perfectly with the digital model. Jiang was outfitted with a motion-capture suit and head-mounted camera, while Epic and Tencent utilized Vicon's Vantage optical mocap system, its Shōgun software, and VUE video cameras to capture precise and authentic movement and to add the character animations over the reference footage in real time.
"We tracked her movements, and then in real time we would drive the animation controls on the digital puppet, and our goal was a one-to-one match," says Harris.
Cubic Motion provided the technology
to capture and track the facial performance, and to then calculate, or solve, in real time what pose the face rig should be in. Cubic Motion then tracked the facial contours, aligning the curves to the eyes, nose, and mouth.
Meanwhile, 3Lateral provided the facial rig and underlying controls that would enable the digital model to achieve any pose that the actress herself could do. Using a proprietary machine-learning algorithm, Cubic Motion would drive the 3Lateral rig based on the contour tracking of the face and position the digital character in the pose to match the actress. "It took refinement. The face-tracking technology is sensitive to the lighting environment, and you have to train the solver to get an appealing result, and that takes some time," says Harris.
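Cubic Motion's solver is proprietary, but the underlying idea - learning a mapping from tracked facial contours to rig control values from example pairs, then applying it per frame - can be sketched in a hedged, simplified form. Here a plain linear least-squares fit stands in for their machine-learning model, and all dimensions and data are invented for illustration:

```python
import numpy as np

# Hypothetical training set: per-frame 2D contour coordinates (inputs)
# paired with artist-verified rig control values (targets).
rng = np.random.default_rng(0)
n_frames, n_contour_coords, n_controls = 200, 40, 10

X = rng.normal(size=(n_frames, n_contour_coords))   # tracked contours
true_map = rng.normal(size=(n_contour_coords, n_controls))
Y = X @ true_map                                    # matching rig poses

# "Train" the solver: fit a linear map from contours to controls.
# (A stand-in for Cubic Motion's actual proprietary model.)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# At runtime, each new frame's tracked contours are pushed through
# the learned map to pose the 3Lateral-style rig.
frame_contours = rng.normal(size=(1, n_contour_coords))
rig_controls = frame_contours @ W
print(rig_controls.shape)  # → (1, 10)
```

The "training the solver" Harris mentions corresponds to gathering those example pairs under the target lighting conditions, which is why the process takes time to refine.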
Crossing the Uncanny Valley has been difficult for artists, mainly because our minds are tuned to recognize another human and react to their expressions and micro-expressions. So, we tend to focus on little things, whether it's the tear ducts or the smallest of details, such as the hairs on the face. "We can improve each of those to the point where we feel like we're 100 percent there. But when you back up and look at the whole face, you notice there's still something that feels a little uncanny," Harris adds. "That's because there are still incremental improvements to be made in lots of little areas. You might be off by a fraction of a percent in some of those areas, but it can still make a big difference."
Consider, for instance, the mix of technology required to create "Siren." "It's exciting to see that it's possible to create a digital human in real time and track the actress's movement. But it's also eye-opening to see how much gear she has to wear. So, we have a ways to go, not in terms of just making the characters look more realistic, but also in making sure the technology can be less cumbersome for the performers," Harris says.
Nevertheless, with each new application, the industry makes progress. Sometimes that progress comes from large advances, other times from incremental changes in techniques and technologies such as rendering, depth of field, subsurface scattering, and more. "Siren" falls into the latter category. It is worth pointing out that for internal applications like "Siren," the Epic team does not use extremely specialized tools, but instead uses an internal version of its commercial engine, and many of the features used in such projects often find their way into the next Unreal Engine release.
But make no mistake: "Siren" pushes the envelope when it comes to rendering and live-performance capture, with Epic advancing real-time rendering to the point where the quality of the real-time rendering is on par with what had previously only been achieved through a software renderer.
Real Time in a Pre-Rendered World
Hailing from the pre-rendered VFX world, Harris is excited over the benefits that real time presents. "Lights can be moved in real time. You're not clicking Render, going to get a coffee, and coming back 20 minutes later to see what the image looks like," he says. "It's a huge change in the way you can be productive as an artist. I think maybe people in the gaming industry have known this for a long time, and now that we're doing really high-fidelity work and in real time, even with raytracing in some cases, it's pretty exciting."
The most obvious advantage this brings is the ability to iterate quickly. Also, because the content isn't locked, it can be customized for viewers. "Imagine changing the content of a cartoon on the fly based on each viewer because it's being rendered in real time, it's not pre-rendered."
In the near term, Harris believes we will see more realistic avatars in the games we're playing and in the apps we are using. Longer term, he believes the intersection of AI and "really, really convincing digital humans" is going to be a big paradigm shift. "Obviously we're not there yet, but I think these are the technologies that you're going to see converge - the combination of AR, convincing digital humans, and artificial intelligence," says Harris.
Indeed, the convergence of this tech, along with real-time advances, is "enabling us to do some of the things we have been doing in the pre-rendered world for a long time," notes Harris. However, the real-time aspect adds a whole other dimension.
"It will just keep going forward, so as this gets faster, we'll be able to do more raytracing. More strands of hair. We'll be able to have more blendshapes, to have more convincing expressions. We'll be able to have more believable synthesized performances," says Harris. "Each year, I think we're going to see really exciting leaps in all these areas, and we are moving forward at incredible speeds."
Leading us eventually to lifelike digital humans the likes of which are hard to imagine just now. "I think it will be inescapable, everywhere. But that's a little ways off," Harris predicts. But hard to resist, just like the call of the mythical Sirens of the sea, only this time leading the industry to endless productive possibilities.
Karen Moltenbrey is the chief editor of CGW.