It is known as the "uncanny valley," a term describing that sense of unease in response to highly realistic humanoid robots and lifelike computer-generated human characters that imperfectly resemble a human being.
For quite some time now, those working in computer graphics have been making strides toward crossing the uncanny valley. One of the first big steps was the long-awaited Final Fantasy: The Spirits Within (July 2001), the first photoreal computer-animated feature film, from Square Pictures. Built with then-cutting-edge technology, the project received mixed reviews, with many appreciating the technical achievement and others calling the characters "creepy." Either way, the film fell far short of recouping its budget and contributed to the end of Square Pictures.
There were other noteworthy examples that followed, before the next big milestone moment for digital humans on-screen: The Curious Case of Benjamin Button (December 2008), about a character who ages backward. In his young-appearing version, the character is played by actor Brad Pitt; in his older appearance, he is a CG human created by Digital Domain.
During the past two years, more significant progress has been made in creating realistic digital humans for films and television. In 2019, a digital fountain of youth gave several leading actors the opportunity to play characters at much younger ages in The Irishman, Gemini Man, and
Captain Marvel, thanks to leading-edge work by Industrial Light & Magic, Weta Digital, and Lola. Also that year, ILM created a young Sarah Connor, John Connor, and T-800 for an opening sequence in
Terminator: Dark Fate. And, Gradient Effects created a younger version of the character played by John Goodman in flashback sequences for an episodic television series.
In the past several months, Digital Domain has used a hybrid age-blending technique to add years to soccer phenom David Beckham for a malaria-awareness short film. And, the studio brought famed coach Vince Lombardi back from the dead for a Super Bowl presentation. A year prior, Digital Domain re-created a lifelike Martin Luther King Jr. as he recited the "I Have a Dream" speech for a VR experience called The March.
Weta Digital additionally generated a cyber lead for the film Alita: Battle Angel, while Disney Research has been actively pushing the state of the art in creating and animating high-quality digital humans for visual effects over the past decade. And Epic Games, Tencent, 3Lateral, Vicon, and Cubic Motion developed Siren, a high-fidelity digital human driven in real time by a live actress. Those are but a few of the impressive examples we have seen fairly recently.
Studios behind work such as this utilize their own type of secret sauce. Nevertheless, creating a realistic digital human generally involves two phases: capturing an actor's performance and applying it to the character, and creating the actual character. The steps, techniques, and technologies are complex and continually evolving. And when all is said and done, the character is rendered and inserted into the film, game, or other intended application.
Truly amazing stuff, really. But that is the tip of the proverbial iceberg. The jaw-dropping work that today has us pondering whether we have crossed the uncanny valley bridge is being done with autonomous real-time digital humans.
The Digital Humans Group
Digital Domain remains at the forefront of creating realistic digital humans. In fact, the studio has a long history of building digital humans, crafting realistic creatures, and capturing actors, using various technologies to accomplish those goals. And for a long time, most of the tools and technologies were developed to serve the needs of a particular production - not always the best process given the time and work required.
In early 2017, the studio began looking at some new types of facial and performance capture for better translation onto a realistic digital human or creature. It was around this time that Marvel approached the studio, looking for new ways to better portray an actor's performance as a digital being. The studio showed them work it had been doing on new methods of facial capture, which led to being awarded the majority of the work pertaining to Thanos and several other characters for Avengers: Infinity War (2018) and Avengers: Endgame (2019).
While the studio was getting those shows into production, some new technologies were surfacing - specifically, machine learning and deep learning - while real-time rendering techniques were becoming more robust. "Doug and I started seeing how important these developments were, not just to feature films, but also to new kinds of applications, such as home assistants," says Darren Hendler, a 20-year veteran at Digital Domain. The Doug he is referring to is Doug Roble, senior director of software R&D, another 20-year veteran at the studio.
As a result, Digital Domain formed the Digital Humans Group (DHG) near the start of 2018 with the goal of overseeing all the studio's technology in the realm of digital humans, digital creatures, and capturing actors for digital effects, commercials, games, and "new, crazy things we couldn't even imagine at that stage," says Hendler, DHG director. The group was spurred on by research papers and demos over the past few years at SIGGRAPH, and by machine learning, which Roble calls "a game changer" in the way a person can look at problems and approach them.
The DHG team comprises 15 to 20 people. And even though there is a software group working on a wide gamut of technologies, it does not treat every issue as a software problem. "We look at it as software and artistry together, and there's a constant back and forth between them," says Hendler.
Creating Digital Humans
At Digital Domain, the process of creating a digital human (any digital human) for the longest time had been very much the same: Build a 3D version of a person and create textures, look dev, and motion, and then create a camera, film an actor from different angles, and see how the person looks - all in the 3D realm. This is the process the studio used for Benjamin Button and Thanos.
Now, however, technologies around that entire realm have evolved and spawned a new realm: the image-based, or the neural rendering, realm. "It didn't exist a year ago, and it's changing everything," Hendler notes. This area is tied to concepts like deepfakes. So, instead of building the character pixel-by-pixel, images or input are used to generate new images of that person or creature.
The DHG does not operate in this realm per se. Rather, the group uses a hybrid approach - one that combines the traditional 3D realm and the image-based realm by leveraging ray tracing from the latest Unreal Engine from Epic Games and Digital Domain's own sophisticated neural renderer.
Most of DHG's current work involves building a full three-dimensional version of a character - often a digital double of an actor - for film or television. This entails acquiring high-res scans down to the pore level. "These days it's a pretty easy process to generate a fully rendered still of the person from any camera angle, and not be able to distinguish the real actor from the still," says Hendler. "The tricky part happens when that person moves, talks, and emotes. There's so much subtlety involved - for instance, the way the blood flow changes in the lips as you're talking, which can make it feel real or not."
The older Benjamin Button character was a CG human crafted by Digital Domain.
ILM de-aged Joe Pesci and others to play their younger selves for The Irishman.
Digital Domain used a hybrid blending technique to age soccer star David Beckham.
This is where DHG's hybrid approach comes into play in many of its productions, as the group uses a 3D version of a character and then on top of it, they employ a new type of neural network approach whereby the machine learning model tweaks the CG model of the character to be more realistic.
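In rough terms, a hybrid of this kind can be pictured as a learned correction applied on top of a conventional render. The following is a minimal numpy sketch of that idea only - the `render_cg` and `neural_refine` functions here are entirely hypothetical stand-ins, not DHG's actual pipeline, and a real system would use a trained image-to-image network rather than a fixed offset:

```python
import numpy as np

def render_cg(params):
    """Stand-in for a conventional 3D render (an engine such as Unreal
    or an offline renderer would do this). Returns an H x W x 3 image
    with values in [0, 1]."""
    h, w = 4, 4
    return np.full((h, w, 3), params["base_gray"], dtype=np.float64)

def neural_refine(image, residual):
    """Hypothetical 'neural rendering' pass: nudge the CG pixels toward
    realism. A trained network would predict the residual from the image;
    here it is simply supplied."""
    return np.clip(image + residual, 0.0, 1.0)

# Hybrid pipeline: traditional 3D render first, learned correction second.
cg = render_cg({"base_gray": 0.5})
residual = np.full_like(cg, 0.1)   # pretend "learned" correction
final = neural_refine(cg, residual)
print(final[0, 0, 0])  # 0.6
```

The design point the sketch illustrates is the division of labor: the 3D model supplies controllable structure, while the learned pass supplies the hard-to-model realism.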
In fact, the DHG is employing this hybrid approach in the ongoing development of Douglas, a realistic real-time autonomous digital human - the next step in the evolution of digital humans (see "Douglas"). Creating a high-resolution, realistic, pre-rendered character that artists can control is vastly different from having a digital character infused with complex AI that can interact with people in real time. And while portions of the creation process between the two types are very similar, the outcomes are very different.
Both processes start out with building the full-3D version of the character's face (with face shapes, pore structures, and textures). For its digital characters, the DHG uses a combination of custom software and commercial products, such as Autodesk's Maya and Foundry's Mari and Nuke, in addition to machine learning framework tools, such as PyTorch, an open-source machine learning library, for functions such as speech recognition.
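Face shapes of this kind are conventionally combined as blendshapes: a neutral mesh plus weighted per-expression offsets. A minimal sketch of that standard formulation follows - the vertex data and shape names ("smile", "brow_raise") are invented for illustration, and a production face rig has far more shapes and vertices:

```python
import numpy as np

# Neutral face: 3 vertices in 3D (toy data; a production mesh has millions).
neutral = np.zeros((3, 3))

# Per-expression offsets from the neutral pose (hypothetical shapes).
deltas = {
    "smile":      np.array([[0.0, 0.2, 0.0]] * 3),
    "brow_raise": np.array([[0.0, 0.0, 0.1]] * 3),
}

def blend(neutral, deltas, weights):
    """Classic blendshape model: mesh = neutral + sum_i(w_i * delta_i)."""
    mesh = neutral.copy()
    for name, w in weights.items():
        mesh += w * deltas[name]
    return mesh

mesh = blend(neutral, deltas, {"smile": 0.5, "brow_raise": 1.0})
print(mesh[0])  # [0.  0.1 0.1]
```

Because the model is linear in the weights, animating a face reduces to animating a small weight vector - which is what makes the retargeting step discussed later in the article tractable.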
"We're feeding in much of the same data, even the way the wardrobe is modeled; just the rendering engine is different," says Hendler, emphasizing that a large amount of artistry goes into the entire process.
Traditionally for a film asset, Digital Domain employs an offline renderer like Chaos' V-Ray; for an autonomous character rendered in real time, the group uses Unreal. While the real-time aspect is achieved through the Unreal Engine, the output of the autonomous character, unlike a character in a video game, is not through Unreal. Unreal is an intermediate step, says Roble. "Then we send that through our neural rendering, which redoes some of the imagery," he explains. So, while the digital character goes through an extra layer of reality, backgrounds, for instance, are purely output from Unreal.
According to Roble, as Douglas progresses, DHG is moving away from the 3D creation model and moving more toward using neural rendering for that aspect.
The Digital Human League
Another group that delves deep into the creation of realistic digital humans is Chaos Labs, an applied science start-up founded in August 2014 that acts as a bridge of sorts between artists and developers by looking at where the industry is headed and what innovative technology artists will need to get there. This encompasses many advanced developments, including digital humans.
Chaos Labs has been doing a lot of work in this particular area, specifically related to shaders (Chaos is the developer of V-Ray, one of the industry's top rendering packages). "And the best way we could go about exploring [the creation of realistic digital humans] was by creating or working with a really, really good dataset. That is what inspired the Wikihuman Project," says Chris Nichols, director of Chaos Labs.
The Wikihuman Project was established and dedicated to the study, understanding, and sharing of knowledge pertaining to digital humans. Run by the Digital Human League (DHL), a panel of academic and industry experts, the site was open to anyone interested in crossing the uncanny valley.
"People have been battling the uncanny valley for far longer than computer graphics has been around. They've been doing this for hundreds and hundreds of years. And some would even say that the Mona Lisa is a perfect example of the uncanny valley because she doesn't quite look right," says Nichols.
As Nichols points out, the overall goal of DHL was to give people a good starting point for their work, rather than to advance specifics in terms of digital humans. "Back in the day, it always felt like we were starting from scratch with a dataset," he says. "We needed a baseline, and that is what we were trying to accomplish with the Wikihuman Project."
Both Wikihuman and the DHL have since disbanded, having achieved their goals. They released Digital Emily 2, a free, high-res 3D scan of a female human head that established a baseline for other artists to use and learn from; it also helped get Digital Mike Seymour off the ground.
Since that time, there have been major advances in digital humans for film, thanks in large part to machine learning, which has enabled artists and researchers to tackle some very challenging problems, especially in terms of realism. Machine learning, says Nichols, approaches that problem analytically, not emotionally, "and it is making a huge difference."
For something to look really humanlike, a good dataset is imperative, and machine learning is going to help significantly in that regard. The Wikihuman Project initiated that, and now, tools such as MetaHuman Creator are exploring it even more (see "MetaHuman Creator").
Nichols agrees with Hendler that machine learning provides the important human subtleties that had been missing from CG humans, subtleties whose absence contributed to the uncanny look of the characters. "There's a lot of things that go on in our faces that we don't necessarily see, but machine learning can 'see' and 'learn' those things, and apply them to the model," says Nichols. "We tend to only see the bigger motions, not the subtle lip or eyelid movement, for instance."
As a result, machine learning has accelerated progress in this area by leaps and bounds, eliminating the emotional factor of interpreting (or rather, misinterpreting) the face, as well as greatly speeding up the trial and error during the creation process. Before, capturing a human face required a lot of technology - a lot of dots on the face - and a lot of information. Today, much can be done with a mobile phone. "Even capturing technology is changing," Nichols points out. Whereas once a full motion-capture stage was needed, now a lot can be done with an inexpensive Rokoko mocap system, he adds.
While there are some impressive developments and applications for crafting realistic real-time autonomous digital humans, Nichols notes that a crucial element is still missing: full ray tracing. "We've done this a million times in the visual effects world. You can't really get a human to look completely real without ray tracing. And that's the thing that still hasn't gotten to the real-time digital human world just yet," says Nichols. "It's really hard to do subsurface scattering, hardcore shading, right now in real time. It's going to get there, it always does, but it's a real challenge. And until we get that, we're not going to be able to get a fully realistic human out there. The only other way to accomplish it is to completely fake it with deepfakes, but then it becomes a completely different way of looking at the rendering. You're not actually looking at digital humans, you're looking at a warped face of some kind."
Giantstep GX Lab
Like Digital Domain and the Digital Human League, the Korea-based creative studio Giantstep is also focused on realistic real-time digital humans. It started R&D in 2018, and "now, we're in the era of the Metaverse, where the development of the virtual human (metahuman) is very popular, and there are many companies challenging the technology," says Sungkoo Kang, director at GX Lab, the internal R&D division of Giantstep, who was the lead on an application called Project Vincent (see "Project Vincent"). "However, back when we started, even the term 'Metaverse' didn't exist, so for a small team like ours from a midsize company, it was not an easy decision to put a lot of money into developing such technology."
It is the studio's belief that realistic virtual humans will have enormous potential and practical value in the near future, followed, in the far more distant future, by robot development. To this end, the studio started Project Vincent so it could see the possibilities and develop the underlying skills needed for this tech.
In the case of offline rendering, such as movies, artists and technologists can throw a lot of manual labor and computing power at the problem to adjust final outcomes. However, when it comes to a real-time digital human, the difficulty factor rises to a whole new level. As Kang points out, every calculation has to be done in 1/30 of a second and produce a finished image.
"Usually the shape of the performing actor and the 3D model's face are different, so there is a retargeting process whereby the actor's facial movement is read, converted, and then applied to the shape of the 3D model. Even this process takes more than 1/30 of a second," says Kang. "If you extremely optimize the data, this could be possible, but we chose a better solution."
As Kang stresses, AI is the most important part of the development process. AI was an essential component in the creation of Vincent, but it is not without limitations, particularly when it comes to solving all of the retargeting. As a result, Giantstep developed a tool using a lightweight AI that solves the retargeting process described above within 1/30 of a second.
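As a rough illustration of the constraint Kang describes, consider a retargeting step boiled down to a single learned linear map from tracked facial features to blendshape weights. Everything here is invented for illustration - the dimensions, the random weights, and the `retarget` function are hypothetical, not Giantstep's actual model - but it shows why a deliberately lightweight model fits comfortably inside a 33 ms frame budget:

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lightweight retargeting: tracked facial features in,
# blendshape weights for the 3D model out. A trained model would
# supply W and b; random values stand in for them here.
n_features, n_blendshapes = 200, 150
W = rng.standard_normal((n_blendshapes, n_features)) * 0.01
b = np.zeros(n_blendshapes)

def retarget(features):
    """One frame of retargeting: a single matrix multiply plus clamping."""
    return np.clip(W @ features + b, 0.0, 1.0)

features = rng.standard_normal(n_features)
start = time.perf_counter()
weights = retarget(features)
elapsed = time.perf_counter() - start

print(f"{elapsed * 1000:.3f} ms for one frame")  # a tiny fraction of 33.3 ms
print(weights.shape)  # (150,)
```

The trade-off is the one Kang alludes to: a heavier, more accurate solver could miss the frame deadline, so the model is kept small enough that inference time is never the bottleneck.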
However, technology alone is not the answer. Kang agrees with Nichols, Hendler, and Roble that the solution is also dependent on having great artists and technicians. When Giantstep first started focusing on digital humans, it set a goal of reducing its reliance on highly skilled artists and technicians. So, instead of modeling by hand, the studio opted for scanning and creating blendshapes with automated algorithms. "But, ironically, what we realized later was that had it not been for great artists and technicians, the look of Vincent would not have been possible, even with the best scanning service. The same was true for technical development, and we wouldn't have been able to choose the direction that we've gone."
Soul Machines
Another company making very significant advancements in the area of autonomous digital humans is Soul Machines, with headquarters in San Francisco and R&D in Auckland, New Zealand. In fact, its co-founder, Mark Sagar, spent a number of years building realistic CG human characters for various blockbuster films while at Weta Digital and Sony Pictures Imageworks. Later, he started the Laboratory for Animate Technologies at the Auckland Bioengineering Institute, where he focused on bringing digital human technology to life, pulling together mathematical physics simulations and combining them with computer graphics.
With a PhD in bioengineering, Sagar had studied how faces are animated, how they deform, how they reflect light, and so forth. Indeed, he spent time during his VFX career building systems that simulated actors' emotions and behaviors. But, it all centered on the surface structure. He wanted to take things further, having an interest in biomechanics, physiology, how the brain worked, and, ultimately, consciousness.
"I had always looked at those areas and the progress being made in artificial intelligence," Sagar says. "I was interested in what actually drives actors' performances and motivates their behaviors."
At the lab, Sagar began pulling those aspects together in an attempt to create truly autonomous, fully-animated characters that would have a proper brain model driving the behaviors of the character. "It meant building a character in the complete opposite way that it had been done before. It meant, basically, building a character from the inside out," he says. "I wanted to make digital characters that would see and hear the world, and be able to take that all in and learn, and to have a memory and react to the world."
According to Sagar, the work in the lab encompassed two aspects. One involved building the brain models to drive a fully autonomous character, which was called BabyX (see "BabyX," page 12). The other was using parts of that technology to generate facial simulations.
After four and a half years of working toward this goal, the lab expanded into the company Soul Machines. On the research side, the company continued the extraordinary work on BabyX, exploring a new way to animate characters from the inside out. On the business side, the company began looking at using these lifelike digital humans in various applications.
The culmination of this work is Digital People for real-world applications. Digital People contain the company's patented Digital Brain, which contextualizes and adapts in real time, much as a person does in conversation, to enhance customer brand experiences. Currently, Soul Machines offers three categories of products. The Human OS Platform features a Digital Brain (for AI learning resulting in appropriate responses) and Autonomous Animation for lifelike interactions. Digital DNA Studio Blender enables customers to scale their unique Digital Person, creating multiple looks and leveraging up to 12 languages. And with Soul Solutions, customers can choose among three service levels, from entry level to advanced, to fit their needs.
The technology used to drive Soul Machines' adult digital humans, or virtual assistants, combines elements of the brain model with other technologies, such as speech recognition and NLP (natural language processing) algorithms - technologies that companies such as Microsoft and Google are working on - enabling the creation of conversational models, albeit ones that are more curated, as opposed to resulting from pure intelligence. Sagar compares the two methods to an actor saying lines: the person can either recite them from a detailed script or can improvise on the fly. Soul Machines is working on both ends of that scale, depending on the application it will be used for.
In fact, the use cases for Digital People are vast, spanning many sectors including consumer goods, health care, retail, tech, call centers, and more.
Soul Machines is continuing development of its technology and will be releasing its Digital DNA Blender for creating different faces in the near future. It also will be releasing technology that makes the digital humans content-aware, for meaningful interactions - generated autonomously and live.
RT Digital Humans at Work
Roble, Hendler, Nichols, Kang, and Sagar all agree that the future for digital humans looks promising. Presently, they are being used in visual media, but as real-time technologies such as text-to-voice and voice-to-face develop further, digital humans will become highly customizable and play a greater role in our daily lives, predicts Kang. "Especially now, with the emergence of the Metaverse, the use of avatars will come our way very naturally. In the same way that heavy labor has been replaced by robots, it is expected that much of our emotional labor will be replaced in the future as well."
Indeed, future applications of digital humans are a wide-open topic. At Giantstep, they're most interested in the Metaverse, inspired by sites like Zepeto, where social networks and virtual communities come together with custom avatars. However, the process remains difficult, and it still takes considerable time and effort to generate a high-quality digital human. But once the production is automated, Kang believes it will open the door to this kind of customization and diversity.
Nichols is unsure whether people will want avatars that look realistic or not. But, realistic digital humans have big potential as research tools, to understand real humans better. For instance, PTSD research has found that avatars can break down barriers for better communication; the same holds true for those with autism.
"For consumer experiences or entertainment, applications will be limited only to the creative minds working on behalf of the world's biggest brands to create unique consumer experiences," says Kang. "We could have digital human influencers, actors… whatever can be conceived."
Giantstep also is exploring development of a platform for learning AI networks that can help project management, such as distributed processing for learning, progress control, and confirmation.
Since releasing Vincent, the studio has received a positive response from the industry and quite a few inquiries from brands, advertisers, and entertainment content creators interested in exploring opportunities. One of them is SM Entertainment, the largest entertainment company in Korea, which represents some of the best-known K-pop groups and artists, including aespa. "With aespa, we've helped create virtual avatars for each member of this new female K-pop group. SM Entertainment is redefining entertainment as we know it by extending their audience reach through the convergence of the real and virtual worlds," says Kang.
Moreover, since Vincent, Giantstep has streamlined its production pipeline and increased efficiency in creating digital humans. It has hired more artists and technicians who can improve the quality of its work and further expand its capabilities. "Without Vincent, the ease with which we were able to make the concept around aespa work wouldn't have been possible," Kang adds.
For Good or Otherwise
The technologies discussed here can invoke fear and concern among the public. "But, ultimately, we're all, as humans, adjusting to what's new and possible, and reconfiguring certain things about what we believe and what we understand. And I think there's a lot of good and benefits to be seen from these technologies," says Hendler. "We often don't talk enough about how it can be helpful and beneficial."
For instance, the subject of deepfakes has become a fascinating, if not controversial, one. Technically, some people define deepfakes as taking one person's likeness and superimposing it onto someone else, not necessarily with the first person's permission, says Nichols. And that is being done through machine learning. "Technologically, it's a very straightforward process," he adds. "Many people question whether this technology could ever be used for anything good."
The answer is yes, it can be. At the spring 2021 RealTime Conference (RTC) focused on the Metaverse, presenters in a session on digital humans and virtual agents discussed the use of technology similar to that of deepfakes in a documentary called Welcome to Chechnya. In the film, activists and victims relay horrific stories of persecution by their government and families while fighting for LGBTQ+ rights there, and it was imperative that their identities be masked due to potential repercussions of those interviewed. So, rather than blurring or pixelating their faces, the filmmakers used AI and machine learning to disguise them using face doubles - keeping the individuals safe, while retaining the all-important human angle of the documentary.
"This proved to me that just because you don't understand a technology or you see a bad example of how it's being used, that doesn't mean it's a bad thing," says Nichols.
As for the future of Douglas, "while it sounds like we're just madly experimenting and trying to trick people by creating a realistic person who you can't tell is not real on a Zoom call, that's not our aim here," Hendler points out. "Our aim is to create a human that you can have an emotional connection with, that you can interact with, that you know is not real, and that person is able to do useful things for you."
Hendler predicts the DHG is within a year of being in that position - a person, character, or creature that you can have a connection with, that understands you, and that is a very useful entity to have around, doing a variety of useful things. In this regard, some possible uses for technology like Douglas are as a virtual concierge at a hotel, museum, or airport, providing information in a more interactive way, or even as a companion.
In fact, Hendler sees the research into digital humans as having an effect on computer games, which he predicts will feature exceedingly realistic human characters in terms of looks, speech, and interaction. He also foresees more applications in films, particularly in terms of aging and de-aging actors, resulting in actors playing themselves in a range of roles through time. He does not see this technology replacing actors: While it can make them older or younger, it is not creating a performance.
"We don't even know what will end up being key products. We just know [the capability is] coming and people are interested. It's evolving, and we are along for the ride," says Hendler.
Is the Future 'Uncanny'?
The experts interviewed here differ widely on whether or not we have crossed the uncanny valley. Nichols believes we crossed it a while ago - "taking a lot of work to do it and a lot of people being very smart about how they approached it." He points to Digital Domain's work on Benjamin Button, which was accomplished prior to machine learning, instead relying on a lot of human time and effort, particularly in the trial-and-error stages.
"Most people didn't realize it was a digital human for the first third of the movie," Nichols says of the main character developed at his former employer, Digital Domain. "That is a sign, at least in my mind, that it was well done. So, I think the uncanny valley has been crossed a few times. If you want to completely fool someone, then yes, it takes a lot of work to get there, but it's possible to do."
Interestingly, Nichols finds that nowadays people generally seem to tolerate digital characters that are not very humanlike at all, accepting avatars that fall within the uncanny valley, especially in games. "People don't seem to have a problem with them like we once did," he says. And while games continue to push in the direction of realistic digital humans, the genre has yet to cross the uncanny valley.
Hendler, on the other hand, is more conservative in his assessment and doesn't believe that applications like Douglas have actually crossed this chasm. Yet. Rather, he sees them as "managing" the uncanny valley. "I would say we've actually stepped further into the uncanny valley than we have before, but I don't think it's going to be solved anytime in the next three to four years," he adds.
Sagar, however, has a different opinion, believing there are multiple uncanny valleys to cross. Indeed, some entities have crossed the first, achieving realistic-looking digital humans, with the next valley encompassing appropriate behavior. "There are so many combinations there that you have to get right," he says. "So, the model may look real and speak in a realistic way, but, is it talking about the same thing as you are, or is it completely random? If so, it can be quite disturbing and frustrating because, effectively, you've got a digital human that's not following what you are saying or is not acting appropriately."
While these experts may have differing views on the current state of digital humans, they all agree that in the not-so-distant future they will be part of our lives.
Karen Moltenbrey is the chief editor of CGW.