Soul Searching
Issue: July-August-September 2021

It is known as the "uncanny valley," a term describing that sense of unease in response to highly realistic humanoid robots and lifelike computer-generated human characters that imperfectly resemble a human being.

For quite some time now, those working in computer graphics have made strides toward crossing the uncanny valley. One of those initial big steps was the long-awaited Final Fantasy: The Spirits Within (July 2001), the first photoreal computer-animated feature film from Square Pictures. Made with then cutting-edge technology, the film received mixed reviews, with many appreciating the technical achievement and others calling the characters "creepy." Either way, the film's costs far outstripped its box-office returns, and that shortfall contributed to the end of Square Pictures.

There were other noteworthy examples that followed, before the next big milestone moment for digital humans on-screen: The Curious Case of Benjamin Button (December 2008), about a character who ages backward. In his young-appearing version, the character is played by actor Brad Pitt; in his older appearance, he is a CG human created by Digital Domain.

During the past two years, more significant progress has been made in creating realistic digital humans for films and television. In 2019, a digital fountain of youth gave several leading actors the opportunity to play characters at much younger ages throughout The Irishman, Gemini Man, and Captain Marvel, thanks to leading-edge work by Industrial Light & Magic, Weta Digital, and Lola. Also that year, ILM created a young Sarah Connor, John Connor, and T-800 for an opening sequence in Terminator: Dark Fate. And, Gradient Effects created a younger version of the character played by John Goodman in flashback sequences for the episodic television series The Righteous Gemstones.

In the past several months, Digital Domain has used a hybrid age-blending technique to add years to soccer phenom David Beckham for a malaria-awareness short film. And, the studio brought famed coach Vince Lombardi back from the dead for a Super Bowl presentation. A year prior, Digital Domain re-created a lifelike Martin Luther King Jr. as he recited the "I Have a Dream" speech for a VR experience called The March.

Weta Digital also generated a cyborg lead for the film Alita: Battle Angel, while Disney Research has been actively pushing the state of the art in creating and animating high-quality digital humans for visual effects over the past decade. And, Epic Games, Tencent, 3Lateral, Vicon, and Cubic Motion developed Siren, a high-fidelity digital human driven in real time by a live actress. Those are but a few of the impressive examples we have seen fairly recently.

Studios behind work such as this utilize their own type of secret sauce. Nevertheless, creating a realistic digital human generally involves two phases: capturing an actor's performance and applying it to the character, and creating the actual character. The steps, techniques, and technologies are complex and continually evolving. And when all is said and done, the character is rendered and inserted into the film, game, or other intended application.

Truly amazing stuff, really. But that is the tip of the proverbial iceberg. The jaw-dropping moments that today have us pondering whether we have crossed the uncanny valley come from the work being done with autonomous real-time digital humans.

The Digital Humans Group

Digital Domain remains at the forefront of creating realistic digital humans. In fact, the studio has a long history of building digital humans, crafting realistic creatures, and capturing actors, using various technologies to accomplish those goals. And for a long time, most of the tools and technologies were developed to serve the needs of a particular production - not always the best process given the time and work required.

In early 2017, the studio began looking at some new types of facial and performance capture for better translation onto a realistic digital human or creature. It was around this time that Marvel approached the studio, looking for new ways to better portray an actor's performance as a digital being. The studio showed them work it had been doing on new methods of facial capture, which led to being awarded the majority of work pertaining to Thanos and several other characters for Avengers: Infinity War (2018) and Endgame (2019).

While the studio was getting those shows into production, some new technologies were surfacing - specifically, machine learning and deep learning - while real-time rendering techniques were becoming more robust. "Doug and I started seeing how important these developments were, not just to feature films, but also to new kinds of applications, such as home assistants," says Darren Hendler, a 20-year veteran at Digital Domain. The Doug he is referring to is Doug Roble, senior director of software R&D, another 20-year veteran at the studio.

As a result, Digital Domain formed the Digital Humans Group (DHG) near the start of 2018 with the goal of overseeing all the studio's technology in the realm of digital humans, digital creatures, and capturing actors for digital effects, commercials, games, and "new, crazy things we couldn't even imagine at that stage," says Hendler, DHG director. The group was spurred on by research papers and demos over the past few years at SIGGRAPH, and by machine learning, which Roble calls "a game changer" in the way a person can look at problems and approach them.

The DHG team comprises 15 to 20 people. And even though there is a software group working on a wide gamut of technologies, it does not treat every issue as a software problem. "We look at it as software and artistry together, and there's a constant back and forth between them," says Hendler.

Creating Digital Humans

At Digital Domain, the process of creating a digital human (any digital human) for the longest time had been very much the same: Build a 3D version of a person and create textures, look dev, and motion, and then create a camera, film an actor from different angles, and see how the person looks - all in the 3D realm. This is the process the studio used for Benjamin Button and Thanos.

Now, however, technologies around that entire realm have evolved and spawned a new realm: the image-based, or the neural rendering, realm. "It didn't exist a year ago, and it's changing everything," Hendler notes. This area is tied to concepts like deepfakes. So, instead of building the character pixel-by-pixel, images or input are used to generate new images of that person or creature.

The DHG does not operate in this realm per se. Rather, the group uses a hybrid approach - one that combines the traditional 3D realm and the image-based realm by leveraging ray tracing from the latest Unreal Engine from Epic Games and Digital Domain's own sophisticated neural renderer.

Most of DHG's work currently involves building a full three-dimensional version of a character - often a digital double of an actor - for film or television. This entails acquiring high-res scans down to the pore level. "These days it's a pretty easy process to generate a fully-rendered still of the person from any camera angle, and not be able to distinguish the real actor from the still," says Hendler. "The tricky part happens when that person moves, talks, and emotes. There's so much subtlety involved - for instance, the way the blood flow changes in the lips as you're talking, which can make it feel real or not."

Benjamin Button
The older Benjamin Button character was a CG human crafted by Digital Domain.

The Irishman
ILM de-aged Joe Pesci and others to play their younger selves for The Irishman.

David Beckham
Digital Domain used a hybrid blending technique to age soccer star David Beckham.

This is where DHG's hybrid approach comes into play in many of its productions: the group builds a 3D version of a character and then, on top of it, employs a new type of neural network approach in which a machine learning model tweaks the CG character to be more realistic.

In fact, the DHG is employing this hybrid approach in the ongoing development of Douglas, a realistic real-time autonomous digital human - the next step in the evolution of digital humans (see "Douglas"). Creating a really high-resolution, realistic, pre-rendered character that artists can control is vastly different from having a digital character infused with complex AI that can interact with people in real time. And while portions of the creation process for the two types are very similar, the outcomes are very different.

Both processes start out with building the full-3D version of the character's face (with face shapes, pore structures, and textures). For its digital characters, the DHG uses a combination of custom software and commercial products, such as Autodesk's Maya and Foundry's Mari and Nuke, in addition to machine learning framework tools, such as PyTorch, an open-source machine learning library, for functions such as speech recognition.

"We're feeding in much of the same data, even the way the wardrobe is modeled; just the rendering engine is different," says Hendler, emphasizing that a large amount of artistry goes into the entire process.

Traditionally for a film asset, Digital Domain employs an offline renderer like Chaos' V-Ray; for an autonomous character rendered in real time, the group uses Unreal. While the real-time aspect is achieved through the Unreal Engine, the output of the autonomous character, unlike a character in a video game, is not through Unreal. Unreal is an intermediate step, says Roble. "Then we send that through our neural rendering, which redoes some of the imagery," he explains. So, while the digital character goes through an extra layer of reality, backgrounds, for instance, are purely output from Unreal.
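To make the hybrid idea concrete, here is a minimal sketch - written under assumptions, since Digital Domain's neural renderer is proprietary - of the flow Roble describes: the engine frame is treated as an intermediate, the pixels belonging to the digital human are refined by a neural network, and the background is passed through untouched. The class name NeuralRefiner and the tiny network inside it are purely illustrative stand-ins.

```python
# Minimal sketch of the hybrid approach described above: the engine frame is an
# intermediate, the character pixels are "redone" by a neural renderer, and the
# background is kept exactly as the engine produced it. NeuralRefiner is a
# hypothetical stand-in, not Digital Domain's network.
import torch
import torch.nn as nn

class NeuralRefiner(nn.Module):
    """Placeholder image-to-image network (a real system would use something far larger)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

def composite_frame(engine_frame, char_mask, refiner):
    """Refine only the character pixels; leave the background untouched.

    engine_frame: (1, 3, H, W) frame rendered by the real-time engine
    char_mask:    (1, 1, H, W) 1.0 where the digital human is, 0.0 elsewhere
    """
    refined = refiner(engine_frame)                      # neural re-render of the frame
    return char_mask * refined + (1 - char_mask) * engine_frame

# Toy usage with random data standing in for a real frame and mask.
frame = torch.rand(1, 3, 270, 480)
mask = (torch.rand(1, 1, 270, 480) > 0.5).float()
out = composite_frame(frame, mask, NeuralRefiner())
print(out.shape)  # torch.Size([1, 3, 270, 480])
```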

According to Roble, as Douglas progresses, DHG is moving away from the 3D creation model and moving more toward using neural rendering for that aspect.

Douglas

Douglas is a prime example of the current state of the art in the creation of digital humans. Douglas is a realistic autonomous digital human whose sophisticated AI enables him not only to interact with humans, but to react in real time. He is also capable of learning and recalling information, and can carry on a focused conversation by paying attention to what is being said and formulating an appropriate response.

Douglas
DHG’s Douglas is a realistic autonomous human based on Doug Roble.

Douglas is based on an actual human, Doug Roble, and is the “Adam” of Digital Domain’s Digital Humans Group (DHG). The group wanted to create the digital human based on a real person, to see how closely they could replicate the person; they also needed easy access to the subject. Roble, senior director of software R&D there, fit the bill.

Digital Douglas is being developed as a personal assistant, and appears on-screen during video chats.

“We want Douglas to be as fast [with his responses] as a real person,” says Roble. “And that has been one of the big challenges right off the bat. It’s all about speed. He has to move quickly, talk quickly, and he has to respond quickly so that it feels like a real conversation.” Machine learning is teaching Douglas that a pause in speech often means it’s his turn to talk. But, sometimes the other person is just catching his or her breath, so more fine-tuning is needed in this area.

“Overall, he’s quick, but not as quick as we want him to be. We’re always working on making him faster and faster,” says Roble.

While you are speaking to Douglas, he is examining your face to see if he recognizes you (through facial recognition), and if so, accesses any memories he has associated with you — prior to any memory reset. (DHG is well aware of potential privacy issues that could arise should they ever release an application to the public.) Douglas also sends your speech to a speech recognizer, which then forwards the text of that conversation to three different natural language processing (NLP) conversational agents that come up with different responses — those that are purely conversational, some that are more scripted, and others that are more factual-based (by searching the Internet). And then he decides which of those three response types is more appropriate.

Semantic and emotional analyses of the text determine the best emotional range for Douglas’s response (happy, sad) and then generate his voice, while creating an appropriate facial performance that coincides with that determination. An emotional range for the body is also calculated.

“All that is pulled together and rendered out,” says Darren Hendler, DHG director. In addition, real-time cloth dynamics are calculated. “Now I run that through a separate process that does the neural rendering on top to make Douglas seem more realistic than he does straight out of Unreal, and I deliver that into a Zoom call, for instance. And that is happening in every frame in real time. It’s fast and amazing, but not fast enough to be a real person.”
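Taken together, the steps above describe a per-utterance pipeline: recognize the speaker, recall memories, transcribe speech, query three kinds of conversational agents, pick the best reply, analyze emotion, and synthesize a voice and facial performance for rendering. The sketch below is a heavily simplified, hypothetical illustration of that flow; every function is a stub standing in for technology the article only names, not Digital Domain's actual code.

```python
# A highly simplified sketch of the per-utterance flow described above. Every
# function here is a hypothetical stub; the point is the shape of the pipeline:
# recognize the speaker, transcribe, ask several conversational agents in
# parallel, pick one answer, then voice and render it.
from concurrent.futures import ThreadPoolExecutor

def recognize_face(video_frame):        return "known_user_42"          # facial recognition
def recall_memories(user_id):           return {"last_topic": "soccer"} # prior conversation state
def speech_to_text(audio):              return "what did we talk about last time?"
def conversational_agent(text, mem):    return ("chitchat", "Good to see you again!")
def scripted_agent(text, mem):          return ("scripted", "Last time we talked about soccer.")
def factual_agent(text, mem):           return ("factual", "Here's what I found online...")
def score_response(candidate, text):    return len(candidate[1])        # stand-in relevance score
def analyze_emotion(text):              return "warm"
def synthesize(reply, emotion):         return {"audio": b"...", "face_curves": [], "body": emotion}

def handle_utterance(video_frame, audio):
    user = recognize_face(video_frame)
    memories = recall_memories(user)
    text = speech_to_text(audio)

    # The three response generators run in parallel to keep latency down.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(agent, text, memories)
                   for agent in (conversational_agent, scripted_agent, factual_agent)]
        candidates = [f.result() for f in futures]

    best = max(candidates, key=lambda c: score_response(c, text))
    emotion = analyze_emotion(text)
    performance = synthesize(best[1], emotion)
    return performance  # handed off to the real-time renderer / neural renderer

print(handle_utterance(video_frame=None, audio=None)["body"])
```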

Hendler adds: “We come from a background where a single frame of an image could take five hours to render for a feature film. So, this is mind-blowing for us, to be able to do all of that as fast as we do.”

And the power behind this feat? “Nothing you can’t buy from a store,” says Hendler. Douglas does not need a supercomputer; typically, the application runs on two separate computers — one devoted to rendering and another devoted to speech, AI, and so forth — since many of the functions need to run in parallel. Although, rather than two computers, it can be run on a single machine with two GPUs. And because it runs in parallel, some functions can be turned off if desired.
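As a rough illustration of that split - and only an illustration, not Digital Domain's deployment - the rendering and speech/AI workloads can be pictured as two processes, each pinned to its own GPU, with either subsystem switchable off:

```python
# Generic sketch of the kind of split described above: one process for
# rendering, one for speech/AI, each pinned to its own GPU, with the option to
# turn a subsystem off. This is an illustration only.
import multiprocessing as mp

def rendering_loop(gpu_id, enabled):
    if not enabled:
        return
    # e.g., pin the render / neural-render models to this GPU before loading them
    print(f"rendering subsystem running on GPU {gpu_id}")

def speech_and_ai_loop(gpu_id, enabled):
    if not enabled:
        return
    # e.g., speech recognition, NLP agents, and emotion analysis live here
    print(f"speech/AI subsystem running on GPU {gpu_id}")

if __name__ == "__main__":
    subsystems = [
        mp.Process(target=rendering_loop, args=(0, True)),
        mp.Process(target=speech_and_ai_loop, args=(1, True)),  # pass False to turn it off
    ]
    for p in subsystems:
        p.start()
    for p in subsystems:
        p.join()
```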

BUILDING A BETTER DOUGLAS

Which aspects of Douglas still need improvement? “Just about everything,” responds Roble. Well, everything that stops a person from thinking Douglas is a real human.

“We don’t think Douglas looks photoreal. Rather, we are saying, ‘look at what’s possible, and look at the trajectory of where this is going and where this will be,’” Hendler says. “But look at what’s going on in a few milliseconds; it’s quite amazing.”

Amazing, indeed. During a Zoom call appearance, for instance, when he is quiet and moving ever so slightly, many (including this editor) do a double take, wondering whether or not Douglas is real.

To truly simulate a real person, says Roble, will require improvement in the current technology. While machine learning can mimic human behavior, to have an application that really understands what is actually occurring in a conversation falls in the AI realm. “And that full-blown artificial intelligence is still far off,” he adds, noting that the amazing examples at this time are just mimicking what they’ve observed. “To actually understand what’s happening is really, really, really hard and doesn’t exist yet.”

Nevertheless, there is so much more that can be done at this time — the speech synthesis, the expressions, the movement. “And every month Douglas gets better and better and can do more things,” Roble adds.

The Digital Human League

Another group that delves deep into the creation of realistic digital humans is Chaos Labs, an applied science start-up founded in August 2014 that acts as a bridge of sorts between artists and developers by looking at where the industry is headed and what innovative technology artists will need to get there. This encompasses many advanced developments, including digital humans.

Chaos Labs has been doing a lot of work in this particular area, specifically related to shaders (Chaos is the developer of V-Ray, one of the industry's top shading/rendering packages). "And the best way we could go about exploring [the creation of realistic digital humans] was by creating or working with a really, really good dataset. That is what inspired the Wikihuman Project," says Chris Nichols, director of Chaos Labs.

The Wikihuman Project was established and dedicated to the study, understanding, and sharing of knowledge pertaining to digital humans. Run by the Digital Human League (DHL), a panel of academic and industry experts, the site was open to anyone interested in crossing the uncanny valley.

"People have been battling the uncanny valley for far longer than computer graphics has been around. They've been doing this for hundreds and hundreds of years. And some would even say that the Mona Lisa is a perfect example of the uncanny valley because she doesn't quite look right," says Nichols.

As Nichols points out, the overall goal of DHL was to give people a good starting point for their work, rather than to advance specifics in terms of digital humans. "Back in the day, it always felt like we were starting from scratch with a dataset," he says. "We needed a baseline, and that is what we were trying to accomplish with the Wikihuman Project."

Both Wikihuman and the DHL have since disbanded after achieving their goals. Their release of Digital Emily 2, a free, high-res 3D scan of a female human head, established a baseline for other artists to use and learn from; it also helped get Digital Mike Seymour off the ground.

Since that time, there have been large advancements in digital humans for film, thanks in large part to machine learning, which has enabled artists and researchers to tackle some very challenging problems, especially in terms of realism. Machine learning, says Nichols, approaches that problem analytically, not emotionally, "and it is making a huge difference."

"For something to look really humanlike, a good dataset is imperative, and machine learning is going to help significantly in that regard. The Wikihuman Project initiated that, and now, tools such as MetaHuman Creator are exploring it even more (see "MetaHuman Creator").

MetaHuman Creator

MetaHuman Creator
MetaHuman Creator has made it easier to generate realistic human models.

Creating realistic-looking digital humans for use in a real-time engine used to be a major feat. However, Epic Games is making it easier with its MetaHuman Creator tool, a new, free browser-based app for building fully-rigged digital humans, complete with hair and clothing, in less than an hour, for direct use in Epic’s Unreal Engine. In early access, MetaHuman Creator runs in the cloud via Unreal Engine Pixel Streaming.

“Up until now, one of the most arduous tasks in 3D content creation has been constructing truly convincing digital humans. Even the most experienced artists require significant amounts of time, effort, and equipment, just for one character,” says Vladimir Mastilovic, VP—Digital Humans Technology at Epic Games. “After decades of research and development, and thanks to bringing companies like 3Lateral, Cubic Motion, and Quixel into the Epic family, that barrier is being erased.”

MetaHuman Creator enables users to easily generate new characters through intuitive workflows that let them sculpt and craft results as desired. As adjustments are made, MetaHuman Creator blends between actual examples in the library in a plausible, data-constrained way. Users can choose a starting point by selecting a number of preset faces to contribute to their human from the range of samples available in the database.
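As a rough illustration of what blending "between actual examples in the library in a plausible, data-constrained way" can look like mathematically - a generic sketch under assumptions, not Epic's actual algorithm - a result can be kept inside the span of the preset faces by forming a convex combination of their meshes:

```python
# A minimal sketch of preset blending as a convex combination of example faces.
# This illustrates the general idea only; MetaHuman Creator's real blending is
# proprietary and certainly more sophisticated.
import numpy as np

def blend_presets(preset_vertices, weights):
    """preset_vertices: (num_presets, num_verts, 3) arrays of face meshes
    weights: per-preset contributions; normalized so the result stays within
    the span of the examples rather than drifting somewhere implausible."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                     # convex combination: weights sum to 1
    return np.tensordot(w, preset_vertices, axes=1)     # (num_verts, 3)

# Toy usage: three preset faces of 5,000 vertices each.
presets = np.random.rand(3, 5000, 3)
blended = blend_presets(presets, weights=[0.6, 0.3, 0.1])
print(blended.shape)  # (5000, 3)
```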

Users are able to apply a variety of hair styles that use Unreal Engine’s strand-based hair system, or hair cards for lower-end platforms. There’s also a set of example clothing to choose from, as well as 18 differently proportioned body types. When ready, users can download the asset via Quixel Bridge, fully rigged and ready for animation and motion capture in Unreal Engine, complete with LODs. Users will also get the source data in the form of an Autodesk Maya file, including meshes, skeleton, facial rig, animation controls, and materials.

These digital humans can run in real time on high-end PCs with an Nvidia RTX graphics card, even at their highest quality with strand-based hair and ray tracing enabled.

Technological Improvements

Nichols agrees with Hendler that machine learning provides the important human subtleties that had been missing from CG humans, an absence that contributed to the uncanny look of the characters. "There's a lot of things that go on in our faces that we don't necessarily see, but machine learning can 'see' and 'learn' those things, and apply them to the model," says Nichols. "We tend to only see the bigger motions, not the subtle lip or eyelid movement, for instance."

As a result, machine learning has accelerated progress in this area by leaps and bounds, eliminating the emotional factor of interpreting (or rather, misinterpreting) the face, as well as greatly speeding up the trial and error of the creation process. Before, capturing a human face required a lot of technology - a lot of dots on the face - and a lot of information. Today, much can be done with a mobile phone. "Even capturing technology is changing," Nichols points out. Whereas once a full motion-capture stage was needed, now a lot can be done with an inexpensive Rokoko mocap system, he adds.

While there are some impressive developments and applications for crafting realistic real-time autonomous digital humans, Nichols notes that a crucial element is still missing: full ray tracing. "We've done this a million times in the visual effects world. You can't really get a human to look completely real without ray tracing. And that's the thing that still hasn't gotten to the real-time digital human world just yet. It's really hard to do subsurface scattering, hardcore shading, right now in real time," says Nichols. "It's going to get there, it always does, but it's a real challenge. And until we get that, we're not going to be able to get a fully realistic human out there. The only other way to accomplish it is to completely fake it with deepfakes, but then it becomes a completely different way of looking at the rendering. You're not actually looking at digital humans, you're looking at a warped face of some kind."

Project Vincent

Douglas is not the only realistic real-time digital human out there. In fact, there are a number of them, including ongoing work by digital humans researcher Mike Seymour, whose project MeetMike — a real-time performance-captured VR version of himself — was shown at SIGGRAPH 2017. Another impressive application is Project Vincent from Giantstep’s GX Lab.

The aim of the project is to create a realistic digital human with a real-time engine. Vincent can move in real time and is capable of emotional expression, and Giantstep is developing a way for him to communicate with people by grafting AI onto the system. While Vincent is not fully autonomous at present, Giantstep is continuing to develop his communication skills.

Initially, GX Lab was formed with just three people, but by the time the project was finished, that number had grown to 17. According to Sungkoo Kang, director at GX Lab, it took approximately one year to see the first visual results of their work, and since then, R&D has continued as they finesse the necessary technology.

To create Vincent, the group started with 300 to 2,000 facial blendshapes, a volume of work that would have been too time-consuming with existing digital content creation tools, and nearly impossible in a real production. Instead, the group created an internal plug-in that divided the face into areas and automated the division and editing of facial expressions, which hugely improved quality and shortened the production time.
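For readers unfamiliar with the term, a blendshape library of that size feeds a fairly standard evaluation: the posed face is the neutral mesh plus a weighted sum of sculpted offsets. The sketch below shows that formula, with an optional per-region mask hinting at the "divide the face into areas" idea; it is a generic illustration, not Giantstep's plug-in.

```python
# Minimal sketch of standard blendshape evaluation, the kind of rig a library
# of 300-2,000 face shapes feeds into. The region mask gestures at the
# face-area division described above; none of this is Giantstep's actual tool.
import numpy as np

def evaluate_blendshapes(neutral, targets, weights, region_mask=None):
    """neutral: (V, 3) rest mesh; targets: (N, V, 3) sculpted shapes;
    weights: (N,) activation per shape; region_mask: optional (V,) 0..1 mask
    limiting the deformation to one area of the face (mouth, eyes, ...)."""
    deltas = targets - neutral                      # (N, V, 3) offsets from rest
    offset = np.tensordot(weights, deltas, axes=1)  # weighted sum of offsets
    if region_mask is not None:
        offset *= region_mask[:, None]              # confine the effect to a region
    return neutral + offset

# Toy usage: 4 shapes on a 1,000-vertex face, a smile with a slight second shape.
V = 1000
neutral = np.zeros((V, 3))
targets = np.random.rand(4, V, 3) * 0.01
weights = np.array([0.8, 0.2, 0.0, 0.0])
mouth_mask = np.zeros(V)
mouth_mask[:200] = 1.0                              # pretend the first 200 verts are the mouth
posed = evaluate_blendshapes(neutral, targets, weights, mouth_mask)
print(posed.shape)  # (1000, 3)
```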

“We also reviewed several solutions to simulate facial expressions in real time, but we didn’t find a real-time solution that could produce the level of quality we wanted,” says Kang. “So, we developed an artificial network that uses machine learning to implement facial expressions.”

Determining the shape of the facial expression is a very complex problem, Kang points out. For example, there are about 12 parameters related to the corner of the mouth. “You can see how complicated this problem is by looking at only three of the parameters, like the horizontal pull of the mouth, the height of the mouth, and the degree of the chin’s opening,” he explains. “The tail of the mouth should have a completely different shape depending on whether the mouth is pulled to the side with the chin open, or it is pulled to the side with the mouth closed.”

Project Vincent
Giantstep GX Lab’s CG Vincent.

That is a cubic equation, Kang notes, and the functions become more complicated when other factors are involved, such as surrounding elements like the nose, cheeks, and eyelids. “It’s almost impossible to approach this mathematically,” he says.

However, if there is enough training data for artificial intelligence using machine learning, this can be solved, and even faster computations in real-time simulation become possible. For this reason, Giantstep used machine learning to link the parameters necessary to determine facial expressions to a frontal image of the face, making it possible to implement complex expressions with just one streamed frontal image.
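In broad strokes, that mapping is a regression from one frontal video frame to a vector of expression parameters. The PyTorch sketch below shows the shape of such a model; the architecture, parameter count, and names are arbitrary choices for illustration and are not Giantstep's network.

```python
# A bare-bones sketch of the mapping described above: one frontal video frame
# in, a vector of expression parameters (blendshape weights, jaw opening,
# mouth-corner pull, ...) out. The small CNN here is purely illustrative.
import torch
import torch.nn as nn

class FrameToExpression(nn.Module):
    def __init__(self, num_params=60):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_params)

    def forward(self, frame):                    # frame: (B, 3, H, W), frontal crop
        x = self.features(frame).flatten(1)      # (B, 32)
        return torch.sigmoid(self.head(x))       # parameters kept in a 0..1 range

model = FrameToExpression()
frame = torch.rand(1, 3, 128, 128)               # one streamed frontal frame
params = model(frame)
print(params.shape)                              # torch.Size([1, 60]) -> drives the face rig
```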

“At the time of Vincent’s development, only a small number of actors’ facial expressions were trained, so only a small number of people could move Vincent,” says Kang. Since then, the group has continued training its AI networks, and now any face can be read and can drive Vincent’s face. However, depending on the level of training, the AI network may introduce errors in the form of unwanted facial expressions. To completely exclude such errors, the team also developed a technology that processes the eyes and the mouth as completely separate networks.

Vincent has since been used in a number of fields. At the opening of IBM’s Artificial Intelligence Conference, he spoke with the representative of the Korean branch of IBM, and the project won an Epic Games MegaGrant. Internally, GX Lab has used Vincent as a foundation for many other emerging technologies, and, says Kang, he’s played a huge part in the continued development of digital human technology.

Giantstep GX Lab

Like Digital Domain and the Digital Human League, the Korea-based creative studio Giantstep is also focused on realistic real-time digital humans. It started R&D in 2018, and "now, we're in the era of the Metaverse, where the development of the virtual human (metahuman) is very popular, and there are many companies challenging the technology," says Sungkoo Kang, director at GX Lab, the internal R&D division of Giantstep, who was the lead on an application called Project Vincent (see "Project Vincent"). "However, back when we started, even the term 'Metaverse' didn't exist, so for a small team like ours from a midsize company, it was not an easy decision to put a lot of money into developing such technology."

It is the studio's belief that realistic virtual humans will have enormous potential and practical value in the near future, with robot development following in the far more distant future. To this end, the studio started Project Vincent so it could see the possibilities and develop the underlying skills needed for this tech.

In the case of offline rendering, such as movies, artists and technologists can throw a lot of manual labor and computing power at the problem to adjust final outcomes. However, when it comes to a real-time digital human, the difficulty factor rises to a whole new level. As Kang points out, every calculation has to be done in 1/30 of a second and produce a finished image.

"Usually the shape of the performing actor and the 3D model's face are different, so there is a retargeting process whereby the actor's facial movement is read, converted, and then applied to the shape of the 3D model. Even this process takes more than 1/30 of a second," says Kang. "If you extremely optimize the data, this could be possible, but we chose a better solution."

As Kang stresses, AI is the most important part in the development process. While AI was an essential component in the creation of Vincent, it is not without limitations, particularly when it came to solving all of the retargeting. As a result, Giantstep developed a tool using a lightweight AI that solved the retargeting process discussed above in 1/30 of a second.
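As a back-of-the-envelope illustration of that constraint - not Giantstep's implementation - any retargeting model has to finish its per-frame inference well inside a 1/30-second (roughly 33 ms) budget. A hypothetical, trivially small stand-in model can be timed against that budget like this:

```python
# A trivial sketch of the constraint described above: whatever retargeting
# model is used, its per-frame cost has to fit inside a 1/30-second budget.
# The "model" here is a stand-in linear layer, purely for illustration.
import time
import torch
import torch.nn as nn

FRAME_BUDGET_S = 1.0 / 30.0                      # ~33.3 ms per frame

retarget = nn.Linear(60, 60)                     # stand-in for a lightweight retargeting net
actor_params = torch.rand(1, 60)                 # expression read from the performing actor

start = time.perf_counter()
with torch.no_grad():
    character_params = retarget(actor_params)    # actor expression -> 3D model's rig values
elapsed = time.perf_counter() - start

print(f"retargeting took {elapsed * 1000:.2f} ms "
      f"({'within' if elapsed < FRAME_BUDGET_S else 'over'} the 33 ms frame budget)")
```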

However, technology alone is not the answer. Kang agrees with Nichols, Hendler, and Roble that the solution is also dependent on having great artists and technicians. When Giantstep first started focusing on digital humans, it set a goal to reduce its reliance on highly skilled artists and technicians. So, instead of modeling by hand, the studio opted for scanning and creating blendshapes with automated algorithms rather than handmade production. "But, ironically, what we realized later was that had it not been for great artists and technicians, the look of Vincent would not have been possible, even with the best scanning service. The same was true for technical development, and we wouldn't have been able to choose the direction that we've gone."

BabyX

The ongoing research into realistic autonomous digital humans by Mark Sagar, co-founder of Soul Machines, started many years ago, and a significant milestone occurred at the Laboratory for Animate Technologies (Auckland Bioengineering Institute), which he started. That is where BabyX was conceived around nine years ago.

According to Sagar, by combining models of physiology, cognition, and emotion with advanced, lifelike CGI, the group set out to create a new form of biologically-inspired AI. BabyX was its first developmental prototype, designed as a stand-alone research project and as an expandable base to feed into commercial computer agents, enabling the researchers to explore human behavior models and create autonomous digital humans.

Sagar notes that a baby was the most appropriate metaphor for the project, because a baby is like a blank slate. “I wanted to build a computer that could learn, a teachable computer,” he explains. “So we started at the very beginning and looked at the absolute fundamentals of human behavior and learning.”

To this end, Sagar began collaborating with a developmental psychologist who was examining cooperation in infants. “I thought that’s the perfect fit here because we’re looking at the way people socially learn to cooperate and interact with other people — which is an absolute fundamental of human interaction and, therefore, should be a fundamental of human computer interaction,” he says.

BabyX’s physical appearance is based on Sagar’s young daughter at the time, whom he scanned while she slept. Initially, the physics-based model was driven by muscles, similar to a VFX system. Later, those muscles were driven by the brain model, which has been under development for eight or so years now. You see, BabyX is different from other autonomous digital humans: she has a digital brain.

LOOK AT HER NOW!

The first simulation of BabyX, in terms of her appearance and brain development, was at six months old, and the researchers have periodically updated the models, stopping at a year and a half, since that is the point when learning explodes in actual children. The group is studying human caregivers interacting with children at progressive ages from six to 18 months, examining developmental milestones over that period and aiming for 18 months in its simulations. This stage spans learning new words and playing games like peekaboo.

“We should be able to replace the real child with BabyX and have the human interact with her as they would a real child, and be able to teach her in the same way,” says Sagar. He notes that in the next few months, BabyX will be interacting with real parents and they will be scoring her in the same way they would in human-to-human interactions. At that point, he adds, BabyX should be able to move seamlessly between tasks and understand context.

Sagar points out that the goal is to build a system that is generally more intelligent than that achieved through standard machine learning. Instead of creating a statistical black box, Soul Machines is building a cognitive architecture and models that simulate fundamental functions of the human brain, and using it to generate the animation. “This is more about creating artificial general intelligence,” he says. As a result, the group is using a known template for the work, the human brain, and will build a system that a person can interact with, just as they do an actual being.

“This will be a relatable intelligence and will work in the same sort of way that we do. When we interact with other people, we watch their behaviors and form theory of mind and try to guess what others are thinking. So, when we cooperate with them, we’re able to have this constant feedback loop,” says Sagar. “What you see in the digital human’s face and its behavior is reflective of real cognitive and emotional processes, just as they would in an actual person, leading to more intuitive and better rapport and feedback from the digital human.”

In essence, what Soul Machines is building is an artificial nervous system that can control any creature.

Moreover, BabyX has emotions and memories, and can explore, learn causal patterns, have goals, and make plans, in addition to other skills. BabyX runs on a typical computer and “sees” through the computer camera. She can also hear a person through the microphone, and feel objects and humans through a touch screen, computer touch pad, or even a haptic device. BabyX receives that information in real time and responds accordingly and appropriately. “We’re trying to make all of the power of face-to-face interactions happen between a human and the computer.”
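At the highest level, those inputs and outputs form a real-time sense-and-respond loop. The sketch below is a generic, hypothetical illustration of such a loop with stubbed sensors; it is not Soul Machines' Digital Brain or architecture.

```python
# A generic sketch of the real-time loop described above: camera, microphone,
# and touch input flow into a model each tick, and a behavior comes back out.
# All names are hypothetical stand-ins.
import time

def read_camera():     return {"face_detected": True, "expression": "smile"}
def read_microphone(): return {"speech": "peekaboo!"}
def read_touch():      return {"touched": False}

def brain_step(percepts, state):
    """Stand-in for a cognitive/emotional model: update internal state from
    percepts and decide on behavior for this tick."""
    if percepts["audio"]["speech"]:
        state["attention"] = "speaker"
    behavior = {"facial_expression": "smile" if percepts["vision"]["expression"] == "smile" else "neutral"}
    return behavior, state

state = {"attention": None}
for tick in range(3):                            # a few ticks of a loop that would run continuously
    percepts = {"vision": read_camera(), "audio": read_microphone(), "touch": read_touch()}
    behavior, state = brain_step(percepts, state)
    print(tick, behavior, state)
    time.sleep(1 / 30)                           # aim for roughly real-time ticks
```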

For Sagar, the motivating factor for BabyX is looking at the very essence of animation. Currently, some companies use animation loops to generate animation or behavior. Instead, Soul Machines is trying to motivate the behavior, so the behavior is driven meaningfully — the emotional context and semantic context affect its next actions. As a result, every movement of BabyX is happening for a reason.

“We feel it’s important to really get the fundamentals right. For those doing it the other way, their costs are going to suffer combinatorial explosion because they’ll have to keep making special cases for all the missing parts, and they’ll keep going and going,” says Sagar. “We want digital humans that have much more flexible behavior and intelligence so they can work in all kinds of different cases.”

Sagar continues: “We’re looking at what it takes to create digital humans that are completely autonomous, that are motivated, curious, appreciate beauty, and things like that, the types of things that drive people.” To make the problem tractable, Soul Machines is building models of memory, emotion, and cognition in a modular fashion — akin to a LEGO system. A key part of the architecture is how emotion and cognition are intertwined, Sagar notes.

Some may say it takes a village to raise a child, but in the case of BabyX, it involves research in key fields, such as advanced CGI, biologically-inspired cognitive architectures, neuroscience, cognitive science, developmental psychology, cognitive linguistics, affective computing, and more. And it’s clear she is one smart digital human!

Soul Machines

Another company making very significant advancements in the area of autonomous digital humans is Soul Machines, with headquarters in San Francisco and R&D in Auckland, New Zealand. In fact, its co-founder, Mark Sagar, spent a number of years building realistic CG human characters for various blockbuster films while at Weta Digital and Sony Pictures Imageworks. Later, he started the Laboratory for Animate Technologies at the Auckland Bioengineering Institute, where he focused on bringing digital human technology to life, pulling together mathematical physics simulations and combining them with computer graphics.

With a PhD in bioengineering, Sagar had studied how faces are animated, how they deform, how they reflect light, and so forth. Indeed, he spent time during his VFX career building systems that simulated actors' emotions and behaviors. But, it all centered on the surface structure. He wanted to take things further, having an interest in biomechanics, physiology, how the brain worked, and, ultimately, consciousness.

"I had always looked at those areas and the progress being made in artificial intelligence," Sagar says. "I was interested in what actually drives actors' performances and motivates their behaviors."

At the lab, Sagar began pulling those aspects together in an attempt to create truly autonomous, fully-animated characters that would have a proper brain model driving the behaviors of the character. "It meant building a character in the complete opposite way that it had been done before. It meant, basically, building a character from the inside out," he says. "I wanted to make digital characters that would see and hear the world, and be able to take that all in and learn, and to have a memory and react to the world."

According to Sagar, the work in the lab encompassed two aspects. One involved building the brain models to drive a fully autonomous character, which was called BabyX (see "BabyX"). The other was using parts of that technology to generate facial simulations.

After four and a half years of working toward this goal, the lab expanded into the company Soul Machines. On the research side, the company continued the extraordinary work on BabyX, exploring a new way to animate characters from the inside out. On the business side, the company began looking at using these lifelike digital humans in various applications.

The culmination of this work is Digital People for real-world applications. The company's Digital People contain its patented Digital Brain, which contextualizes and adapts in real time to situations, similar to human interactions, to enhance customer brand experiences. Currently, Soul Machines offers three categories of products. The Human OS Platform features a Digital Brain (for AI learning resulting in appropriate responses) and Autonomous Animation for lifelike interactions. Digital DNA Studio Blender enables customers to scale their unique Digital Person, creating multiple looks and leveraging up to 12 languages. And with Soul Solutions, customers can choose among three service levels, from entry level to advanced, to fit their needs.

The technology used to drive Soul Machines' adult digital humans, or virtual assistants, combines elements of the brain model with other technologies, such as speech recognition and NLP (natural language processing) algorithms - technologies that companies such as Microsoft and Google are working on - and enables the creation of conversational models, albeit ones that are more curated, as opposed to resulting from pure intelligence. Sagar compares the two methods to an actor saying lines: the person can either recite them from a detailed script or improvise on the fly. Soul Machines is working on both ends of that scale, depending on the application.

In fact, the use cases for Digital People are vast, spanning many sectors including consumer goods, health care, retail, tech, call centers, and more.

Soul Machines is continuing development of its technology and will be releasing its Digital DNA Blender for creating different faces in the near future. It also will be releasing technology that makes the digital humans content-aware, for meaningful interactions - generated autonomously and live.

Real-Time Digital Humans at Work

Roble, Hendler, Nichols, Kang, and Sagar all agree that the future for digital humans looks promising. Presently, they are being used in visual media, but as real-time technologies such as text-to-voice and voice-to-face develop further, digital humans will become highly customizable and play a larger role in our daily lives, predicts Kang. "Especially now, with the emergence of the Metaverse, the use of avatars will come our way very naturally. In the same way that heavy labor has been replaced by robots, it is expected that much of our emotional labor will be replaced in the future as well."

Indeed, the future application of digital humans is a wide-open topic. At Giantstep, they're most interested in the Metaverse, inspired by platforms like Zepeto, where social networks and virtual communities come together with custom avatars. However, the process remains difficult, and it still takes considerable time and effort to generate a high-quality digital human. But once the production is automated, Kang believes it will open the door to this kind of customization and diversity.

Nichols is unsure whether people will want avatars that look realistic or not. But, realistic digital humans have big potential as research tools, to understand real humans better. For instance, PTSD research has found that avatars can break down barriers for better communication; the same holds true for those with autism.

"For consumer experiences or entertainment, applications will be limited only to the creative minds working on behalf of the world's biggest brands to create unique consumer experiences," says Kang. "We could have digital human influencers, actors… whatever can be conceived."

Giantstep also is exploring the development of a platform for training AI networks that can help with project management, such as distributed processing for training, progress control, and confirmation.

Since releasing Vincent, the studio has received positive response from the industry and has received quite a few inquiries from brands, advertisers, and entertainment content creators interested in exploring opportunities. One of them is SM Entertainment, the largest entertainment company in Korea, which represents some of the best-known K-pop groups and artists, including aespa. "With aespa, we've helped create virtual avatars for each member of this new female K-pop group. SM Entertainment is redefining entertainment as we know it by extending their audience reach through the convergence of the real and virtual worlds," says Kang.

Moreover, since Vincent, Giantstep has streamlined its production pipeline and increased efficiency in creating digital humans. It has hired more artists and technicians who can improve the quality of its work and further expand its capabilities. "Without Vincent, the ease with which we were able to make the concept around aespa work wouldn't have been possible," Kang adds.

For Good or Otherwise

The technologies discussed here can invoke fear and concern among the public. "But, ultimately, we're all, as humans, adjusting to what's new and possible, and reconfiguring certain things about what we believe and what we understand. And I think there's a lot of good and benefits to be seen from these technologies," says Hendler. "We often don't talk enough about how it can be helpful and beneficial."

For instance, the subject of deepfakes has become a fascinating, if not controversial, one. Technically, some people define deepfakes as taking one person's likeness and superimposing it onto someone else, not necessarily with their permission, says Nichols. And that is being done through machine learning. "Technologically, it's a very straightforward process," he adds. "Many people question whether this technology could ever be used for anything good."

The answer is yes, it can be. At the spring 2021 RealTime Conference (RTC) focused on the Metaverse, presenters in a session on digital humans and virtual agents discussed the use of technology similar to deepfakes in a documentary called Welcome to Chechnya. In the film, activists and victims relay horrific stories of persecution by their government and families while fighting for LGBTQ+ rights there, and it was imperative that their identities be masked due to the potential repercussions for those interviewed. So, rather than blurring or pixelating their faces, the filmmakers used AI and machine learning to disguise them with face doubles - keeping the individuals safe, while retaining the all-important human angle of the documentary.

"This proved to me that just because you don't understand a technology or you see a bad example of how it's being used, that doesn't mean it's a bad thing," says Nichols.

As for the future of Douglas, "while it sounds like we're just madly experimenting and trying to trick people by creating a realistic person who you can't tell is not real on a Zoom call, that's not our aim here," Hendler points out. "Our aim is to create a human that you can have an emotional connection with, that you can interact with, that you know is not real, and that person is able to do useful things for you."

Hendler predicts the DHG is within a year of being in that position - of having a person, character, or creature that you can have a connection with, that understands you, and that is a very useful entity to have around, doing a variety of useful things. In this regard, possible uses for technology like Douglas include serving as a virtual concierge at a hotel, museum, or airport to provide information in a more interactive way, or even acting as a companion.

In fact, Hendler sees the research into digital humans as having an effect on computer games, which he predicts will feature exceedingly realistic human characters in terms of looks, speech, and interaction. He also foresees more applications in films, particularly in terms of aging and de-aging actors, resulting in actors playing themselves in a range of roles through time. He does not see this technology replacing actors: While it can make them older or younger, it is not creating a performance.

"We don't even know what will end up being key products. We just know [the capability is] coming and people are interested. It's evolving, and we are along for the ride," says Hendler.

Is the Future 'Uncanny'?

Opinions vary widely among the experts interviewed here as to whether or not we have crossed the uncanny valley. Nichols believes we crossed it a while ago - "taking a lot of work to do it and a lot of people being very smart about how they approached it." He points to Digital Domain's work on Benjamin Button, which was accomplished prior to machine learning, instead relying on a lot of human time and effort, particularly in the trial-and-error stages.

"Most people didn't realize it was a digital human for the first third of the movie," Nichols says of the main character developed at his former employer, Digital Domain. "That is a sign, at least in my mind, that it was well done. So, I think the uncanny valley has been crossed a few times. If you want to completely fool someone, then yes, it takes a lot of work to get there, but it's possible to do."

Interestingly, Nichols finds that nowadays people generally seem to tolerate digital characters that are not very humanlike at all, accepting avatars that fall within the uncanny valley, especially in games. "People don't seem to have a problem with them like we once did," he says. And while games continue to push in the direction of realistic digital humans, the genre has yet to cross the uncanny valley.

Hendler, on the other hand, is more conservative in his assessment and doesn't believe that applications like Douglas have actually crossed this chasm. Yet. Rather, he sees them as "managing" the uncanny valley. "I would say we've actually stepped further into the uncanny valley than we have before, but I don't think it's going to be solved anytime in the next three to four years," he adds.

Sagar, however, has a different opinion, believing there are multiple uncanny valleys to cross. Indeed, some entities have crossed the first, achieving realistic-looking digital humans, with the next valley encompassing appropriate behavior. "There are so many combinations there that you have to get right," he says. "So, the model may look real and speak in a realistic way, but, is it talking about the same thing as you are, or is it completely random? If so, it can be quite disturbing and frustrating because, effectively, you've got a digital human that's not following what you are saying or is not acting appropriately."

While these experts may have differing views on the current state of digital humans, they all agree that in the not-so-distant future they will be part of our lives.

Karen Moltenbrey is the chief editor of CGW.