Turtle Talk
Issue: Volume 37 Issue 4: (Jul/Aug 2014)

The phenomenon started, so the story goes, as a self-published parody of Marvel Comics' mutant teenagers and samurai-style superheroes, a manga-style Frank Miller superhero, and Dave Sim's anthropomorphized aardvark. Young artists Kevin Eastman and Peter Laird whirred those comic-book heroes into a potent mash-up that became the Teenage Mutant Ninja Turtles.

That was 1984. Since then, the Ninja Turtles or TMNT, as they're often called, have appeared as action figures, animated characters in cartoon and anime-style TV series, comic strips, comic books, video games, a theme park ride, a live-action television series, and four feature films.

Here's the Ninja Turtle backstory: A canister filled with toxic ooze spilled onto four baby turtles in New York City. A ninjutsu master's pet rat, forced into homelessness when his owner died, rescued the baby turtles and carried them into a sewer. A bit of ooze got on him, as well. It gave all five intelligence, greater size, speech, and the ability to walk upright. Splinter, the rat, trained the turtles in martial arts, and when he found a Renaissance art book in the sewer, named them Leonardo, Michelangelo, Donatello, and Raphael. Thenceforth, they battled all manner of evil, while trying to remain unknown.

Actors wearing costumes and animatronic turtle heads played the anthropomorphic reptiles in three live-action films, and animated versions of the superheroes starred in a 2007 CG animated feature. So, what's next?

Live action with CG characters. Produced by Michael Bay of Transformers fame and directed by Jonathan Liebesman (Wrath of the Titans, The Killing Room), Paramount Pictures' Teenage Mutant Ninja Turtles features the four superhero turtles, their sensei rat Splinter, and their nemesis Shredder, all CG characters.

Artists at Industrial Light & Magic worked on 428 of the film's total 726 visual effects shots, including 396 with animated characters. Working with ILM were strategic partner BASE FX as well as Hybride, Image Engine, Atomic Fiction, and Virtuos. Tippett Studio artists created the young CG turtles.

Animators started with data captured on location and in studios using the latest evolution of ILM's proprietary iMocap system for body capture and Muse, the studio's new state-of-the-art system, for facial capture.

Each CG character had an actor on set who performed the narrative scenes and a counterpart stunt actor for the action scenes. Alan Ritchson is Raphael, Noel Fisher is Michelangelo, and Jeremy Howard is Donatello. Pete Ploszek played Leonardo on set and Johnny Knoxville provided his voice. Danny Woodburn played Splinter on set and Tony Shalhoub provided his voice.

Animators and visual effects artists sent the ninja turtles into explosive action shots.

The CG turtles look buff, tough, and every inch a superhero. They wear their shells like shields. They stand about six inches taller than the actors who portray them. They practice martial arts. And, they talk. So, how do you make a turtle talk?

"It starts with a science project," says Pablo Helman, visual effects supervisor. "We had to extract high-fidelity data with a lot of subtlety from an actor's performance. Then, we had to reinterpret it with artistry to end up with a performance that is relevant to the story we wanted to tell and fits with a creature whose design is pretty different from the actor."

Performance Capture

On location, the actors playing the CG characters wore gray suits printed with unique markers that ILM's iMocap system would later recognize and translate into body performance data for the animators. For facial capture, the actors wore snug helmets molded to a 3D scan of their heads and fitted with dual HD cameras. One camera pointed toward the left side of the face and the other toward the right. Painted on the actors' faces were 128 dots that, once recorded, ILM's Muse system could track. A battery pack powered the system, and a hard drive collected the data. The actors wore both on their backs.

"The head rig also had lights," says Robert Weaver, associate visual effects supervisor. "So we could capture in a variety of situations - even on a rooftop in the pitch of night." The lights were eight tiny LEDs over each camera.

ILM’s new Muse system translated facial motion capture into data animators could use to perform turtles in dialog-driven scenes.

Because the turtles would be about six inches taller than the actors, the actors stood on six-inch platforms or wore three-inch platform shoes for dialog-driven scenes.

"When they wear the shoes, we put Ping-Pong balls three inches above their eyes so everyone would know where to look for eyelines," says Tim Harrington, animation supervisor. "We used the height for scenes when they're standing around talking to each other. In the big action scenes, they are in 'turtle space,' so the stunt actors didn't need it."

Stunt actors wore the iMocap suits but not the facial-capture rig, and all the actors playing turtles had big foam cushions to approximate a turtle shell on their backs. "If we didn't dress the stunt performers with the foam cushions, they'd move without regard to what they were meant to be," Weaver says.

Filming the shots were two main stereo cameras, three witness cameras, and the two cameras on the head rigs on each of the five actors playing the CG characters.

"On a good day, we'd gather 1 TB of data per actor per day," Helman says. "That allowed us to find all the subtleties." But it meant a total of between 4 TB and 6 TB of data per day. Dealing with all that data was a hefty job.

A 14-person team from ILM managed the performance capture. "Our footprint on set was pretty big," Helman says. "But we needed that many to capture the performances simultaneously - manage the logistics, put the helmets on, recalibrate the cameras, check the marker makeup, and make sure the wireless data all transmitted. Doing facial and iMocap performance on set is challenging."

Raw Data Matters

The dual cameras mounted to the head rigs filmed the markers on the left and right sides of the actors' faces, taking RAW footage compressed on the fly. Later, layout artists at ILM "solved" images from these cameras, from the witness camera footage of the iMocap suits, and the position of the stereo cameras used to film the shots. They turned the dots in the images on the actors' faces and patterns on the iMocap suits into useful data.

"Because Muse is new, we had between six and eight people from R&D helping with the facial solves and with the pipeline," Weaver says. "Since the R&D people were making the system better as we went along, we decided they might as well be part of the process."

Prior to the on-set motion capture, the team had the actors perform expressions while being scanned with Disney Research's Medusa system. The Muse team decomposed those scans into components within Fez, ILM's facial animation system that animators used to re-create the expressions. After facial motion capture, layout and R&D teams triangulated the markers in the 2D images into 3D space, frame by frame, and then applied them to controls on those components in Fez.

"We also ran the data through a contour tracker to track eyelid, inner lip lines, and pupils," says Tim Harrington, animation supervisor. "All that data drives the face." At the end of the process, animators work with the same type of animation curves they use for keyframed animation.

Actor Megan Fox with the four turtle actors wearing facial mocap rigs and ILM’s delta suits. ILM’s iMocap system translates the suits’ printed markers into motion data.

The new system might sound similar to previous approaches, but there are important differences.

"Before, like others in the industry, we would capture a performance, decode it, create a moving mesh, and the keyframe animation would happen as a layer on top," Helman says. "With Muse, the animators access raw data; they don't manipulate a layer on top of the actor's performance. They have access to points they didn't have before." (See "The Way Muse Works" on page 10.)

Although having that access might seem valuable in general, the nature of this film made it necessary. It's an action/thriller, but it's also a comedy.

"This movie is made from jokes and smart dialog," Helman says. "The performances come from four funny actors sitting at a table improvising, with writers behind saying, 'Try this.' The final performance ended up being a combination of takes from all the performances."

And with four talking turtles plus a talking rat in the scenes, things became complicated.

"We shot for four months on set, then the director made his picks," Helman says. "For one shot, he might like a line from one actor on one take and the performance from another actor in another take. Then, there were revisions to the script. So, we'd do another capture of the same scene later at ILM. Sometimes we'd incorporate part of that, then go into another volume and re-do the capture. We captured performances in New York, Los Angeles, and at ILM in San Francisco. We had to put them together."

Directing CG Ninja Turtles

Thirty-seven-year-old Director Jonathan Liebesman has been a Ninja Turtles fan since childhood.

"I grew up with the cartoon as a child in South Africa," Liebesman says. "I was a fan of the first film and 'The Secret of the Ooze.' Because I have a brother, I was instinctively drawn to the camaraderie and strong sense of friendship that exists between the Turtles. The best thing about this movie is that we've retained the original charm and fun of the Turtles while staying true to their characters."

Teenage Mutant Ninja Turtles is Liebesman's first production that utilized motion-capture technology, but it will probably not be the last. "I love the technical side of VFX filmmaking, and TMNT allowed me to explore that passion," he says. "ILM set up an incredible pipeline with tracking suits and HD facial-capture technology that allowed us to shoot actors on any location and record their performances to be decoded and translated in anim. It was like being a kid in a candy store – being able to use the technology I had only read about or seen in the Planet of the Apes films. It’s incredible to see how advanced the technology has become."

The technology gave Liebesman a new kind of directorial freedom. “Directing CG characters takes a different sort of imagination,” he says. “It takes a certain kind of faith while you are shooting for everyone to be confident that the final product will turn out to be what you initially imagined. But, looking at the film and scenes in a different manner can foster new, fresh ideas. And, if you come up with a new idea, it is easier to implement that idea, basically speaking, because so much of the film is finalized in post production. To have a partner like ILM and Pablo Helman was incredible. I would absolutely use the process again. The technology is getting better and better each day, which, in turn, increases the possibilities exponentially. The turtles look amazing!” — Barbara Robertson

Turtle Faces

To help transfer nuances from the actors' performances and make the facial expressions and lip sync on the CG characters believable, modelers sculpted the facial features from the actors into the turtles.

"That made the retargeting come across as natural and seamless as possible," Weaver says.

But, it wasn't easy.

"It took a lot of art directing to create the facial animation rigs for the turtles," Harrington says. "We had to make the actor's smile work with the physicality of a turtle's face, and make it appealing."

To check the quality of data transferred via the Muse system from the actor to controls in the digital character, the team superimposed a moving wireframe of the actor's face on top of footage of the actor saying his lines.

"When we were sure it was accurate, we copied and pasted the motion onto the turtle characters," Harrington says. "The same motion curves drove the turtle's face."

When the animators didn't have data captured from the actors, they would sometimes capture each other or keyframe the performance.

"We'd do pick-up shots on our motion-capture stage when we needed something we didn't have or wanted to do over," Harrington says. "A person swinging a punch maybe, or taking a couple steps. We used pretty much everything we could: keyframe, motion capture with iMocap and Muse, footage from witness cameras."

Stunt actors performed the physically demanding action sequences, but the story often called for stunts that were physically impossible, and for those, animators would keyframe the characters' performances.

Each turtle wears a different colored mask, and in addition, modelers and animators gave Leonardo (Leo), shown here, and his three brothers individual characteristics to help distinguish them.

"At the end of the day, I think the facial animation was about half motion capture and half keyframe, with Muse always as our starting point," Harrington says. "For the bodies, because the turtles were physically different from the actors - they look like bodybuilders - the data was sometimes useful and other times we had to re-work it. Physics is always a challenge when you have turtles doing superhero feats. The turtles are ninja masters."

Ninja Moves

Of ILM's character shots, 166 were with close-up dialog. Action dominates the rest. Fight Coordinator Garrett Warren provided the martial arts moves.

"Often when we work on an animated fight scene, we have a blank plate," Harrington says. "This time, we had Garrett Warren, one of the best fight coordinators in the business. We had real martial arts moves for our animated characters to do. If there was something we didn't get with Garrett, we captured animators who are into martial arts. And in one scene, when Mikey [Michelangelo] needed to deflect tranquilizer darts with a nunchuck, we found Bruce Lee shots we could use as reference."

The turtles move through sewers on their backs. Michelangelo has a rocket-powered skateboard. And in one action sequence, the turtles luge down a snowy mountain in an entirely CG environment that included semi-trailer trucks, Suburbans, and an out-of-control Humvee.

"The interaction of the snow is pretty spectacular," Weaver says. "We had a base snow layer, turtles sliding on their shells, cars spinning around and tumbling, chunks of snow and ice being kicked up. It was quite a process hitting the aesthetic the director wanted."

The effects team used tools within ILM's proprietary Zeno for giant volumes of powdery snow and clumping snow, Side Effects Software's Houdini for streaky particle effects, and ILM's Plume for smoke and steam in these and other shots. Compositing was through deep files in The Foundry's Nuke. Pixar's RenderMan handled the rendering.

"A year and a half ago we moved to [Solid Angle's] Arnold and did a lot of hard-surface modeling," Weaver says. "Arnold was great for Star Trek, and they've made some advances since we purchased the software. But on this show, we used RenderMan for everything. Pixar has made some pretty staggering leaps in raytracing efficiencies. And, we hit a new level of realism with our scattering and refractions. When the camera is inches away, the turtles' eyes look photoreal. I'm really proud of how that came out."

There are two big martial arts sequences in the movie. One is between the sensei rat Splinter and the Shredder, a character covered in blades. It's set in the turtles' sewer-based underground lair. "Splinter is four feet tall and Shredder is six feet," Harrington says. "So it was big versus little. We found cool ways to use Splinter's tail and Shredder's blades."

The second big sequence, a dramatic fight atop the Conde Nast building, takes place in Times Square.

"That was a change in location from where we originally shot," Weaver says. "We had anticipated needing to replace portions of the buildings for the fight, but it ended up in a place we didn't shoot. So, I'd say that environment is 98 percent CG. It's interesting. As the years go by, I find we typically use less and less of what we originally shot."

Donatello (Donnie) in the purple mask follows a blurry Michelangelo (Mikey). In dialog-driven scenes, actors playing the turtles used lifts to stand turtle high. In turtle-driven action scenes, that wasn’t necessary.

With that in mind, the crew had documented the area in case the story changed. "Part of our job is to give the filmmaker flexibility," Helman says. "No one wants to hear that we can't do an insert. We have to make sure all these things happen."

In addition to his role as visual effects supervisor, Helman did four weeks of second-unit directing. "We built Times Square from scratch using two weeks of second unit plate photography," Helman says. "It's incredible. We have cycles of people walking around taking pictures, looking at the fight 54 stories up. We have the turtles looking down from the scaffolding, almost falling. You can't tell it's all-CG."

Looking Ahead

Helman, who has been part of the VFX industry since 1995 and a VFX supervisor since 2001, is excited about the impact of the new motion-capture system on ILM's work in this film.

"This is a great way for the visual effects industry to be embedded deeper into the production side of filmmaking," Helman says. "We aren't just exploding something. The visual effects are carrying the movie, telling the story from an emotional point of view. In this movie, we collaborated with the director, writers, creature designers, and the production designer. We were part of discovering these characters. The reason we all came to visual effects is to do filmmaking. We like making movies, telling stories. This gave us a great opportunity to do filmmaking."

Helman provides an example from the film. "There is an emotional scene at the end in which we understand what it's like to be four outcast brothers who don't understand why things are the way they are," he says. "We were part of that. That's why this system is important."

The Way Muse Works

Michael Koperwas, creature CG supervisor at ILM, and R&D Lead Kiran Bhat led teams of artists and researchers who created and refined ILM’s new facial-capture system.

“There is a lot of action in this film,” Koperwas says. “The actors would even be jumping on a hydraulic rig moving up and down. Again, we partnered with VideoHawks. They have a solid, 30 fps, 720p, dual-camera rig and, equally important, good helmets that are comfortable for the actors.”

Before the on-set capture, the team used Disney Research’s Medusa system to do 4D scans of the actors (3D plus time). “We had the actors do expressions, smiles, frowns, and so forth, and then used Fez, our facial animation system, to break the expressions into animator-friendly components,” Koperwas says. “To give animators more control, we didn’t restrict ourselves to FACS expressions.”

The team would search through video of the scanned faces, pick the best example of a smile, a squint, and other expressions, select the Medusa-created mesh associated with each, and bring that mesh into Fez. “We wanted the best, most isolated example of an expression, like a smile,” Koperwas says. “The actor may be squinting or have raised eyebrows, but we want only that smile. We extract it from the mesh, solve it (translate it to animation curves), and put it on sliders in Fez. The animators can make the smile bigger or smaller and open or close the lips. Fez automatically corrects it to keep it on model.”

The Muse team then transferred the actor’s expressions (broken down into sliders) onto the corresponding turtle’s facial rig.

“If the actor smiles, the turtle smiles, and it’s the same amount of smile,” Koperwas says.
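
The idea of one set of slider values driving two different faces can be illustrated with a small sketch. It assumes a simple linear model in which each slider adds a scaled offset to a neutral face; the article does not describe Fez's internals, so the function `pose_face`, the shape names, and all numbers here are hypothetical.

```python
def pose_face(neutral, shapes, controls):
    """Return the neutral face plus each shape offset scaled by its slider."""
    posed = list(neutral)
    for name, amount in controls.items():
        for i, offset in enumerate(shapes[name]):
            posed[i] += amount * offset
    return posed


# Hypothetical rigs: the same "smile" slider moves each face differently.
actor_neutral = [0.0, 0.0]
actor_shapes = {"smile": [1.0, 0.5]}
turtle_neutral = [0.0, 0.0]
turtle_shapes = {"smile": [2.0, 0.25]}

# One set of control values, as solved from the actor's performance.
controls = {"smile": 0.5}

actor_pose = pose_face(actor_neutral, actor_shapes, controls)
turtle_pose = pose_face(turtle_neutral, turtle_shapes, controls)
print(actor_pose)   # [0.5, 0.25]
print(turtle_pose)  # [1.0, 0.125]
```

The slider value is identical for both characters; only the shapes it scales differ, which is one way to read "the same amount of smile."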

That set the stage for applying facial expressions captured from the head-mounted cameras as the actors performed on location and in motion-capture studios.

“We’d get footage from the plate cameras with the selected frames,” Koperwas says. “This is a comedy, so the editor cut the dialog smoothly for maximum comedic effects. They might have captured the first line of dialog in New York, the second at Digital Domain in LA, the third at House of Moves. We had to piece together the motion from those capture stages.”

And that’s why the new system became important.

“The first line of dialog might have been delivered with a happy smile and the second with a neutral face,” Koperwas says. “We’d get cuts with lines of dialog and associated video, and we’d see faces jumping from line to line. We had to blend between them. Because of the way Muse works, we could copy and paste keyframes of motion data and the animators could massage the in-betweens. This wasn’t possible before. Now, it’s not only possible, it’s easy.”
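
A rough sketch of joining two takes once the motion lives on controls: the end of one take is cross-faded into the start of the next so the face does not jump at the cut. This is only an illustration, not ILM's tool. The function `blend_takes`, the per-frame dictionary layout, and the linear cross-fade are assumptions, and in practice animators would adjust the in-betweens by hand.

```python
def blend_takes(take_a, take_b, blend_frames):
    """Join two takes, cross-fading control values over `blend_frames` frames.

    Each take is a list of per-frame dicts mapping control name to value.
    Both takes are assumed to use the same control names.
    """
    if not 0 <= blend_frames <= min(len(take_a), len(take_b)):
        raise ValueError("blend_frames must fit inside both takes")
    split = len(take_a) - blend_frames
    head = take_a[:split]            # untouched start of the first take
    tail_a = take_a[split:]          # frames of take A that get mixed
    tail = take_b[blend_frames:]     # untouched end of the second take
    blended = []
    for i in range(blend_frames):
        t = (i + 1) / (blend_frames + 1)  # ramps from near 0 toward 1
        frame_a, frame_b = tail_a[i], take_b[i]
        blended.append(
            {name: (1 - t) * frame_a[name] + t * frame_b[name] for name in frame_a}
        )
    return head + blended + tail


# Hypothetical takes: a full smile cut against a neutral face.
happy = [{"smile": 1.0} for _ in range(4)]
neutral = [{"smile": 0.0} for _ in range(4)]
joined = blend_takes(happy, neutral, blend_frames=3)
print([frame["smile"] for frame in joined])  # [1.0, 0.75, 0.5, 0.25, 0.0]
```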

Each actor wearing the head-mounted cameras had 128 dots painted on his face. At ILM, layout artists calculated the position of those dots in 3D space in each frame by using the cameras’ positions relative to the helmet and the dots captured on the left and right sides of the actor’s face. The Muse team applied those dots to the facial rig in Fez. Then, they fitted the dense data from the Medusa scan onto the 128 points.
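
The step of locating a dot in 3D from two helmet cameras can be sketched with basic ray geometry. This is a simplified illustration rather than ILM's solver. It assumes a dot is visible to both cameras, that each camera's position relative to the helmet is known, and that each 2D detection has already been converted into a viewing ray. The function `triangulate_midpoint` and the sample coordinates are hypothetical.

```python
def triangulate_midpoint(origin_a, dir_a, origin_b, dir_b):
    """Return the 3D point midway between the closest points of two rays."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    w = [p - q for p, q in zip(origin_a, origin_b)]
    a = dot(dir_a, dir_a)
    b = dot(dir_a, dir_b)
    c = dot(dir_b, dir_b)
    d = dot(dir_a, w)
    e = dot(dir_b, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the dot cannot be located")
    s = (b * e - c * d) / denom  # distance along ray A
    t = (a * e - b * d) / denom  # distance along ray B
    point_a = [o + s * k for o, k in zip(origin_a, dir_a)]
    point_b = [o + t * k for o, k in zip(origin_b, dir_b)]
    return [(p + q) / 2 for p, q in zip(point_a, point_b)]


# Hypothetical helmet-space setup: two cameras looking at one dot.
left_camera = [-1.0, 0.0, 0.0]
right_camera = [1.0, 0.0, 0.0]
ray_from_left = [1.0, 0.0, 2.0]
ray_from_right = [-1.0, 0.0, 2.0]

print(triangulate_midpoint(left_camera, ray_from_left, right_camera, ray_from_right))
# [0.0, 0.0, 2.0]
```

With noisy detections the two rays rarely intersect exactly, which is why the sketch takes the midpoint of their closest approach.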

“Medusa gives us some arbitrarily high number of points, maybe a million points per frame, for high-quality reproductions of the skin,” Koperwas says. “It tells us where the wrinkles are. We fit that data onto the Fez rig.”

When dots captured during the actors’ performance move, the face moves – but not directly. And this is a key difference between Muse and previous systems.

“We fit a set of controls,” Koperwas says. “As the dots move, we translate that into a set of controls the animator can edit. Previously, we’d use the sparse dot pattern to deform the face, which gave us nice results, but it was difficult to edit. Basically, we were working with an animated sculpt, a different shape on each frame. We could add to it or remove something, but we couldn’t really change it unless we re-sculpted it.” Now, the data is on controls, on sliders instead, and animators can easily edit the motion.
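
One way to picture "fitting a set of controls" to moving dots is a least-squares solve for slider values. The sketch below assumes a linear model (neutral dot positions plus a weighted sum of per-slider offsets), which the article does not confirm as Muse's method. The functions `fit_controls` and `solve_linear`, the slider names, and the numbers are all hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("controls are not independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        rest = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - rest) / aug[row][row]
    return solution


def fit_controls(neutral, deltas, observed):
    """Least-squares slider values so neutral + sum(w_i * delta_i) ~ observed.

    All inputs are flattened coordinate lists of equal length.
    """
    residual = [o - n for o, n in zip(observed, neutral)]
    gram = [[sum(x * y for x, y in zip(di, dj)) for dj in deltas] for di in deltas]
    rhs = [sum(x * r for x, r in zip(di, residual)) for di in deltas]
    return solve_linear(gram, rhs)


# Three dots in 2D, flattened as [x0, y0, x1, y1, x2, y2].
neutral = [0.0, 0.0, 1.0, 0.0, 2.0, 0.0]
smile = [0.0, 1.0, 0.0, 0.0, 0.0, 1.0]      # lifts the outer dots
jaw_open = [0.0, 0.0, 0.0, -1.0, 0.0, 0.0]  # drops the middle dot
observed = [0.0, 0.5, 1.0, -0.25, 2.0, 0.5]

print(fit_controls(neutral, [smile, jaw_open], observed))  # [0.5, 0.25]
```

The result is a pair of slider values an animator could edit directly, in contrast to a per-frame deformed mesh.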

The other key difference is that the data came from two cameras, not one. “You can fit the 2D data from one camera to a facial animation rig, but it isn’t as 3D accurate,” Koperwas says. “The slightest difference can have a big impact.”

The team felt the impact of the Muse system in several ways.

“It gave us the flexibility to stitch together different pieces, from those captured in broad daylight on location to ones captured on a soundstage and have the same consistent quality of solve,” Koperwas says. “That gave the filmmakers and animators the freedom to work as they wanted. They could find the best timing and the best moments from characters in different shots, and we could translate that into a system that didn’t give animators aneurysms.” — Barbara Robertson