The ideal motion-capture system for filmmakers and game developers is easy to describe. It’s invisible, interactive, and accurate.
"A Holy Grail is to be able to capture a performer, face, and body, on set without infringing on principal photography, without an additional setup," says Debbie Denise, executive vice president of production infrastructure and executive producer at Sony Pictures Imageworks. "Directors want to employ the best actors they can; they don’t want to have to worry about who is playing that character. And, they don’t want to be encumbered by the needs of visual effects."
That Holy Grail is tantalizingly close. For such animated films as The Polar Express and Monster House, Imageworks’ in-house ImageMotion system worked with Vicon cameras to capture audio, face, and body performances from multiple actors, but on a carefully calibrated stage. For the live-action film Pirates of the Caribbean: Dead Man’s Chest, Industrial Light & Magic moved motion capture outside the box and into principal photography on location (see "Yo Ho Ho!," July 2006, pg. 16). That studio’s I-Mocap system uses off-the-shelf video cameras to capture multiple actors anywhere, in any conditions, albeit only body motion, not faces.
Studios are advancing the state of the motion-capture art in other ways, as well. "We have two main goals," says Matt Madden, director of R&D at motion-capture facility Giant Studios. "One is to shoot in any condition quickly, without interrupting the flow of principal photography. The other is interactivity." Giant handled motion capture for Weta Digital on The Lord of the Rings trilogy and King Kong, and did on-location mocap for The Chronicles of Narnia: The Lion, the Witch and the Wardrobe.
Traditionally, studios have used motion-captured data to help animators perform CG creatures that resemble humans or animals, for motion cycles used in crowd simulations, and to create digital doubles. For the most part, data acquisition happened on motion-capture stages during postproduction, with dancers, stunt people, and, occasionally, animals acting the part of digital doubles or CG characters. Sometimes the director would be involved in the motion-capture sessions, but usually not.
These days, that’s only part of the story.
Some of the biggest innovations in data acquisition center on how studios are integrating "motion capture" into the filmmaking process and vice versa. In visual effects studios and service bureaus, motion capture is now a synthesis of camera tracking, matchmoving, and virtual sets. "It’s all about figuring out how things are moving in 3D," says Seth Rosenthal, CEO of Tweak Films and former head of the motion-capture unit at ILM. "They’re all different ways of looking at the same problem."
Indeed, even real-time motion capture itself isn’t new. ILM used a real-time system for the 2001 film The Mummy Returns. During postproduction, ILM applied motion data captured from Arnold Vosloo in real time to a CG creature in matchmoved plates shot earlier, all of which helped the director frame a tricky shot. Now, thanks to fast hardware, HD cameras, and efficient software, motion capture is becoming part of preproduction and production, rather than only postproduction. And, it’s even helping speed postproduction.
Moving On Up
For example, Vicon’s mocap service studio, House of Moves, recently brought a small MX motion-capture system onto a set and incorporated the film camera for principal photography into the motion-capture process. The goal was to apply an actor’s motion in a close-up shot to a photorealistic CG human that didn’t match the actor’s face.
On set, the House of Moves team captured the head and torso of a live principal actor and also tracked them in 3D. "The tracking markers were positioned differently on each shot so they weren’t evident in the film camera," says Gary Roberts, vice president of production for Vicon’s House of Moves, "but our cameras could see them."
To set up the shot, House of Moves first tracked the camera in footage provided by its client, using Boujou from Vicon’s sister company, 2d3. Because the mocap team made certain the 3D survey was in the same coordinate space, the mocap data lined up with the survey and the tracked camera. "We could lock the mocap data to the principal film camera," Roberts says. "It sounds simple, but it saves a huge amount of time otherwise spent tracking the head and the camera in post. That’s got everyone excited."
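The payoff Roberts describes can be illustrated with a minimal sketch, not House of Moves' actual pipeline: once the mocap markers and the Boujou-solved camera share one surveyed coordinate space, the markers project straight through the film camera with no separate head- or camera-tracking pass. All names and numbers below (the alignment transform, intrinsics) are illustrative assumptions.

```python
import numpy as np

def project(points_world, R, t, focal, cx, cy):
    """Project Nx3 world points through a simple pinhole camera.
    R, t: world-to-camera rotation (3x3) and translation (3,).
    focal, cx, cy: intrinsics in pixels (illustrative values)."""
    cam = points_world @ R.T + t            # world -> camera space
    uv = cam[:, :2] / cam[:, 2:3]           # perspective divide
    return uv * focal + np.array([cx, cy])  # to pixel coordinates

# A rigid transform aligning mocap space to the surveyed set space
# (in practice derived from surveyed reference markers).
R_align = np.eye(3)
t_align = np.array([2.0, 0.0, 5.0])

markers_mocap = np.array([[0.1, 1.7, 0.0], [0.12, 1.65, 0.05]])
markers_world = markers_mocap @ R_align.T + t_align

# Camera pose solved by the 2D tracker for this frame; because both
# live in the same survey space, no extra tracking step is needed.
R_cam, t_cam = np.eye(3), np.array([0.0, -1.5, 0.0])
pixels = project(markers_world, R_cam, t_cam, focal=2000.0, cx=960.0, cy=540.0)
print(pixels.shape)  # one 2D pixel position per marker
```

The key point is the absence of any per-shot solve relating mocap to camera: the shared survey frame makes it a matrix multiply.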
Facial capture for that project, which can’t yet be unveiled, happened later at House of Moves with two systems capturing data simultaneously: Vicon’s MX optical mocap system and Mova’s new Contour, an image processing-based capture system; the two systems were genlocked together. "So for every take, we acquired motion from our system, motion from Contour, and motion from the reference cameras," Roberts says.
At Giant Studios, a motion-capture project for a film that combines CG characters and live action is also pushing the state of the art. "Everything has to be available in real time," says Madden. "Characters, the environment, the camera, everything has to be presented at a high level—not just the data capture, but the real-time graphic performance, as well."
To help make that possible, the production company’s art department optimized scenes for real-time display. In real time, the captured performances streamed to Autodesk’s MotionBuilder and powered CG characters that performed in environments which were textured and lit to approximate the final render. "Here’s the difference it makes," Madden explains. "Say you’re shooting on a set with props, a crew, and reference cameras. You have a huge vehicle that a person has to jump off and then grab a rope. The rope may be attached to something that’s flying. You can see all that in real time. You can put a character on a vehicle, make the vehicle move, and put the camera anywhere you want. And, you can see different scales simultaneously."
For example, if two six-foot humans were performing a 50-foot character and a three-foot character, the system could capture both simultaneously yet display them as different-sized CG characters. "We scale up the space," Madden says. "It would be impractical to drive the movement in different scales because the bones wouldn’t match."
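A hedged sketch of what "scaling up the space" might look like (an illustration, not Giant's system): joint rotations transfer to the character skeleton unchanged, while root translation is scaled by the ratio of character height to performer height, so a six-foot performer can drive a 50-foot or a three-foot character without mismatched bones.

```python
# Illustrative sketch: rotations copy across one-to-one, but root
# translation scales by character height over performer height, so
# "scaling the space" sidesteps driving bones at different scales.

PERFORMER_HEIGHT = 6.0  # feet

def retarget_root(root_positions, character_height):
    scale = character_height / PERFORMER_HEIGHT
    return [(x * scale, y * scale, z * scale) for x, y, z in root_positions]

# Root positions from a six-foot performer walking on the stage.
walk = [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0), (2.0, 3.0, 0.0)]

giant_path = retarget_root(walk, character_height=50.0)  # 50-foot character
small_path = retarget_root(walk, character_height=3.0)   # 3-foot character

# Per foot the performer covers, the giant covers ~8.3 feet of world
# space and the small character covers 0.5 feet.
print(giant_path[1], small_path[1])
```

Both characters can then appear in the same real-time scene at their correct relative sizes, which is what lets the director see different scales simultaneously.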
This means the director can make changes on the fly—move a prop or alter the set, change the lighting, and so forth. "All of that is available now to be manipulated interactively," says Madden. "It demands more work up front to set up the characters and scenes properly, but it pays off on the back end."
Similarly, at House of Moves, a game client created a cinematic using real-time motion capture with camera tracking. "It’s the first time I’ve seen a game client hire an independent director to help direct cinematics," says Jon Darmush, Vicon general manager. For the cinematic, House of Moves created the fuselage of an aircraft built to within 1mm of the actual aircraft. The CG version needed to line up perfectly with the prop so that CG characters driven by data captured from actors would interact with the environment even though they had different proportions.
"Everything lined up perfectly—the actors interacted with the environment and the CG characters interacted in the CG world," says Darmush. The director rehearsed the shot and used the real-time 3D as previz. When he finished directing the motion capture, he played it back on set in 3D side by side with video from reference cameras and audio, changing camera angles and framing in the 3D version to be certain he had captured what he wanted.
And, for a film project at House of Moves, the director, the director of photography, and an editor all took part in the motion-capture session. "The director was out with the six actors being motion-captured," says Darmush. "He could get immediate playback with a rough capture to choose camera angles, or we could quickly process the data through our farm. The editor was on-site, and we were streaming real-time CG into an Avid system. Then the director went back onto the motion-capture stage with a camera that had the master shot, the 3D performance, in his camera. He replayed the performance and concentrated on the camera move."
In addition to using motion-capture systems for previsualization and for directing the action during principal photography on live-action films, live-action directors with no experience in animation are using real-time systems to create animated films.
One of the most extensive uses of a real-time system was by Animal Logic. That studio used Giant's system to help animators perform penguins in the animated feature Happy Feet (see "Happy Feat," pg. 12) by motion-capturing dancers. As the dancers performed on the motion-capture stage, director George Miller could see their movements transferred onto the penguin stars of the film. "Others have used real time for a few minutes of effects work or selected shots, but Happy Feet was groundbreaking in that it was the first to leverage real time throughout production," notes Madden. "This is the year when real time is starting to proliferate out to the bigger projects."
Indeed, at House of Moves, Threshold’s animated feature Foodfight!, in which grocery store products run amok, is being acted out on stage. CG characters perform in real time as director Lawrence Kasanoff runs the actors through their motions. "Larry [Kasanoff] is actually animating on the stage," says Darmush. "He’s directing the characters, not the actors."
But real time isn’t for every director. One of the first directors to use motion capture for an animated feature was Robert Zemeckis, who directed Warner Bros.’ The Polar Express and produced Sony Pictures’ animated feature Monster House. Now he’s directing his third performance-capture film, Beowulf, using the third generation of Imageworks’ ImageMotion system. With this system, Imageworks now simultaneously captures the facial expressions, dialog, and body motion of between 12 and 15 people working on a 25-by-25-foot stage. Imageworks doesn’t feed the data to CG characters in real time, though. Instead, the studio applies the data for selected takes to the CG characters later, and Zemeckis films the performing CG characters using real cameras that drive virtual cameras.
"Some directors want to set up camera angles in the motion-capture shoot and want data applied to characters in real time," says Denise. "Not Bob [Zemeckis]. It’s all about the performance for him." Even so, Imageworks is working with Vicon to develop real-time capability with its marker set.
Face of the Future
Anyone who has tried to create a digital human will tell you how difficult it is to avoid making creepy CG characters. Thus, in addition to moving motion capture outside, fitting it into principal photography, and making it interactive, many studios are concentrating on improving facial capture. In fact, one criticism of The Polar Express was the lack of expression in the characters’ eyes. To solve that problem for Beowulf, a process known as electrooculography, which is primarily used by ophthalmologists, will capture the movement of the actors’ eyes and eyelids with electrodes, or sensors, placed around their eyes.
Also moving facial capture forward is Mova’s new Contour system, which caused a stir at SIGGRAPH this year. The system uses 1.3-megapixel cameras to capture movement by following and evaluating patterns in special phosphor makeup applied to the performer. Kino Flo fluorescent lights aimed at the subject flash between 90 and 120 times per second; when they don’t flash, the phosphorescent makeup glows, and digital cameras capture the random patterns, which, when correlated, produce a moving, high-resolution 3D model. At the same time, texture cameras capture videos of the lighted surfaces for texture maps. The system is particularly appropriate for capturing soft tissue—such as faces.
To make the dense 3D mesh useful in animation, Mova can track surface vertices. The 3D mesh happens automatically, but the tracking doesn’t. "The clients specify where they want the points tracked, and we run the tracking," says Mova founder Steve Perlman. "We can turn around the data in a week." In addition to high resolution—Perlman claims they can track more than 1100 points, far more than possible using reflective dots—animators can ask for changes in the tracking points later.
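Mova's tracker is proprietary, but a naive stand-in conveys the problem: the per-frame meshes arrive with no guaranteed vertex correspondence, so following a client-specified surface point means re-finding it in each successive frame. The nearest-vertex snap below is an assumption for illustration only.

```python
import numpy as np

# Naive stand-in for surface-point tracking (Mova's actual tracker is
# proprietary): per-frame meshes have no guaranteed vertex ordering,
# so follow a chosen point by snapping to the nearest vertex each frame.

def track_point(point, mesh_frames):
    """point: (3,) start position; mesh_frames: list of (N,3) vertex arrays."""
    trajectory = []
    current = np.asarray(point, dtype=float)
    for verts in mesh_frames:
        idx = np.argmin(np.linalg.norm(verts - current, axis=1))
        current = verts[idx]           # snap to the closest vertex this frame
        trajectory.append(current)
    return np.array(trajectory)

# Two tiny synthetic "mesh" frames: the surface drifts slightly in x.
frame0 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
frame1 = frame0 + np.array([0.05, 0.0, 0.0])

path = track_point([1.02, 0.0, 0.0], [frame0, frame1])
print(path)  # follows the vertex near x=1 as it drifts to x=1.05
```

In production the correspondence problem is far harder than nearest-vertex snapping, which is why clients specify the points and Mova runs the tracking.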
The trick is getting the data into a form that’s helpful to animators so they can use a mixture of motion-capture data and keyframing. Imageworks has developed its own system to retarget facial animation, as did Weta to retarget Andy Serkis’ expressions onto Kong’s face in King Kong, which won the 2006 Oscar for visual effects. Softimage and Mova expect Face Robot to provide that solution for Contour. "Face Robot will have direct import capability for Contour," says Perlman, "not only for tracked points, but also for the surface geometry."
State-of-the-art in-house systems at many visual effects facilities also target facial capture. ILM is not ready to reveal details of that studio’s new facial capture system, but two other studios, Double Negative and Pendulum, are.
Double Negative has been developing its facial animation system for five years and working with Image Metrics for the past two years. "Our system is basically an image-processing system that tracks the shapes of the eyes and mouths and a few key points," says visual effects supervisor Paul Franklin. "It’s a collaboration between us and Image Metrics. They came out of the computer vision community, so rather than tracking targets, their system is about giving the computer an understanding of what it’s seeing."
The system can work with one camera if the actor looks in only one direction; otherwise, more cameras provide more flexibility. "We recently did a shoot with six Sony HD cameras for a project we’re working on to capture a performer moving around in an unrestricted fashion," says Franklin.
The Image Metrics side of the system analyzes the captured shapes and compares them to measurements of facial movement on more than 200 people, according to Franklin. Then, it extrapolates the muscle groups that created the shapes in the actor’s eyes and mouth. The process works on the principle that any time you contract a facial muscle, it affects the eyes and mouth. "You can’t contract only one muscle," says Franklin.
Thus, Double Negative receives animation curves for 28 groups of contractions. "We fit that data to our model," says Franklin. "One of the neat things is that we can map it onto characters that look very different from the actor or onto digi-doubles."
The motion-capture crew at Double Negative relates that data to a facial animation rig that is informed by the system of facial expressions known as FACS. "As a result, we get detailed facial capture that plugs straight into the animation rig that our animators normally use to do keyframing," he says. "They can leave it as is, modify it, or create a new performance on top."
To get full-body capture, Double Negative runs the facial system with infrared cameras and systems from Motion Analysis or Vicon; to capture only head movements, they use photogrammetry tools through a camera-tracking process. So far, Franklin has captured performers only on a sound stage, but there’s nothing to stop the studio from using it outside as long as the lighting conditions are fairly even.
For its part, Pendulum has taken a route that more closely resembles Imageworks’ solution to capture useful facial expressions. The trick is that its system is very fast.
Co-founded by Robert Taylor and Michael McCormick, who were founding members of Giant Studios, Pendulum focuses primarily on commercials, music videos, and game cinematics. They start with 95 markers to capture data from a performer at House of Moves, with which they have built a pipeline relationship. Then they move that data into their own software, Stretch Mark. In Stretch Mark, animators use between 12 and 15 "handles" to drive the motion-captured data.
"Stretch Mark doesn’t try to reproduce where the dot was on the actor’s face," Taylor says. "It tries to reproduce its relative position. Because we’re using blendshapes, not muscles or points, our CG head can be cartoony or realistic. We’re sampling differences, not shapes. We can even give the system an exaggeration multiplier and turn the multiplier up and down almost in real time."
Michael Hutchinson, the lead developer, gives the technical explanation: "It’s a solver that determines coefficients for morph targets. The power is that it isn’t based on anatomy, only what you give it."
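A minimal solver in the spirit Hutchinson describes can be sketched as follows. This is an illustration, not Pendulum's Stretch Mark: stack each blendshape's vertex deltas as a column of a matrix and solve least squares for the coefficients that best reproduce the captured offsets. Nothing about anatomy is encoded, only the targets you give it.

```python
import numpy as np

# Illustrative morph-target solver (not Pendulum's Stretch Mark):
# stack each blendshape's deltas as a column of B and solve least
# squares for the coefficients reproducing the captured offsets.

def solve_blend_weights(deltas_per_shape, captured_offsets):
    """deltas_per_shape: list of flattened per-shape delta vectors.
    captured_offsets: flattened marker offsets from the neutral face."""
    B = np.column_stack(deltas_per_shape)
    w, *_ = np.linalg.lstsq(B, captured_offsets, rcond=None)
    return np.clip(w, 0.0, 1.0)       # keep weights in a usable range

# Two toy shapes over two "markers" (x, y per marker, flattened):
smile = np.array([1.0, 0.0, 1.0, 0.0])   # pulls both markers in +x
brow  = np.array([0.0, 1.0, 0.0, 0.0])   # lifts the first marker in +y

observed = np.array([0.5, 0.3, 0.5, 0.0])  # captured marker offsets
weights = solve_blend_weights([smile, brow], observed)
print(weights)  # ~[0.5, 0.3]: half a smile plus a slight brow raise
```

Because the solve is over relative offsets and arbitrary targets, the same captured data can drive a cartoony or a realistic head, which is the flexibility Taylor describes; an exaggeration multiplier would simply scale the recovered weights.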
McCormick adds, "Our original objective was to be able to re-create a digital actor. The speed we can apply data has opened up new areas. The biggest investment is in creating the model and blendshapes. Once that’s done, we can go from the motion-capture stage to having data for animators within a couple of hours."
But while these studios are pushing the high end and moving toward computer vision-based solutions that take advantage of HD cameras, one technology company has taken a different track. PhaseSpace has created a fast, high-resolution, inexpensive motion-capture system that uses active markers. Each marker has a unique ID, which makes finding x and y coordinates quick and accurate. And, occlusion isn’t a problem.
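The advantage of unique IDs can be seen in a small contrast sketch, an illustration rather than PhaseSpace's implementation: passive markers are anonymous blobs that must be re-identified every frame by proximity, which can swap labels when paths cross, while active markers arrive already labeled.

```python
# Illustrative contrast (not PhaseSpace's implementation): passive
# markers must be re-identified each frame by proximity, which can
# swap labels when markers cross; active markers broadcast a unique
# ID, so every detection arrives already labeled.

# Passive case: anonymous 2D blobs; identity must be inferred.
def label_by_proximity(prev_labeled, blobs):
    labeled = {}
    for marker_id, (px, py) in prev_labeled.items():
        # naive nearest-blob assignment; ambiguous when markers cross
        labeled[marker_id] = min(
            blobs, key=lambda b: (b[0] - px) ** 2 + (b[1] - py) ** 2)
    return labeled

prev = {7: (410.0, 230.0), 8: (417.0, 231.0)}
blobs = [(412.0, 230.5), (415.2, 231.1)]
print(label_by_proximity(prev, blobs))

# Active case: each detection is tagged with its marker's unique ID,
# so finding a marker's x, y is a direct, unambiguous lookup.
active_frame = {7: (412.0, 230.5), 8: (415.2, 231.1)}  # id -> (x, y)
x, y = active_frame[7]
print((x, y))
```

The direct lookup is why occlusion is less of a problem for active systems: when a marker reappears, it identifies itself, with no risk of inheriting a neighbor's label.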
Digital Domain animator Dan Taylor used the PhaseSpace system to create a test for a high-profile feature on which the studio is bidding. "The thing I like about the system is that it’s priced right," he says. Taylor notes that for most studios, building an in-house motion-capture facility is not an option. "Most systems can put the sticks onto the points," he says, referring to transferring motion-captured data to an animation skeleton. "But motion capture is a craft. Motion-capture studios like Giant have the people and software to clean up data. They also have expertise in offsets for targeting motion to characters that don’t match human proportions. But, we could use PhaseSpace for rehearsing, so we don’t burn dollars at Giant. And for pickup shots in production. Or for hands. Dollar for dollar, it’s worth it."
Each system has its advantages. ILM’s image-based approach removes any restrictions on the director during principal photography, allows the director to work with performers being motion-captured alongside live-action actors, and works with standard high-def video cameras, but it requires a lot of handwork after the capture to derive data that animators can use. Imageworks’ system is not yet real time and doesn’t travel; however, the system captures the entire performance, and the studio is pushing toward on-set motion capture.
Meanwhile, House of Moves has begun integrating Boujou’s tracking software and Peak Performance’s digital video-based capture software to move toward real-time, through-the-lens, high-quality data capture. Similarly, Giant is pushing its on-set motion-capture technology and moving toward a facial system that works in what Madden calls an "interactive paradigm."
So, with all this in mind, is it possible to predict what’s in store for the future? The answer is simple: "Better results with less input," says Rosenthal.
Barbara Robertson is an award-winning writer and a contributing editor for Computer Graphics World. She can be reached at BarbaraRR@comcast.net.