Motion-Capture Mania
Issue: Volume 35 Issue 3 April/May 2012

Break out the champagne, pop the corks, and dance a jig. After more than 20 years, motion capture has finally achieved mainstream acceptance by professionals in the entertainment industry and by consumers. Credit James Cameron for showing people the top of the game with Avatar, and Microsoft for bringing mocap to the masses. The 18 million people who have bought Kinect devices for Xboxes since November 2010 might not call it motion capture, but they know they can wave their arms at a computer character and it will wave back. Indeed, today you can find hundreds of enthusiasts, most using systems professionally, on the Motion Capture Society’s Facebook, LinkedIn, Twitter, and Tumblr pages.

“You no longer see people who don’t know what motion capture is,” says Bo Wright, director of Motion Analysis Studios. “People are well versed in what the technology does and how it is used. There are differences among the vendors, but the concept is standardized.”

Oddly, given all this acceptance, when you look at the industry from the outside, it seems much as it ever was. Vicon and Motion Analysis offer cameras and software systems for optical capture at the high end; Giant and House of Moves take those systems, add their own secret sauce, and offer high-end service; another group of vendors, notably PhaseSpace and Xsens, offer variations on the capture theme using active markers and accelerometers to capture motion; Image Metrics and Mova continue to offer facial-capture services; and nipping at all these companies’ heels are vendors offering lower-cost solutions, such as OptiTrack and iPi. There are a few new players, such as Digital Domain opening a performance-capture studio at the high end last year, and Microsoft bringing Kinect to Windows at the low end this past February. 

The House of Moves, which offers optical motion capture using as many as 300 Vicon T160s, has developed a system that moves data directly into the Unreal game engine to let directors see final characters moving with real-time capture data in final environments.

What was once solely a technology-driven industry now sees the creative community, which has become well versed in the possibilities, pushing innovation. Filmmakers, game developers, and animators have a variety of needs, embrace a variety of solutions, and are asking service bureaus and vendors to develop faster, more accurate, and more cost-effective solutions—in real time, please. Some call this commoditization. Others use the word “democratization.”

Words aside, while walk cycles and fight moves are still important motion-capture applications for game developers and for crowd simulations in films, the trend now is to push the state of the art of virtual production. Game developers are using motion capture to create cinematics in ways those in the film industry still cannot. Broadcasters are employing virtual set technology to produce new types of on-air graphics. Filmmakers have discovered performance capture, “simulcam,” and motion capture for shot planning, and service providers are finding ways to be flexible, mobile, and to provide facial capture with more detail than before. Animators who used to videotape themselves to test ideas have learned they can use a low-cost mocap system instead and apply their moves to the character they’re testing. In fact, there are so many trends and applications these days, it’s impossible to chronicle them. But, here are a few of the most interesting innovations we’ve spotted.

WYSIWYG Mocap for Gaming
The House of Moves, which is now a division of Vicon’s parent company, Oxford Metrics, rather than Vicon itself, serves game developers, filmmakers, ad agencies, and, these days, iPad developers, but the bulk of its motion-capture work is for games and films. “We do our fair share of film work throughout the year, but we do a tremendous amount of game work,” says Brian Rausch, vice president of production. “Games are entirely CG, and developers can’t have poor-quality data. A lot of game companies have internal groups, and we work with them, but we also see aggressive developers that don’t want to grow abnormally large teams coming to us.”

Rausch is particularly excited about the ways in which House of Moves is facilitating virtual production for game cinematics. This is similar to virtual production for film—performance capture applied in real time to CG characters composited into environments viewed through a camera—but better. In the past, the studio has used Autodesk’s MotionBuilder to manage the application of motion data in real time to the CG characters, and still does for film production. But that’s changing for game cinematics.

“You have to constrain the environments in MotionBuilder to use them for real-time animation,” Rausch says. “When you’re working with film production, you can’t have final environments; you need approximations, but on the game side, we removed that step. We’ve written a bunch of streaming protocols to send motion-capture data straight into the Unreal game engine. This means game directors can guide motion-capture actors and see the final characters in their final environments. They can look through the camera into their final world.”
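The streaming approach Rausch describes boils down to serializing each frame of joint transforms and pushing it over the network to a listener inside the engine. The wire format below is purely hypothetical (House of Moves’ actual protocols are proprietary); it is only a sketch of the idea.

```python
import struct
import zlib

# Hypothetical wire format for streaming capture frames into a game engine:
# frame number, joint count, then per joint a CRC32 name tag, a translation
# (x, y, z), and a rotation quaternion (w, x, y, z), all little-endian.
# Real pipelines use their own proprietary protocols; this is a sketch only.

def pack_frame(frame_num, joints):
    """Serialize one frame of capture data into a binary datagram."""
    msg = struct.pack("<II", frame_num, len(joints))
    for name, pos, quat in joints:
        msg += struct.pack("<I", zlib.crc32(name.encode()))
        msg += struct.pack("<3f", *pos)   # translation
        msg += struct.pack("<4f", *quat)  # rotation quaternion
    return msg

def unpack_frame(msg):
    """Decode a datagram back into (frame_num, joint records)."""
    frame_num, count = struct.unpack_from("<II", msg, 0)
    joints, offset = [], 8
    for _ in range(count):
        (name_tag,) = struct.unpack_from("<I", msg, offset)
        pos = struct.unpack_from("<3f", msg, offset + 4)
        quat = struct.unpack_from("<4f", msg, offset + 16)
        joints.append((name_tag, pos, quat))
        offset += 32
    return frame_num, joints
```

In practice, each datagram would be sent over UDP to a listener plugin inside the engine, which applies the incoming transforms to the matching skeleton joints every frame.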

Because the environment is no longer a proxy, the directors know during the capture session whether everything lines up; they don’t need to make a leap of faith.
“We still have a final refinement phase where we push hand poses, postures, and timing,” Rausch says. “And, we still use MotionBuilder to animate. We would never dream of cutting animators out of the process. But, we’re giving directors the ability to know they got the shot and that they’re looking at it in the final resting place. They’re no longer working in the darkness.”

Motion capture at the House of Moves is primarily optical, using Vicon hardware and Vicon’s Blade system. “We have 300 Vicon T160s,” Rausch says. “We’re not looking for other ways to capture. Marker data is faster, cheaper, easier.”

In fact, it’s about to become even faster when Vicon rolls out the next version of its software.

“We’re at the beginning of something new,” says Phil Elderfield, entertainment product manager at Vicon. “We’re working on a product called Axiom, which is not a broadly announced name yet. Everything having to do with capturing something is completely new. The first evidence of what Axiom is capable of will be a new real-time system robust enough to allow continuous, uninterrupted shooting, so people can focus on the shoot, not the technology. We want the performers to be able to see themselves in real time, respond to their performance and adapt, and have that momentum continue without having to stop to fix something. Moving forward, we plan better integration of reference cameras and other capture devices, like gloves and helmets, in sync. The platform we’re developing will allow for that.”

In addition to drawing from Vicon’s entertainment division, House of Moves has brought in equipment from the life-science side: an EOG (electro-oculography) measurement system with Vicon’s Nexus medical application. The EOG tracks eye motion by capturing the signal sent from someone’s brain to the eye muscle, and then converts that analog wave into digital data that can rotate digital eyeballs. This is particularly interesting to game developers.
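The analog-to-digital step described above can be sketched in a few lines: over a limited range, EOG voltage varies roughly linearly with gaze angle, so a calibrated gain and a clamp produce a usable rotation. This is an illustrative model only; the gain, baseline, and clamp values below are hypothetical, not parameters of Vicon’s system.

```python
# Illustrative sketch of the EOG idea: the eye acts as an electrical dipole,
# so the voltage between electrodes near the eye varies roughly linearly
# with gaze angle over a limited range. All constants here are hypothetical.

def eog_to_rotation(voltage_uv, gain_deg_per_uv=0.05, baseline_uv=0.0,
                    limit_deg=45.0):
    """Convert a (band-passed) EOG sample in microvolts into a gaze angle
    in degrees, clamped to a plausible range of eye motion."""
    angle = (voltage_uv - baseline_uv) * gain_deg_per_uv
    return max(-limit_deg, min(limit_deg, angle))
```

A real pipeline would run this per sample per axis (horizontal and vertical electrode pairs) and feed the resulting angles to the digital eyeball rig.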

“We’re doing full-blown eyeball capture,” Rausch says. “I’d say that 95 percent of the capture at House of Moves is full-performance capture, and of that, 80 percent are using the EOG system. The detail in the eye that it provides brings characters to life.”

Testing DigiDouble Shells
Some of House of Moves’ most interesting film work has centered on digital doubles, with doppelgangers for Journey 2: The Mysterious Island actors Dwayne Johnson, Michael Caine, Josh Hutcherson, Vanessa Hudgens, and Luis Guzman setting the stage. First, Icon Imaging Studio’s scanners captured the costumed actors’ topology, and the USC Institute for Creative Technologies’ Light Stage captured textures and lighting. A custom process lined up the 3D scans with the geometric lighting and texture information using reference points. In other words, they created a photorealistic shell—a surface representation of each actor in costume. Then, the actors walked across the street to House of Moves for full-body motion capture. “[The production team] wanted to be sure the range of motion worked,” Rausch says.

House of Moves artists applied the motion to a skeleton they placed inside the shell. “We tested the shell,” Rausch says. “The groundbreaking part is that there were no revisions.” Journey 2 visual effects supervisor Boyd Shermis gave the resulting dataset with high-resolution geometry, texturing, lighting, and a skeleton with a fundamental range of motion to all the visual effects studios working on the project to use in their shots and to animate as needed.

Giant Steps for Film Production
Consistently at the center of some of the most innovative uses of motion capture during film production has been one facility: Giant Studios. Giant Studios’ systems helped capture actors’ performances for Steven Spielberg’s Tintin, George Miller’s Happy Feet 2, Shawn Levy’s Real Steel, James Cameron’s Avatar—all told, 31 films, with more under way.

“One of the things we’ve been doing recently that’s above and beyond the more traditional capture work is camera tracking,” says Matt Madden, vice president for production and development. “We did a good bit of that on set using encoders for Jack the Giant Killer [recently bumped to March 2013]. A lot of our work is crossing into virtual production.”

Madden describes four categories of motion capture—and camera tracking—for film production. The first three use optical capture with markers. The most traditional is creating CG characters in CG scenes shot with a virtual camera. “Motion capture is still a cost-effective way to shoot those films,” he says. “You have all the elements for the production.”

Second is pre-capture—that is, motion captured before live-action photography. The process sits between previs and production. Conceptually, it’s an extension of previs but with final capture data. Madden calls it shot planning.

“The idea is to save money before you go on set,” Madden says. “We’re combining the real-time traditional capture, which is still important to the process, with real-time graphics, elements of previs, and the virtual camera, and getting shots to editorial. Although that’s like previs, the difference is that you can use the motion assets all the way through production into postproduction. It’s popular because we can get feedback from production for performances and timing before filming.”

Digital Domain’s visual effects supervisor Erik Nash relied on this pre-capture process for Real Steel and received an Oscar nomination for visual effects. “It is especially useful when it isn’t practical to have the performers on set,” Madden says, “if, for example, the characters need to be 25 feet tall. The value of the pre-capture is that the director shoots the motion-capture performances as if they were live-action. We can have doubles on the motion-capture stage standing in for the actors for blocking and timing.”

Digital Domain, which has used motion capture to animate giant robots and perform digital faces, has opened a virtual production facility in the Los Angeles area, where researchers are concentrating particularly on facial capture.

The director can pick the performances he or she likes and can film those performances with a virtual camera; the editor has footage to cut into sequences, even though the live-action elements are proxies.

Data from these capture sessions moves in two directions. First, Giant applies the motion data to low-res characters in MotionBuilder for use during “simulcam” filming so the director and camera operator can see the CG characters performing within the live-action scene. “You have the performance selects made and cued for playback,” Madden says. “So, you just need to track the live-action camera and comp in the actors and backgrounds.”
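At its core, lining CG characters up with the live-action frame during simulcam filming depends on projecting 3D positions through the tracked camera’s parameters. The sketch below uses the simplest possible pinhole model; a real tracked-camera solve would also recover rotation and lens distortion, and every number here is hypothetical.

```python
def project_point(world_pt, cam_pos, focal_px, cx, cy):
    """Project a world-space point into pixel coordinates with a bare
    pinhole model. Assumes the camera looks down +Z with no rotation and
    no lens distortion -- real simulcam rigs solve full camera pose and
    lens parameters from the tracking data."""
    x = world_pt[0] - cam_pos[0]
    y = world_pt[1] - cam_pos[1]
    z = world_pt[2] - cam_pos[2]
    if z <= 0:
        raise ValueError("point is behind the camera")
    # focal_px scales camera-space offsets into pixels; (cx, cy) is the
    # principal point (image center).
    return (cx + focal_px * x / z, cy + focal_px * y / z)
```

Once each CG joint projects to the right pixel of the live plate, the low-res characters can be composited over the camera feed in real time.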

Then, the studio gives a clean set of motion data from the capture—altered as needed based on any changes requested during filming—to the visual effects studios that will produce the final shots. During this process, Giant often works with the visual effects houses that will create the final shots. “We want to be sure we have the same infrastructure,” Madden says. “This is not the part that gets the headlines, but it’s the part that makes it all work. We’re far, far away from creating generic data. We are getting shot-specific in terms of what performances we capture.”

The third category is what Madden calls “simulcam,” which is recording actors’ performances on the day of the shoot—capturing and compositing in real time. This is the well-described process used for Avatar and Tintin.
And fourth is image-based capture integrated within live-action photography, as ILM did
using its iMocap system first for Pirates of the Caribbean, and as Giant did for Narnia. “We put people in suits with colors that depend on the set lighting, and track the color patterns,” Madden explains. “The main application for this is when you have characters close to human scale. We don’t have an elaborate setup; we often don’t know where the coverage needs to be before they start rolling. We have a crew that records the movement during live-action photography with HD cameras, digitizes it, reconstructs it in 3D, and applies the motion to the character.”

Mobile Mocap
If it seems that much of Giant’s work is happening outside the studio, you’d be right. “Game production is still very much a part of the traditional capture volume, and they’re bringing more to the stage in terms of their CG environment,” Madden says. “The capture stage is their live set; there’s no first unit. But, in the film world, people are requesting that we set up volumes on set so they don’t have to plan for another schedule in post.”

In the past, motion capture for films typically happened in Giant’s Los Angeles studio. But now, with many of the production companies in Sydney, Vancouver, Toronto, the UK, and other places, there’s no reason to go to LA other than for a capture session.
“On-set development is being pushed hard by feature-film production companies,” Madden says. “There’s a strong push to keep our team and our footprint as small as possible, and they assume we’ll stay on location with them because they can’t say when a certain setup will be required. So, we build hybrid crews leveraging local talent in all these cities that can shift into a post mode when they’re not shooting.”

As for games, which still comprise much of Giant’s business, Madden says the requests are for simultaneous face and body capture on stage. “The biggest area of growth for our business is using head-mounted cameras for facial capture,” he says. And virtual production: developers want to see face and body performances in real time with no post-capture layering.

“Game developers want to see everything in frame and in real time on the capture stage,” Madden says, noting that Giant recently moved into a new facility in Manhattan Beach, California, next to James Cameron’s new Lightstorm Entertainment studio. “Film and games are pushing in different areas. The virtual production for games will benefit film down the line.”

PhaseSpace’s active LED systems now work with easier-to-wear, thumb-sized controllers and can include accelerometers.

Let There Be Blood
Working with Giant on Jack the Giant Killer was a Digital Domain team led by Gary Roberts, who had been at House of Moves and then at ImageMovers Digital (IMD) before setting up Digital Domain’s new virtual production facility. “Digital Domain acquired the building in Playa del Rey, California, from Disney when they shut down IMD,” Roberts says. “I had architected the stages, and Digital Domain was interested in virtual production, so we opened the studio.”

The biggest stage is 100 by 90 feet with a capture volume of 65 by 35 by 18 feet; a second stage is half the size. For body capture, the facility uses 122 Vicon cameras. But, some of the most exciting motion-capture work is happening on a dedicated face survey stage. “Body capture is well known whether you’re working in a controlled environment on stage or an uncontrolled environment shooting alongside traditional photography,” Roberts says. “Our effort in that area is getting a good digital version of an actor to retarget to the character. But, a lot of our effort is going into faces, into acquiring as much information as we can about how an actor’s face moves, and being able to re-create that. We’ve taken multiple steps beyond Tron.”

Digital Domain has installed 19 high-quality, high-frame-rate, machine-vision video cameras on the facial survey stage to capture seated facial performance. “We build a statistical model of how an actor’s face moves to get an understanding of the muscles within the face, how the skin reacts, all the way to blood flow. We capture that information to build a model of the actor’s face to re-create digitally and to derive a character.”

The process is all computer-vision based; the person being “surveyed” doesn’t wear markers. The studio looked at recent SIGGRAPH papers and developed algorithms internally to do the analysis. “Rather than painting 300 tracking markers on a face, we look for natural features,” Roberts says. “We can generate between 5,000 and 7,000 individual points on a face from a face survey, including lips and eyes, that we reconstruct in 3D, and use that to track to a known mesh topology across time. Essentially, it’s motion capture but at a much higher resolution. The goal is to understand how the face moves and generate a human or a stylized character based on that performance.”
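The final step Roberts describes, carrying tracked feature motion onto a known mesh topology, can be illustrated crudely: move each vertex by the displacement of its nearest tracked feature. Digital Domain’s actual solver is far more sophisticated (statistical face models, temporal constraints), so treat this as a toy illustration of the idea only.

```python
import math

def nearest_index(point, features):
    """Index of the tracked feature closest to a mesh vertex."""
    return min(range(len(features)),
               key=lambda i: math.dist(point, features[i]))

def deform_mesh(vertices, features_t0, features_t1):
    """Move each vertex of a known topology by the displacement of its
    nearest tracked natural feature between two frames -- a crude
    stand-in for constrained mesh tracking."""
    out = []
    for v in vertices:
        i = nearest_index(v, features_t0)
        dx = features_t1[i][0] - features_t0[i][0]
        dy = features_t1[i][1] - features_t0[i][1]
        dz = features_t1[i][2] - features_t0[i][2]
        out.append((v[0] + dx, v[1] + dy, v[2] + dz))
    return out
```

A production solver would blend several nearby features per vertex and regularize the result so the mesh deforms smoothly rather than tearing at feature boundaries.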

The idea is similar in concept to the Contour system from Mova, which many studios use, particularly to capture FACS poses based on facial expressions and phonemes for modeling blendshapes used in facial animation. But, Digital Domain is trying to take the idea further.

“When an actor hits a pose, you can visually see blood flowing into the face,” Roberts says. “We get the surface reconstruction of an actor’s face. And, we get a static scan of the face down to the pore-level detail and beyond from ICT [USC’s Institute for Creative Technologies] to create textures and shading. We’re getting blood flow and eyeball deformation, viscosity, moisture around the eyes and lips. We can start picking through features and create algorithms to drive blood flow across emotional shapes. The model can be driven by an animator or from the face solve.”

In addition to the facial survey stage, for mobile capture, Digital Domain uses face-mounted cameras to capture the surface of an actor’s face, even the tongue inside the mouth. “Our secret sauce, the biggest innovation and the latest technology, is our software. It’s taking the video images we’re using of actors’ faces, bodies, and fingers, and reconstructing dense surface information.”

During production on Jack the Giant Killer, Giant would place video images from Digital Domain’s facial-capture cameras onto the digital character’s face to give the director and editor an idea of what the motion will look like and what the character is feeling. They call it a “Kabuki mask.”

“Everything we try to do is pushing in the direction of real time for on-set visualization,” Roberts says. “We’re working on the next-generation hardware for facial capture to have that be real time, as well. It’s definitely a big push in our world and within Digital Domain. It’s all software, but it takes an understanding of how a body moves, how a face moves, and getting that into an animator’s hand quickly.”

Real-time Mocap for Broadcast, Baseball
Although game developers and filmmakers use Motion Analysis systems—notably Weta for capturing actors playing chimps for Rise of the Planet of the Apes—that company has found a niche in the broadcast markets. “We take what would be locked-off cameras on greenscreen stages, and instead have multiple cameras that see the set from multiple points of view and activate graphics using camera tracking during live productions,” Wright says.

CNBC, for example, uses the system for news shows. “It keeps the on-air talent facing the camera rather than turning sideways and pointing to a screen,” Wright explains. “We put markers on the back of his hand, on the camera, and on props. Then, we can track and overlay graphics.”

With this system, when a newscaster points to a transparent box, the first bar in a bar graph might appear, and then another, and another. Sometimes the bars grow out of a table. The charts and graphs float—or grow—between the newscaster and the viewer.

“That’s been a big thing for us during the past few years,” Wright says, “streamlining this and putting it into one easy pipeline. But we’ve also developed cameras that can go outdoors in full sunlight, which Weta used with their LED markers for Apes. And, our cameras have been used to capture spring training for a baseball team. We had captured them for years for biomechanics, but that was indoors and they don’t want to play indoors. It doesn’t feel real. So this year and last year, we were able to move outdoors. Putting the cameras into natural environments is a big change.”

Motion Analysis has also worked with the Jim Henson Company on several children’s television series. “We’ve been able to go from shoot to air in six weeks with virtual environments and virtual characters,” Wright says. “That’s a huge push for us.”

Wright believes the next big leap will be markerless solutions that can stay on par with traditional optical markers. “But right now, data coming out of inexpensive systems needs a lot of cleanup; it doesn’t hold together when you need an accurate, real-time feed,” he says.

Reallusion’s iClone 5 software, working with Microsoft’s Kinect cameras, provides a low-cost method for quickly capturing motion using Xboxes or Windows-based computers.

PhaseSpace, which brought motion capture using active LED markers to the market, continues to upgrade its systems, offering invisible LEDs, controllers the size of a thumb, and accelerometers in its low-cost systems. “We’ve spent millions of dollars to make our system easy to use,” says Tracy McSheery, president. “It doesn’t take extra manpower. You don’t need motion-capture specialists. An artist suits up and walks away with the data that’s already 90 percent cleaned up.”

McSheery offers an anecdote to prove his point. “We had a call from Ubisoft in Montpellier, France, which wanted to have an in-house system rather than using the big system in Montreal and getting data sometime later,” he says. “One of the executives was upset about buying a second system. But, after our demo, he became one of our ardent supporters.”

Much of the company’s entertainment business is overseas, with 18 systems recently installed in China and in this country within research labs. “Most of our customers are people no one in Hollywood has ever heard of,” McSheery says. “But, the irony is that they’re less tolerant about data cleanup.” PhaseSpace’s least-expensive, full-body system can capture one person with eight cameras running at 120 Hz. It includes a Hewlett-Packard Z210 workstation and sells for approximately $20,000.

Also offering lower-cost systems is OptiTrack (previously NaturalPoint), the first company to offer a USB motion-capture camera, with a kit it introduced for $5,000 at SIGGRAPH in San Diego. “It was such an aggressive price, people didn’t believe it worked,” says Jim Richardson, president and CTO. “Plus, we were selling directly from the Web and posted our prices online. We now have around 1,000 OptiTrack installations. Game developers, hobbyists…we’re selling all over the map. Small developers are using it to do previs—it wasn’t affordable before. And people are buying multiple systems.”

Recently, the company introduced a 1.3-megapixel, 120-fps camera for $1,000. Richardson estimates that users could set up a 15 x 15-foot capture volume using 16 Flex 13 cameras for approximately $20,000.

For its part, Xsens’ MVN inertial motion-capture systems are entirely accelerometer-based. With the gizmos easily donned, animators in various studios are now using the systems for tests and reference.
Animators at Halon used the system, and those at Double Negative created reference moves for John Carter. ILM’s Colin Benoit tried camera moves using an Xsens motion tracker to orient the camera for Rango. Animators in the previs department at 20th Century Fox are now using the MVN system as well to sketch out elaborate fighting moves for such films as X-Men: First Class more quickly than possible with keyframe animation. Further speeding production is the company’s plan to move data straight from Xsens to Autodesk Maya without traveling through MotionBuilder on the way. Customers should expect pricing in the range of systems from OptiTrack and PhaseSpace.


At Digital Domain, though, Roberts has begun experimenting with an even lower cost method of capturing motion for previs—using Kinects. “We have a couple of Kinect cameras in our previs department, and they’re great,” he says. “It’s simple and crude, but for previs, it’s great. You can stand up, turn it on, get a quick calibration, and animate a character quickly.”

Digital Domain has created proprietary software to run with the Kinects, but those without an R&D department at hand might try Reallusion’s iClone 5 software. “We started working with Microsoft in February 2011, when they introduced the research SDK for Kinect,” says John Martin, vice president of product marketing. “Now, we have the capability through Kinect for Windows, and the camera has higher resolution. You can hook up Kinect through your USB, stand in front of a sensor, puppet a character, and in real time you’ll see the character performing the way you move.” The company’s software provides tools for quickly building animated characters that can be puppeteered with Kinect. Motion data can be output as FBX files, and game developers can use HumanIK, the middleware from Autodesk, to edit the motion files.
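Retargeting of the kind middleware such as HumanIK performs starts with mapping captured joint names onto a character rig’s skeleton. The names below are illustrative stand-ins, not the actual Kinect SDK or HumanIK identifiers.

```python
# Hypothetical joint-name mapping from a Kinect-style skeleton to a custom
# character rig -- the first step of the retargeting that middleware like
# HumanIK performs. All identifiers here are illustrative, not real SDK names.

KINECT_TO_RIG = {
    "SpineBase": "hips",
    "ShoulderLeft": "l_shoulder",
    "ElbowLeft": "l_elbow",
    "HandLeft": "l_hand",
}

def retarget(kinect_frame, mapping=KINECT_TO_RIG):
    """Rename the joints of one captured frame so a character rig can
    consume them; joints with no mapping are dropped."""
    return {mapping[j]: xform
            for j, xform in kinect_frame.items() if j in mapping}
```

Full retargeting would then adjust the transforms for differing bone lengths and rest poses, which is where tools like HumanIK earn their keep.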

Also offering marker-free motion capture at low cost is iPi Soft, which introduced iPi Motion Capture first in 2009 and recently upgraded. The software uses image-processing and computer-vision algorithms to track multiple people even doing 360-degree turns, the company claims, using one or two Kinect input devices, three to six Sony PlayStation Eye cameras, or Webcams. Motion data can be exported as FBX, BVH, and Collada formats, and thus used in a variety of game engines and 3D software packages.

“[Kinect] is amazing,” Wright says. “At the point when you can put 40 Kinects in a room and have the quality we get with 40 of our systems, we’ll see it permeating high-end productions. We’re on track for that.”

But, not this year—or even next year. Maybe in the next decade, according to some.

“I think the emphasis in motion capture has shifted the right way, to the creative goal,” Elderfield says, but cautions, “I think the expectation also is that as an industry, we’re already there. We all understand that this is where we need to be focused, providing creative decision-makers with the tools they need. But, there is still work to be done.”

Barbara Robertson is an award-winning writer and a contributing editor for Computer Graphics World. She can be reached at