Volume 24, Issue 2 (Feb 2001)

Avatar Advances


The most critical component of a successful virtual reality application is its ability to make users believe they are actually in the digital environment. A number of factors contribute to this sense of immersion, including real-time interaction and believable (though not necessarily photorealistic) 3D graphics.

A new system under development at the University of North Carolina (UNC) promises to move virtual worlds one step closer to this ideal by enabling real-time 3D reconstructions of the user and other real objects in an immersive virtual environment. When the user steps into the virtual field of view, or reaches a body part such as an arm into it, the image-based system captures the body part and builds a 3D graphical representation of it, which the user sees in the virtual world just as it would appear in the real world. In addition, the user can interact with other real objects that are also contained within the system's field of view. For example, if the user reaches an arm into the viewing space to grab a physical object off a desk, he or she will see an accurately lit, pigmented, and clothed graphical representation of the arm and a similarly accurate representation of the object.

To achieve this, the system combines multiple camera views of a real scene taken from various perspectives and analyzes the collected data to extrude object forms. It then renders the defined objects in real time using graphics hardware.

To collect the image information, the novel system builds a reconstruction volume based on the combined projection boundaries of the cameras. For example, the researchers have implemented the technique using a six-camera setup, whereby five of the cameras are wall-mounted and the sixth is attached to a head-mounted display worn by the user. With this configuration, the system reconstructs objects within an 8- by 6- by 6-foot volume.
To "virtually" represent real-world objects and people within a given viewing area, the UNC system projects volumes collected from multiple cameras (shown as spheres) onto a plane. The 3D intersection of these volumes (right) creates the reconstructed image.

What makes the system unique is its reliance on camera images rather than explicit 3D models. Also, it does not use tracking sensors to model user interaction (optical tracking is used to enable navigation through the scene), nor does it require prior object knowledge. Instead, it interprets the collected image data using a "visual-hull" technique.

A visual hull is a geometric shape obtained by integrating silhouettes of an object as seen from a number of views, extruding volumes that contain the objects from each silhouette, and representing the intersection of these volumes as an image in 3D space.
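The idea can be illustrated with a toy 2D analogue, a sketch of the general visual-hull concept rather than UNC's implementation: two orthogonal "cameras" each record a 1D silhouette of a shape on a grid, each silhouette is extruded across the grid, and the intersection of the extrusions bounds the shape.

```python
import numpy as np

# A 2D analogue of the visual hull: two orthogonal views each see a
# 1D silhouette of a shape on a 4x4 grid. Extruding each silhouette
# across the grid and intersecting the extrusions bounds the true shape.
shape = np.zeros((4, 4), dtype=bool)
shape[1:3, 1:3] = True                  # the real object

sil_x = shape.any(axis=0)               # silhouette seen along columns
sil_y = shape.any(axis=1)               # silhouette seen along rows

hull = sil_y[:, None] & sil_x[None, :]  # intersection of the extruded volumes
print(hull.sum())                       # 4 cells; here hull == shape
```

For a convex block like this the hull matches the object exactly; for concave shapes the visual hull is a conservative superset, which is one reason additional camera views tighten the reconstruction.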

At the heart of the new technique is a multi-step 3D reconstruction algorithm. Step one involves identifying the object pixels within each camera image. Next, the volume intersection is calculated, the visual hull is rendered, and the rendered hull is composited with the virtual environment.

In the first step, the fixed cameras are positioned so their views completely contain the volume under consideration. The system then extracts pixels of interest using a subtraction technique. Initially, each camera captures a reference image of the empty scene. For each subsequent frame, the reference data is subtracted from the current image and the result is thresholded: pixels whose difference falls below a pre-defined value are labeled background pixels, and the remainder are labeled object pixels.
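In code, this per-pixel segmentation might look like the following minimal NumPy sketch; the grayscale frames and the threshold value are illustrative assumptions, not the system's actual parameters.

```python
import numpy as np

def segment_object_pixels(current, reference, threshold=25):
    """Label each pixel as object (True) or background (False).

    current, reference: grayscale frames (uint8) of the same shape.
    The reference frame is a capture of the empty scene; pixels whose
    absolute difference from it exceeds the threshold are treated as
    belonging to a real object in view.
    """
    diff = np.abs(current.astype(np.int16) - reference.astype(np.int16))
    return diff > threshold

# Toy example: an empty scene vs. a frame with a bright "object" patch.
reference = np.zeros((4, 4), dtype=np.uint8)
current = reference.copy()
current[1:3, 1:3] = 200                    # an object enters the view
mask = segment_object_pixels(current, reference)
print(mask.sum())                          # 4 object pixels
```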

Next, the visual hull is generated by computing the 3D intersection of these object-pixel volumes from each camera. The system is specifically designed to examine only parts of the volume that could contribute to the final image. "Instead of trying to compute entire volumes or the intersections of potentially complex polygons that represent the object-pixel projection volumes," says principal researcher Benjamin Lok, "we ask, 'which points in the view volume are inside the visual hull?'"
The synthetic data set from a reconstructed 3D model of a user is rendered in five positions. Such information will ultimately be used to generate arbitrary views of users in real time for avatars and also for enhanced interaction within a virtual environment.

To answer this question, intersection tests are run to determine if the 3D point in the volume projects onto an object pixel in all of the cameras' images. If so, the point is within the final projection volume. Lok notes that this analysis task would stress the computational and bandwidth capabilities of most machines. Consequently, he says, "we use the massively parallel computational power of graphics hardware to compute a view-specific, pixel-accurate volume intersection in real time." Currently, the six camera inputs are connected to a 32-processor SGI Reality Monster system.
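A CPU sketch of that per-point test follows, using 3x4 camera matrices and NumPy. The orthographic cameras and object masks here are made-up stand-ins for illustration; UNC performs the equivalent test in graphics hardware rather than in a per-point loop like this.

```python
import numpy as np

def project(P, X):
    """Project homogeneous 3D point X through a 3x4 camera matrix P to pixel (u, v)."""
    x = P @ X
    return int(round(x[0] / x[2])), int(round(x[1] / x[2]))

def inside_visual_hull(point, cameras, masks):
    """A 3D point lies inside the hull iff it projects onto an object pixel in every view."""
    X = np.append(point, 1.0)
    for P, mask in zip(cameras, masks):
        u, v = project(P, X)
        h, w = mask.shape
        if not (0 <= u < w and 0 <= v < h) or not mask[v, u]:
            return False
    return True

# Two toy orthographic cameras: one along z (u=x+2, v=y+2), one along x (u=z+2, v=y+2).
cam_z = np.array([[1., 0., 0., 2.], [0., 1., 0., 2.], [0., 0., 0., 1.]])
cam_x = np.array([[0., 0., 1., 2.], [0., 1., 0., 2.], [0., 0., 0., 1.]])
# 5x5 object masks: each camera saw an object covering pixels 1..3.
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True

print(inside_visual_hull(np.array([0., 0., 0.]), [cam_z, cam_x], [mask, mask]))  # True
print(inside_visual_hull(np.array([0., 4., 0.]), [cam_z, cam_x], [mask, mask]))  # False
```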

This power is also critical for the next step, image rendering, in which projected textures are used to color all of the elements in the final image.

The new system holds the potential to revolutionize interaction within a virtual world. Currently, Lok notes, "interacting with virtual environments forces a mapping of virtual actions to real hardware, such as joysticks or mice." This can be difficult for some interactions. In another UNC virtual environment project, for example, magnetic tracking is used to model the user's interaction with objects in the virtual scene. "The user is instructed to pick up a virtual book from a chair and move it around the virtual environment. The user carries a magnetically tracked joystick and must make the avatar model intersect the book to select it. The user then presses and holds the trigger to pick up and carry the book." In contrast, real-time models of the user enable more natural interactions with elements in virtual environments. "[In the new system], we replace the synthetic book with a real book that would be reconstructed along with the user. The user could then see the book on the chair and naturally reach out to pick it up."

To minimize any visual discrepancy between real and synthetic objects, lighting and shading considerations are also part of the new system. "We want the dynamic models of real objects to be lit by virtual lights," says Lok. "To do this, we [compute] the virtual environment's lights while rendering [the reconstructed image]." In addition, real lighting can affect synthetic objects. "We can use traditional shadowing algorithms to cause synthetic elements in the environment to cast shadows on the real objects."
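Lighting reconstructed geometry with virtual lights amounts to standard shading of the recovered surfaces. As a hedged illustration, here is simple Lambertian diffuse shading, a generic technique rather than the specific lighting model the UNC system uses:

```python
import numpy as np

def shade(normal, light_dir, albedo):
    """Diffuse (Lambertian) intensity for a reconstructed surface point lit by a virtual light."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    return albedo * max(float(n @ l), 0.0)

print(shade(np.array([0., 1., 0.]), np.array([0., 1., 0.]), 0.8))  # 0.8: light directly overhead
print(shade(np.array([0., 1., 0.]), np.array([1., 0., 0.]), 0.8))  # 0.0: grazing light contributes nothing
```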

It's important to note that the new technique does more than overlay dynamic models of the user and nearby objects onto the computer-generated scene. These models can be active elements in the simulations. For example, says Lok, "we have implemented a particle system that represents water flow from a faucet, in which a reconstruction of the user's hand and a new object, a plate, were used as surfaces within the particle system. The water particles interacted with the plate and flowed into the sink."
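A minimal sketch of that kind of interaction follows, assuming the reconstructed plate is approximated by the horizontal plane y = 0; the physics constants and the "slide outward" rule are invented for illustration, not taken from the UNC simulation.

```python
import numpy as np

def step_particles(pos, vel, dt, plane_y=0.0, gravity=-9.8):
    """Advance falling water particles one timestep; deflect any that reach the plate plane."""
    vel = vel + np.array([0.0, gravity * dt, 0.0])
    pos = pos + vel * dt
    hit = pos[:, 1] < plane_y
    pos[hit, 1] = plane_y        # clamp particles to the plate surface
    vel[hit, 1] = 0.0            # stop downward motion
    vel[hit, 0] += 0.5           # nudge sideways, as if water flowed off the plate
    return pos, vel

# One droplet released above the plate.
pos = np.array([[0.0, 1.0, 0.0]])
vel = np.zeros((1, 3))
for _ in range(20):
    pos, vel = step_particles(pos, vel, 0.1)
print(pos[0, 1], vel[0, 0] > 0)  # 0.0 True: droplet rests on the plate, sliding outward
```

Swapping the plane test for a volumetric (voxel or hull) test is exactly the extension to non-planar reconstructions that the researchers describe below.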
A reconstruction of a user's hand and a plate become surfaces within a particle system created to illustrate the flow of water in a sink. The UNC researchers hope to extend this type of interactivity to non-planar data as well.

Beyond sampling the visual hull with planes in this way, the researchers are developing techniques for collision detection with multiplanar objects. In the above example, the user's hand and arm would then be a three-dimensional volume within the particle system rather than a surface.

Accelerated graphics capabilities are key to the success of the technique. "By using the frame buffer to compute results in a massively parallel manner, the system can generate reconstructions of real scene objects from arbitrary views in real time," says Lok. As the capabilities and performance of graphics hardware continue to improve, he says, "our system will increase in accuracy, improve reconstruction quality, and improve the interactions between real and synthetic objects."

In addition to enhancing the system's collision-detection capabilities, the researchers are also focusing on developing improved rendering techniques and exploring the range of applications to which the technology might be suited.

"Our system allows virtual environments to be truly dynamic and enables completely new ways of user interaction," says Lok, who anticipates evaluating the system's effectiveness for interacting with virtual characters, its support for new navigation methods, and its effect on the sense of presence in virtual reality.

Diana Phillips Mahoney is chief technology editor of Computer Graphics World.