Universal Scene Description
BY F. SEBASTIAN GRASSIA AND GEORGE ELKOURA
Issue: Edition 2 2020


For as long as we have been creating digitally synthesized images, we have needed ways to describe the 3D scenes we are synthesizing in a way that is mathematical enough for computers to understand, but also understandable and manipulable by technologists and artists.

This article tells the story of some of the challenges Pixar faced in describing, sharing, and transporting 3D scenes as our pipeline evolved over 25 years, from creating the first feature-length computer-animated movie, to making it possible for our productions' artistic visions to keep growing in richness and collaboration. Thus, we present Universal Scene Description (USD), Pixar's open-source software for describing, composing, interchanging, and interacting with incredibly complex 3D scenes.

USD is a cornerstone of Pixar's pipeline and has been on a rapid and broad adoption trajectory, not only in the VFX industry, but also for consumer/Web content and game development.

BEFORE THERE COULD BE TOYS

When Pixar set out to make Toy Story in the early '90s, we had industry-leading, commercially available products at the front and back of our 3D pipeline - Alias' PowerAnimator at the front for modeling, and Pixar's own RenderMan at the back. But to handle the scale of making a feature-length animation, we needed to invent or improve a suite of custom tools and data formats for everything that needed to happen between modeling and rendering.

"Scale" here impacts several aspects of production that weren't quite as daunting when making Pixar's earlier projects (short commercials and effects): number of artists who needed to be simultaneously working on the same project, complexity of the environments, and number of acting characters, number of assets, sequences, and shots.

We had already figured out that one important component of successfully working at feature scale was separating different kinds of data into different files/assets. While "appropriate format" was a consideration, more important was that by separating geometry, shading, and animation into different files, we enabled artists of various disciplines to work independently, safely, and simultaneously, and made it easier for them to reuse components.

These and other organic improvements made Toy Story possible, but the production also illuminated many ways in which our pipeline needed to become more scalable and more artist-friendly.

[Image: While making Toy Story, issues of scale required custom toolsets.]

THE INDUSTRY EVOLVES & PIPELINES GET MORE COMPLEX

As more studios began making feature-length CG animation, more varied and powerful software became commercially available, which was terrific because it extended artistic reach, but it also made pipelines more complicated. The three problems of scale only grew (and still do!), but now the industry faced an additional problem of data interchange, since it was rare for any one vendor's product to read data in its full richness from another vendor's - sometimes not even products within a vendor family could speak to each other.

Many VFX studios, including Pixar, built pipelines around in-house software, but with entry and exit points for data from many vendors. Formats for 3D interchange were available, both open and proprietary, such as obj, Collada, and FBX, but none could satisfy the goal of rich (in terms of schema and kinds of data), universal interchange.

Studios developed different strategies to deal with interchange; Pixar built a "many to many" conversion system called Mojito, which defined a core set of data schemas into which you could (in theory) plug any DCC as an input and get out a data file for any other DCC (or renderer, though we only ever implemented support for RenderMan). Mojito allowed us to adapt to new software in a reasonably modular way, but it was expensive to maintain and did not help with any of our scale problems.

Finally, in 2010, the industry received a quantum leap forward on the interchange problem, when Sony Pictures and Industrial Light & Magic released their open-source collaboration, Alembic (see "Share and Share Alike," CGW, Summer 2019 issue). Alembic was designed with rigor and deep knowledge of VFX pipelines to address the interchange of "baked" (or time-sampled) geometric data, with vetted schemas for geometric primitives and transformations, and a data model that abstracted "file format."

That abstraction was critical because it allowed Alembic to gradually deploy formats like Ogawa that addressed some of the important scalability issues for VFX, such as being able to store massive amounts of animated data in a file, while primarily paying the cost (IO and network traffic) for only the pieces of data a particular application/node needed for its task. Alembic arrived (not coincidentally) at a time when many studios were lighting in different applications than they were rigging and animating, so its "bake out the rigged animation into time-sampled mesh data" approach was exactly what was needed.

To keep the scope and mission of the project tight and manageable, Alembic stayed focused on interchange of "flat" data, free of any concerns of composing multiple Alembic archives together, proceduralism, execution, rigging, or even a run-time scene graph on which such behaviors could be layered. Within several years, Alembic had deeply penetrated the VFX industry, and today the majority of commercial 3D vendors support Alembic import and/or export.

Pixar did not adopt Alembic in its pipeline, primarily because from 2008-2010, we had been developing our own time-sampled interchange format, called TidScene, with a slightly different approach to the problem… but we're getting a little ahead of ourselves.

REFINING WORKFLOW AND 'COMPOSITION'

At around the time Ratatouille entered development, Pixar was growing hungry to revamp its toolset, which was beginning to show its age; not only was our animation and lighting system nearly impossible to adapt to the multi-threaded future of performance, but our pipeline had also grown complicated. Any given shot relied on dozens of different file formats and data abstractions: We had one format for "primary" geometry, but another for UV sets and other primitive variables that were too big for the primary format, and a third for simulation-generated geometry overrides. We had one format and language for shaders, another for shading networks, and yet a third to describe how materials were bound to geometry. It was time to reconsider our tools, workflows, and pipeline.

The Menv30 project was kicked off; it would later be renamed Presto upon release. Menv30 was an opportunity to take all the learnings of the earlier toolset and develop them into a cohesive and well-structured set of libraries. Perhaps most importantly, the project sought to represent all of our workflows using a single file format, menva, in a single application that could easily allow artists to transition back and forth through a fluid and malleable pipeline. We started by targeting rigging, animation, and layout, later adding simulation workflows.

To Presto's scene graph library, we added and refined all the features that were necessary to bring together all the disciplines and workflows that we had so far accumulated. The most fundamental feature was a layering system that is often compared to Photoshop's image layering paradigm. Referencing, inheritance, and variants completed the initial set of composition operators that formed a robust composition engine, and which, unbeknownst to us at the time, laid the underpinnings of the technology that would later become USD.

We had initially targeted Toy Story 3 for early adoption of this ambitious, brand-new system (and missed it), knowing it was going to be difficult to achieve. To make the effort more tenable, we stopped short of adding support for the back end of the pipeline: shading, lighting, rendering, and so on. That was going to be left to the existing systems. That meant: interchange.

We now needed a way to translate data born in Presto to the existing back-end systems. In order to do that, we invented a new scene transportation system called TidScene. TidScene was built using Berkeley DB and was primarily targeted toward efficient transport of time-sampled binary data and rapid preview imaging. As we began deploying TidScene in more parts of the pipeline, we kept encountering the need to add more and more features to it, such as referencing and inheritance.

Having TidScene meant that we were implementing things twice, as we kept wanting to add more features that we had already implemented in Presto. Worse, the implementations grew inconsistent and incompatible with one another.

This duplication became a significant enough problem that a unified solution was needed.

The solution? USD.

WHAT IS USD?

USD began as an experiment in 2012 to see whether we could take the low-level data model and "composition engine" at the heart of Presto, where they serve a rich scene graph and anim curve-driven execution system, and build atop them instead a lightweight, highly scalable scene graph that practices lazy data access, embraces time-sampled data, and facilitates the creation of "schemas" (think Mesh, Xform, Material, etc.) to foster robust scene interchange.
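
To make that concrete, here is a minimal sketch of the scene graph and schema layer in action, using the open-source Python bindings (the pxr module, installable as usd-core); the file and prim names are ours, purely illustrative:

```python
from pxr import Usd, UsdGeom

stage = Usd.Stage.CreateNew("hello.usda")

# Schemas such as Xform and Mesh provide typed, interchange-friendly
# views over otherwise generic prims.
UsdGeom.Xform.Define(stage, "/World")
mesh = UsdGeom.Mesh.Define(stage, "/World/Quad")

# Author a tiny quad; any attribute can also carry time samples.
mesh.CreatePointsAttr([(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)])
mesh.CreateFaceVertexCountsAttr([4])
mesh.CreateFaceVertexIndicesAttr([0, 1, 2, 3])

stage.GetRootLayer().Save()
```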

Our proof-of-concept came very close to the performance of TidScene on equivalent large scenes, while opening the door to the full suite of Presto's composition operators (though we chose to leave a few of them out of USD for performance and/or complexity reasons). However, we believed we needed to be much more than an order of magnitude better than TidScene to satisfy the growing needs of the studio for the next 10-plus years.

We made good progress toward that goal by making data structures thread-safe/efficient and algorithms multi-threaded at all levels of the USD software stack. The development of "native scene graph instancing" to augment schema-based point/array instancing allowed us to load scenes containing millions of primitives in seconds on modern workstations.
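
As a rough illustration of what opting in to native scene graph instancing looks like (a hedged sketch; the asset and prim names are hypothetical):

```python
from pxr import Usd, UsdGeom

# Stand-in for a published asset we will reference many times.
asset = Usd.Stage.CreateNew("tree.usda")
UsdGeom.Xform.Define(asset, "/Tree")
UsdGeom.Mesh.Define(asset, "/Tree/Trunk")
asset.GetRootLayer().Save()

forest = Usd.Stage.CreateNew("forest.usda")
for i in range(1000):
    prim = forest.DefinePrim("/Forest/Tree_{}".format(i))
    prim.GetReferences().AddReference("tree.usda", "/Tree")
    prim.SetInstanceable(True)  # share one prototype across all 1,000 trees

# Renderers and other consumers can detect the sharing reliably.
print(forest.GetPrimAtPath("/Forest/Tree_0").IsInstance())  # True
```

Because every instanceable prim shares one composed prototype, the scene graph stays light no matter how many trees the forest holds.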

Performance has consistently been a top priority for USD, and we continue to make small and large improvements. But while performance is a key "wow factor" for USD that facilitates artists' ability to iterate, there are several other important reasons why USD has been so quickly and broadly embraced.

COMPOSITION ALGEBRA

If there is any magic in USD, it is the "composition engine" that provides the ability to combine "layers" worth of scene description in many useful ways, and lets users make non-destructive edits. The rules by which this powerful combinatorial engine works have been refined over several different incarnations, starting with work pioneered on A Bug's Life, and protected by more than a dozen patents that Pixar released for public use as part of open-sourcing USD.

Understanding the full implications of the composition rules can be daunting, but simple uses are easy to understand, and the complete behavior can be described by the recursive LIVRPS (pronounced "liver peas") algorithm - Local, Inherits, Variant sets, References, Payloads, Specializes - documented on openusd.org. We refer you to the website for deeper exploration of USD's composition arcs, but we will describe several of them here as they pertain to addressing the scalability problems we discovered way back on Toy Story. Note that each of the composition arcs can be used to solve other problems as well, and one of the things in which we have invested heavily is trying to make sure that any possible combination of composition arcs behaves in reasonable ways.

Layering for Multi-Artist Collaboration - Layering in USD is similar to layers in Photoshop, except that in USD each layer is an independent asset, and often, each layer will be "owned" by a different artist or department in a pipeline. While the modes by which data in different layers can be merged are much more restricted in USD than they are in Photoshop, layer compositing allows multiple artists to work simultaneously in the same scene without fear of destroying work that another artist is doing.

One artist's work may "override" another's because their layer has higher precedence than the other artist's, but since layer precedence in a sequence or shot in a CG pipeline often corresponds to stages of the pipeline, this is not often a problem. When it is, the fact that each artist works non-destructively with respect to other artists' work means we can deploy a range of tools to handle situations in which something unexpected happens.
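
Here is a small sketch of that non-destructive layering, with hypothetical file names: the stronger shot layer overrides the weaker model layer without ever modifying it.

```python
from pxr import Sdf, Usd, UsdGeom

# Weaker layer: a modeling artist authors a sphere.
model = Sdf.Layer.CreateNew("model.usda")
stage = Usd.Stage.Open(model)
UsdGeom.Sphere.Define(stage, "/Ball").CreateRadiusAttr(1.0)
model.Save()

# Stronger layer: a shot artist overrides the radius, never touching model.usda.
shot = Sdf.Layer.CreateNew("shot.usda")
shot.subLayerPaths.append("model.usda")
stage = Usd.Stage.Open(shot)
stage.GetPrimAtPath("/Ball").GetAttribute("radius").Set(2.0)
shot.Save()

# The composed stage sees the strongest opinion.
print(Usd.Stage.Open("shot.usda")
      .GetPrimAtPath("/Ball").GetAttribute("radius").Get())  # 2.0
```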

Referencing and Variant Sets for Asset Complexity - Referencing in USD is not dissimilar to the referencing features in packages like Autodesk's Maya, but it is more powerful in that careful consideration has gone into specifying how multiple references combine as we build up assets, aggregate assets, sequences, and shots via chained and nested referencing.

References allow us to build simple or complex "component" assets out of modular parts. References also allow us to build up environments out of many references to individual component assets - some of the costs will be shared at run time among many references to the same asset, and when we enable those references to be instanced, that sharing increases substantially.

References allow us to bring environments into sequences and shots. At all "levels of referencing," USD allows you to express overrides on the referenced scenes in a uniform way, so once you learn the basics of "reference and override," you can use it to solve many problems.
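
A hedged sketch of "reference and override" with hypothetical assets: a shot references one published asset twice, then overrides a single copy.

```python
from pxr import Usd, UsdGeom

# Stand-in for a published component asset.
asset = Usd.Stage.CreateNew("chair.usda")
UsdGeom.Xform.Define(asset, "/Chair")
UsdGeom.Cube.Define(asset, "/Chair/Seat")
asset.GetRootLayer().Save()

# A shot references the asset twice and overrides one copy.
shot = Usd.Stage.CreateNew("shot_ref.usda")
for name in ("ChairLeft", "ChairRight"):
    prim = shot.DefinePrim("/Set/" + name)
    prim.GetReferences().AddReference("chair.usda", "/Chair")

# Override: move only the right chair; chair.usda is untouched.
UsdGeom.XformCommonAPI(
    shot.GetPrimAtPath("/Set/ChairRight")).SetTranslate((3.0, 0.0, 0.0))
shot.GetRootLayer().Save()
```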

One significant source of pipeline complexity is the number of "variations" of published assets that are typically required to provide the level of visual richness that audiences expect from modern CG films. USD provides variant sets as a composition arc that allows multiple different versions of an asset to be collected and packaged up together, providing an easily discoverable "switch" for users of the asset to select which variation they want. This switch (called a Variant Selection) is available in the USD scene regardless of how "deeply" (via referencing and layering) it was defined in the scene's composition structure.

One of the most powerful aspects of Variant Sets is that they can vary anything about an asset, and an asset can have multiple variant sets, nested or serial. In Pixar's pipeline, it is common for assets to have a modeling variant set, one or more shading variant sets, and rigging and workflow LOD variant sets.
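
A minimal sketch of authoring a variant set (the asset and variant names are hypothetical, though "modelingVariant" mirrors the convention mentioned above):

```python
from pxr import Usd, UsdGeom

stage = Usd.Stage.CreateNew("robot.usda")
prim = UsdGeom.Xform.Define(stage, "/Robot").GetPrim()

vset = prim.GetVariantSets().AddVariantSet("modelingVariant")
for variant, height in (("tall", 2.0), ("short", 1.0)):
    vset.AddVariant(variant)
    vset.SetVariantSelection(variant)
    # Edits made inside this context are recorded under the variant.
    with vset.GetVariantEditContext():
        UsdGeom.Cube.Define(stage, "/Robot/Body").CreateSizeAttr(height)

vset.SetVariantSelection("tall")  # the switch any consumer can flip
stage.GetRootLayer().Save()
```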

And More - USD also provides composition arcs for broadcasting edits from one prim to many others (Inherits); special references called Payloads for deferring the loading of parts of a scene, to allow users to craft manageable working sets; scene-graph instancing, not only to control the weight of the scene graph, but also to reliably inform renderers and other consumers which things in the scene can be processed identically; and many other features to facilitate non-destructive editing of large scenes.
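
Two of those arcs are easy to sketch with hypothetical assets: a Payload whose loading we defer until needed, and an Inherits arc broadcasting from a class prim.

```python
from pxr import Usd, UsdGeom

# Stand-in for a heavy asset we don't always want to load.
city = Usd.Stage.CreateNew("city.usda")
UsdGeom.Xform.Define(city, "/City")
city.GetRootLayer().Save()

shot = Usd.Stage.CreateNew("shot_city.usda")
shot.DefinePrim("/City").GetPayloads().AddPayload("city.usda", "/City")
shot.GetRootLayer().Save()

# Open with nothing loaded, then pull in only the working set we need.
stage = Usd.Stage.Open("shot_city.usda", Usd.Stage.LoadNone)
print(stage.GetPrimAtPath("/City").IsLoaded())  # False
stage.Load("/City")
print(stage.GetPrimAtPath("/City").IsLoaded())  # True

# Inherits: edits to a class prim broadcast to every prim that inherits it.
stage.CreateClassPrim("/_class_City")
stage.GetPrimAtPath("/City").GetInherits().AddInherit("/_class_City")
```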

HYDRA IMAGING ARCHITECTURE

One of the key tools that allowed TidScene to succeed as a project was an inspection tool called tdsview. The tool allowed users to inspect the contents of a TidScene file visually in 3D and provided a very fast way to debug shots. As we were going to replace TidScene with USD, we needed to provide a corresponding tool, with perhaps loftier ambitions.

The heart of such a tool is a fast 3D viewport, and we wanted to build as good a viewport as we possibly could. Also by then, some of us were beginning to itch to replace Presto's imaging technology. We saw a wonderful opportunity to build a generic imaging architecture that we could plug in to the various proprietary tools that needed a 3D viewport. So, Hydra was born: a multi-headed beast, each head representing any one of our scene formats and associated tools.

Hydra was originally a state-of-the-art OpenGL-based render engine capable of displaying feature-film scale assets in as close to real time as we could get. Supporting the idea that Hydra should give high-fidelity feedback to artists, we looked to RenderMan, our production final-frame renderer, for the ground truth of how certain scene geometries and parameters ought to be interpreted. Along with the ability to embed Hydra into as many applications as needed, we began to fundamentally expand on the goals that OpenSubdiv pioneered: high-fidelity, real-time feedback that looked consistent everywhere.

Today, Hydra has grown into a much richer architecture that supports not only multiple "heads" (scenes and applications), but also multiple "tails" (renderers). We factored out the OpenGL renderer into its own Hydra back end that we now call Storm (though many folks still say "Hydra" when they mean to say "Storm" - and we've had to come to terms with that, well, we've mostly come to terms with that).

We've also implemented a RenderMan back end for Hydra. We had already spent an enormous effort integrating Hydra into our various applications, like Presto, and now almost for free, we are able to see the scene as rendered by RenderMan. This promises to be transformational for our users, but we'd be remiss if we suggested that we're already able to benefit fully from this technology. Truthfully, we're not yet able to realize its full potential, for several reasons.

Chief among them is that some data required for high-fidelity rendering isn't available in our Presto scenes early enough in production. Similarly, our scenes are transformed significantly before they hit the renderer for final-frame production. We know we're not done yet, and we're excited by the future potential workflows we'll be able to achieve.

PRACTITIONER TOOLSET

USD was built by practitioners, for practitioners, and one of the ways in which that matters is the investment in tools and features whose goal is to make working with USD data easier and more transparent. USD contains, out of the box, a suite of tools for working with USD and the various external formats that may feed USD, including:

  • Binary and ASCII encodings, losslessly convertible in either direction (see the sketch after this list).
  • Fast, powerful preview and debugging tools, such as the usdview application.
  • Tools for creating, converting, and inspecting USD, such as usdcat and usddiff.
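
For instance, converting between the two encodings is only a few lines in the Python API (a small sketch with illustrative file names; the bundled usdcat tool performs the same conversion from the command line):

```python
from pxr import Sdf, Usd, UsdGeom

stage = Usd.Stage.CreateNew("scene.usdc")  # compact binary "crate" encoding
UsdGeom.Sphere.Define(stage, "/Ball")
stage.GetRootLayer().Save()

# Export re-encodes based on the target extension.
Sdf.Layer.FindOrOpen("scene.usdc").Export("scene.usda")
print(open("scene.usda").read())  # human-readable text, handy for debugging
```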

OPEN SOURCE, OPEN ARCHITECTURE

Pixar had experimented with open source through its first open-source project: OpenSubdiv. With OpenSubdiv, we wanted to make sure that our vendors were all using the same subdivision algorithms that RenderMan was, so that artists saw consistent and reliable results across the pipeline. We were starting to see the value in shifting from proprietary, closed software to embracing open source as a way to strengthen our technology and increase its longevity, by turning critical parts of our toolset into shareable, openly available libraries that we hoped would become industry standards.

With USD, we wanted to bring that same consistency and reliability to the composition of the whole scene. However, unlike OpenSubdiv, USD rests upon the lowest levels of Pixar's 3D software foundations; even so, the studio strongly supported the project by allowing us to incorporate Pixar's proprietary software stack into an open-source project - something we originally only hoped for, and then committed to in 2014.

For USD to succeed in its "universal scene interchange" goals, it is important that 3D vendors adopt support for it natively, and it was clear that a well-supported open-source project was the only viable way to foster that kind of adoption. Additionally, as we will discuss more below, sponsoring an open-source project that literally targets entire pipelines has been a fantastic opportunity for productive and enjoyable collaboration with many studios and vendors in the industry.

Because USD needs to plug in to most applications in any given pipeline, a critical factor for its success is the degree to which it can be customized, without impacting the "common definitions" that enable interchange. Out of the box, USD comes with reasonable defaults for all "pluggable" behaviors, but by writing a little code (and in some cases, even without any coding), one can:

  • Teach USD how to ingest new data sources.
  • Use your own asset management system and "asset resolver."
  • Add new schemas.

As an example of how flexible these (and other) plug-in points make USD, Google was able to provide Draco mesh compression for USD entirely as a plug-in, by combining the powers of USD references and file format plug-ins.

With the various USD technologies roughed in, it was time to plan their rollout both internally to Pixar and to the broader community.

[Image: Finding Dory became the first feature film with a USD-based pipeline.]

ROLLING OUT USD INTERNALLY AND PUBLICLY

Finding Dory was targeted as the first project to adopt USD at Pixar. We did hit that goal, and Finding Dory became the world's first feature-length film with a USD-based pipeline. The production also took on a significant re-thinking of the back end of the pipeline, adopting a path-tracing-first version of RenderMan as well as new lighting workflows. Given the rest of the innovations happening on that show, USD's rollout was relatively uneventful, exactly as we had hoped.

In order to achieve a quiet rollout, we had a couple of tricks up our sleeves. The first was that we generated USD files alongside our old format, TidScene, the latter being the files used for actual early production. Our QA department also built a tool, bcsystem, that would render a given shot twice - once with USD files (not yet made visible to production) and once with TidScene files. The renders generated from the TidScene files were the ground truth against which the USD renders were compared.

As long as the two renders were different, we kept fixing bugs quietly behind the scenes. The bcsystem tool tested not only the renderer, but the whole pipeline that got us to the final picture. Along with bcsystem, a fairly big-hammer black-box tool, we had also started amassing a large suite of performance and regression tests. Our teams are fiendishly fond of regression and performance testing, keeping track of dozens of metrics across thousands of tests. We wanted to make sure that our new pipeline was going to be built on software that could produce better performance than the old. And the best way to make sure your software stays fast is to measure it continuously.

As we started getting renders that matched perfectly, we grew more confident that we could switch the show over without negatively impacting our users. With the flick of an environment variable, we switched Finding Dory to start using USD and never looked back.

While we were rolling out USD on Finding Dory, we started reaching out to trusted partners in the industry, to whom we gave early cuts of the software. Early partners like Luma Pictures, DNeg, and Animal Logic, and a little later DreamWorks, MPC, Rodeo FX, Weta, and many others provided valuable insight and use cases beyond our beloved enclave, ultimately making USD useful to so many in the industry.

We chose SIGGRAPH 2016 as the event where we would officially push the "make public" button on the USD repository hosted on GitHub. We all gathered around a round table set up for one of our USD Birds of a Feather-style meetups with our small community at the time. We pushed the button right as folks started streaming in for the meeting. Once everyone was seated, we announced that USD was now public on GitHub. It was the culmination of much hard work for many of us on the team, and it was overwhelming to see the support and enthusiasm the project had already garnered by that time.

As of 2020, all geometry, shading networks and bindings, rigid-body, hair and cloth simulation, and large-scale skeletal crowd animation travel through Pixar's pipeline in USD. These different kinds of data will generally be in different files, for workflow reasons (and there may often be multiple shading USD files for a single asset, again for workflow). However, being able to encode them all as USD brings great simplification and amplification of the power of our toolset - not just for ease of interchange, but also for leveraging a common language and debugging suite between departments and artists.
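
As a flavor of what shading data looks like in USD, here is a hedged sketch of a small network using the UsdShade schemas and the standard UsdPreviewSurface node; the names and values are illustrative:

```python
from pxr import Sdf, Usd, UsdGeom, UsdShade

stage = Usd.Stage.CreateNew("shaded.usda")
mesh = UsdGeom.Mesh.Define(stage, "/Asset/Geom")

# A material containing one surface shader node.
material = UsdShade.Material.Define(stage, "/Asset/Materials/Red")
shader = UsdShade.Shader.Define(stage, "/Asset/Materials/Red/Surface")
shader.CreateIdAttr("UsdPreviewSurface")
shader.CreateInput("diffuseColor",
                   Sdf.ValueTypeNames.Color3f).Set((0.8, 0.1, 0.1))
material.CreateSurfaceOutput().ConnectToSource(
    shader.ConnectableAPI(), "surface")

# Bind the material to the geometry.
UsdShade.MaterialBindingAPI.Apply(mesh.GetPrim()).Bind(material)
stage.GetRootLayer().Save()
```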

Given the advent of MaterialX and USD's ability to consume and encode MaterialX networks, there may be a future where our shaders themselves will be expressed by standardized nodes in USD vs. custom shading nodes driven by in-house OSL and GLSL code.

USD IN VFX PIPELINES AND BEYOND

Today, visual effects and animation studios throughout the world have either fully rolled out USD or are experimenting with it for rollout within their pipelines. At SIGGRAPH 2018, DreamWorks gave a talk titled "Zero to USD in 80 days: transitioning feature production to Universal Scene Description at DreamWorks," which highlighted how it is possible to quickly ramp up and adapt an existing, mature pipeline to USD. Animal Logic, the same year, also presented a talk titled "Forging a new animation pipeline with USD." There was an entire session at SIGGRAPH 2018 dedicated to production use of USD, almost all outside of Pixar.

Animal Logic had taken a different approach very early on to USD interoperability within Maya, motivated mainly by the lack of animation support in the USD plug-in provided by Pixar. For the first few years of USD, integration with Maya meant choosing between the Pixar plug-in and the Animal Logic plug-in, or using both. Fortunately, Autodesk - collaborating with developers at Pixar, Animal Logic, Luma, and many other important contributors across several studios - is bringing the two plug-ins under one umbrella, with the goal of combining the best of each into one comprehensive plug-in. The project picked up momentum in the latter half of 2019 and is well on its way in 2020.

The Foundry was the first to integrate a Hydra viewport directly into its flagship lighting and rendering software, Katana, at first to leverage the fast preview renderer. First-class support for USD is currently being worked on for Katana and other Foundry products.

In 2019, Houdini launched a new, USD-based workflow under the Solaris banner. Houdini is the first of the major DCCs to ship with a deep, fundamental integration of USD into its system.

So many other VFX and game projects are starting to integrate USD and Hydra in various forms - from Unity to Unreal Engine to Blender - that we think that within the next year or two, it should be fairly easy for folks to get their hands on software that knows how to manipulate USD.

USD has also penetrated beyond film and games. In 2018, Pixar collaborated with Apple to design USDZ, a single-asset packaged form of USD that can contain arbitrarily richly-composed scenes and be consumed just like any other USD asset, yet is suitable for Web/network transmission and archiving. Apple relies on USDZ for all of its AR/VR applications. We have also recently been collaborating similarly with companies like Nvidia, which are undertaking ambitious projects based on USD that aim to propel its use in many new industries.
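
Packaging an existing asset as USDZ can be done with a helper in the UsdUtils module (a small sketch; the asset here is a stand-in, and the bundled usdzip tool offers the same from the command line):

```python
from pxr import Sdf, Usd, UsdGeom, UsdUtils

# Stand-in for any USD asset we want to package.
stage = Usd.Stage.CreateNew("ball.usda")
UsdGeom.Sphere.Define(stage, "/Ball")
stage.GetRootLayer().Save()

# Collect the asset and everything it depends on into a single archive.
UsdUtils.CreateNewUsdzPackage(Sdf.AssetPath("ball.usda"), "ball.usdz")
```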

[Image: The complexity of Coco is apparent.]

WHERE ARE WE HEADED?

So far, USD has been evolving at a break-neck pace. We're a fairly small group of developers, getting contributions from an industry for which we're grateful. While we keep adding more and more functionality to USD and Hydra, more plug-in points for site customizations, and more schemas to support more domains, occasionally we take a step back to look at the big picture and ask ourselves: where are we headed?

There has been a fair amount of pressure for us to add full rigging support, and we've often held the line that rigging is not within USD's ultimate goal of enabling interoperability in pipelines. Yet, we added UsdSkel, a capable domain of schemas for representing skeletal animation and skinning, including blendshape support geared especially toward representing large crowds.
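
A hedged sketch of UsdSkel's basic vocabulary, with illustrative joint names (a real setup would also bind skinnable geometry via UsdSkel's binding schema):

```python
from pxr import Usd, UsdSkel

stage = Usd.Stage.CreateNew("crowd_agent.usda")
root = UsdSkel.Root.Define(stage, "/Agent")        # scopes the skeletal data
skel = UsdSkel.Skeleton.Define(stage, "/Agent/Skel")

# Joints are encoded as paths, making the hierarchy explicit and compact.
skel.CreateJointsAttr(["hips", "hips/spine", "hips/spine/head"])

anim = UsdSkel.Animation.Define(stage, "/Agent/Skel/Anim")
anim.CreateJointsAttr(["hips"])  # an animation may target a joint subset
stage.GetRootLayer().Save()
```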

Along the same lines, USD is famously lacking support for proceduralism, a term that is abused, misused, and confused. While we're not ready to commit to any particular solution quite yet, it's clear that USD needs to address this issue one way or another.

How do we choose what to concentrate on? What to work on next? The amazing community that has grown around USD continues to inspire us and make significant contributions that drive the technology forward. We've added features to USD that were suggested for use cases we never before imagined, and we hope to continue to do so. Of course, we're driven forward first and foremost by the needs of the talented artists and technical wizards here at Pixar, without whom we wouldn't have anything worth writing about and for whom we are eternally grateful.

Speaking of being eternally grateful, we'd like to take this opportunity to thank our amazing team: Sunya Boonyatera, Alex Mohr, Corey Revilla, Varun Talwar, Pol Jeremias-Vila, David Yu, Kees Rijnen, Matthias Goerner, Raja Bala, Tom Cauchois, Florian Zitzelsberger, Adam Fish, Adam Woodbury, Florian Sauer, John Loy, Brett Levin, Matt Kuruc, Stephen Gustafson, Jilliene Tongson Paras, and Jessica Tran. We'd also like to thank our Infrastructure team led by Cory Omand, and our Applications team led by Hayley Iben, and all of Pixar's engineers and artists. We owe USD to the trust that our VP, Guido Quaroni, put in us and to our CTO, Steve May, and the rest of the Executive Team at Pixar who helped create the wonderful environment that allowed USD to flourish. Finally, to all of our contributors (look for a list here http://graphics.pixar.com/usd/docs/USD-Contributors.html), a huge thank you.

F. Sebastian Grassia and George ElKoura are lead software engineers at Pixar Animation Studios.