Scaling Up
Volume 27, Issue 9 (September 2004)

The Orphanage, a San Francisco-based special effects studio, is no stranger to explosions. In fact, the company's nearly 200 artists—whose credits include the creation of 2D and 3D special effects for such feature films as Hellboy, The Day After Tomorrow, Sky Captain, Spy Kids 3-D, and Charlie's Angels: Full Throttle—immerse themselves in the art of making explosions look devastatingly real on screen.

The group's impressive arsenal of credits has led to an explosion of sorts within the company as well, with staff growing from 20 employees approximately a year and a half ago to almost 200 today. And according to director of IT Nicholas McDowell, The Orphanage is just at the start of an exponential growth curve.

With ultimate plans for a staff of 500 to 600 employees, The Orphanage gives new meaning to the phrase "explosive growth." It also highlights how important it's been for McDowell and his four-person IT team to get the most use out of the company's current IT systems and storage—all while developing an underlying architecture that is easy to scale at a moment's notice.

The Orphanage's storage needs have also expanded exponentially since McDowell joined the company a few years back. Take the recent Hellboy special effects project, which at its peak involved almost 10TB of storage capacity and more than 100 of The Orphanage's artists. "At one point during Hellboy, we were generating 500GB of new data per day," says McDowell.
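The figures McDowell cites give a feel for the capacity-planning pressure. A back-of-the-envelope calculation, using only the numbers from the article and assuming decimal terabytes (1TB = 1,000GB), shows how quickly a single project at that pace could consume the SAN's remaining headroom:

```python
GB_PER_TB = 1000  # decimal terabytes, as storage vendors typically count

peak_project_tb = 10    # Hellboy's peak footprint, from the article
daily_growth_gb = 500   # new data per day at peak, from the article
total_capacity_tb = 18  # current SAN capacity, from the article

# Days of growth the remaining capacity could absorb if the whole
# SAN were dedicated to one such project
headroom_days = (total_capacity_tb - peak_project_tb) * GB_PER_TB / daily_growth_gb
print(headroom_days)  # 16.0
```

Sixteen days of headroom at peak output is a rough illustration, not a figure from the article, but it makes clear why McDowell treats scaling as a constant concern.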

When McDowell arrived, the studio had one Apple Xserve server and only 2TB of locally attached storage. Today, it has 18TB of storage in a core production system that consists of SGI InfiniteStorage Shared Filesystem CXFS cluster software and a storage area network (SAN) with nine SGI TP9100 disk arrays, each with a capacity of 2TB.

The CXFS cluster presents itself to clients on the network as a massive, scalable network-attached storage (NAS) device. Eight server nodes at the front of the SAN act as NAS heads, exporting the SAN's shared file system—via open-source Samba for CIFS clients, and via NFS—so files can be saved or accessed elsewhere on the network.
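On the Samba side, exporting a SAN-backed file system over CIFS amounts to defining a share whose path is the cluster mount point. The fragment below is a minimal, hypothetical sketch of what such a share definition might look like on one NAS head; the workgroup, share name, and mount path are illustrative assumptions, not The Orphanage's actual configuration.

```ini
; Hypothetical smb.conf excerpt on a NAS head (illustrative only)
[global]
   workgroup = ORPHANAGE        ; assumed name
   server string = CXFS NAS head

[production]
   path = /mnt/cxfs/production  ; assumed CXFS mount point on the SAN
   read only = no
   browseable = yes
```

Because every NAS head mounts the same CXFS volume, any of the eight servers can export the same share, which is what lets the cluster appear to clients as a single large NAS device.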
In a few years, The Orphanage's storage system has expanded from one server to nine, and from 2TB of disk storage to 18TB.

Three of the eight servers are SGI Origin systems (models 300 and 350) running the IRIX operating system. The other five are Linux servers from a variety of manufacturers (see "The Orphanage's SAN," this page).

The Orphanage's artists save and retrieve data from the shared file system via the three IRIX servers, while the company's renderfarm communicates with the file system via the Linux servers. Artists use Adobe After Effects software along with custom plug-ins to develop special effects sequences.

The artists then send their work to the renderfarm, which consists of 200 Windows-based dual-processor Intel systems from Boxx Technologies. "To quickly and efficiently render a shot that requires up to 300 frames, you need a massive amount of rendering power," says McDowell. "Also, with the CXFS shared file system, an artist can be reading from the same file as the renderfarm works on it."
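The appeal of a 200-node, dual-processor farm for a 300-frame shot is that frames render independently and can be farmed out in parallel. The sketch below shows one simple round-robin way to distribute frames across workers; the node and processor counts come from the article, but the scheduling scheme itself is illustrative, not the studio's actual queue manager.

```python
# Illustrative round-robin frame scheduler (not the studio's software).
def assign_frames(n_frames, n_nodes, procs_per_node):
    """Map each frame number to a worker processor, round-robin."""
    workers = n_nodes * procs_per_node
    schedule = {w: [] for w in range(workers)}
    for frame in range(n_frames):
        schedule[frame % workers].append(frame)
    return schedule

# 200 dual-processor nodes = 400 workers; a 300-frame shot
schedule = assign_frames(300, 200, 2)
busy = sum(1 for frames in schedule.values() if frames)
print(busy)  # 300 workers take one frame each; 100 sit idle
```

With more workers than frames, every frame can render simultaneously, which is why McDowell can describe a full shot turning around quickly despite each frame being expensive.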

McDowell, who was already familiar with other clustering technologies prior to implementing CXFS, admits to a steep learning curve when it came to exploring the potential of the shared file system. "It's very powerful, but complex," he says.

Nevertheless, McDowell says the shared file system has played a pivotal role in The Orphanage's ability to scale its architecture, in terms of both server and storage capacity, without incurring downtime. Scaling up was difficult before because it involved shutting the system down to add another card in the server, attach more disks, and so forth, he explains.
For rendering high-resolution images like this scene from The Day After Tomorrow, artists at The Orphanage send their work to the studio's 200-workstation renderfarm. The shared file system allows them to read from a file as the renderfarm works on it.

McDowell is impressed by what he calls both vertical and horizontal scalability within the CXFS cluster. "Vertical scalability is adding more things: servers, disks, and so on," he says. "Horizontal scalability is increasing the hardware within any of those devices. We have a four-processor Origin 350 that can scale to 65 processors. And we could multiply that by 32 nodes."

How does McDowell see the system growing over the next year or two to handle the company's growth in staff, projects, and storage capacity? He already has plans in the works to increase the number of NAS heads from the current eight nodes to 15 nodes. On the storage side, he is looking at more TP9100 arrays, and also the possibility of a TP9300 or 9500 array to accommodate the addition of The Orphanage's editorial group (which uses Macintosh platforms) to the CXFS shared file system.

"We still send gigs and gigs of data a day across the Gigabit Ethernet network. With storage now centralized, it helps to cut down on the amount of data transfers," McDowell explains. "There are still a lot of data transfers going on now because the editorial group is not tied into the CXFS file system." The editorial group is responsible for tasks such as processing all the tapes received in the studio, getting the plates in line, getting frame counts right, and getting the dailies together so the work done that day is available the next morning for review. The team currently operates on its own separate network, which is not yet fully integrated into The Orphanage's production systems.

Once that happens, McDowell says he anticipates that the volume of file transfers will shrink considerably, which will translate into less storage capacity required. "SGI recently released a Mac OS X client for CXFS. Instead of the editorial group having its own storage, it can be [integrated into our central SAN storage]. That would save us a lot of money," says McDowell.

The Orphanage's SAN

  • Servers (NAS heads):
    • 2 SGI Origin 350 (IRIX)
    • 1 SGI Origin 300 (IRIX)
    • 2 HP/Compaq ML350 (Linux)
    • 1 Open Source Systems AMD 64 Quad (Linux)
    • 1 SGI Altix 350 (Linux)
    • 1 Boxx 3D workstation (Linux)
  • Disk storage:
    • 9 SGI InfiniteStorage TP9100 disk arrays, each with 2TB
  • Storage software:
    • CXFS shared file system