Scalable CPU-Based SDVis Enables Interactive, Photoreal Visualization
Rob Farber
June 25, 2018

Your view of visualization is about to change: interactive CPU-based OpenGL rendering and photorealistic interactive raytracing have arrived. “We are entering the era, based on the data size, where the scalability and constant runtime of software-defined visualization (SDVis) wins over GPUs for visualization,” says David DeMarle, visualization luminary and engineer at Kitware.

Eliminating the need for specialized visualization hardware democratizes visualization. It also frees both enterprise and HPC users from having to procure dedicated visualization systems.

Think fat-server, thin-client for all your visualization needs, from small or low-resolution renderings to photorealistic images of your largest, most complex data sets. The good news is that with SDVis, users control the trade-off between image quality and responsiveness, so they can maintain a “comfortable” interactive experience on whatever computational resources they have available.

CPU-based SDVis raytracing and OpenGL technology is freely available to everyone. The simplest way to use it is to download the open-source ParaView application or the open-source VTK toolkit. You can also use SDVis in your own codes by directly calling the open-source OSPRay raytracing engine or the OpenSWR OpenGL rasterizer.

DeMarle notes that VTK and ParaView users need to change only one line to switch between SDVis raytracing and OpenGL rendering. SDVis also enables in-situ visualization, where rendering occurs on the same nodes as the computation, which many consider a requirement for exascale modeling and simulation.


Figure 1: Comparative images rendered using rasterization and raytracing. (Image courtesy Jeffrey Howard and Intel)

No longer are researchers tied to high-end GPUs, GPU-based visualization clusters, or, for that matter, any particular compute cluster or cloud provider. Demonstrations at both Supercomputing 2017 and the Intel HPC Developer Conference in Denver showed that a simple frame buffer or basic GPU on the user's machine is all that is required to interactively visualize even the most complex photorealistic images. For OpenGL users, DeMarle observes that “OpenGL performance does not trail off, even when rendering meshes containing one trillion (10^12) triangles on the Trinity leadership-class supercomputer. We might see a 10 to 20 trillion triangle per second result, as our current benchmark used only 1/19th of the machine.”

Recognizing the potential of SDVis, Guido Reina (postdoc, University of Stuttgart) notes that the Stuttgart MegaMol development roadmap includes rendering animated molecular visualizations using only a browser, complementing their existing MatMol software and in-situ visualization efforts. A browser represents the ultimate in fat-server, thin-client HPC visualization, especially when one considers viewing an in-situ visualization of a large molecular simulation running on a leadership-class or future exascale supercomputer without any high-end or specialized hardware.

Last year’s big advance in SDVis raytracing 
During the past year, DeMarle exposed the OSPRay library's pathtracer in ParaView. The pathtracer is a form of raytracing that uses Monte Carlo sampling to better simulate the full set of paths that photons take in the real world. As a result, OSPRay can now render photorealistic reflection, refraction, and soft shadows for materials such as metal and glass, along with caustics and global illumination in general.
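To make the Monte Carlo idea concrete, here is a minimal Python sketch (not OSPRay code) of one effect mentioned above, a soft shadow. It estimates how much of a square area light is visible from a surface point when a smaller opaque square sits halfway between them: each random sample is one shadow ray, and the fraction of unblocked rays estimates the shadow factor. The scene geometry and the function name `soft_shadow_visibility` are hypothetical choices for illustration.

```python
import random


def soft_shadow_visibility(num_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the visible fraction of an area light.

    Hypothetical scene: the shaded point is at the origin, the light is the
    square |x|, |y| <= 1 in the plane z = 2, and an opaque square occluder
    |x|, |y| <= 0.25 lies in the plane z = 1.
    """
    unblocked = 0
    for _ in range(num_samples):
        # Pick a uniformly random point on the light: one shadow ray per sample.
        lx = rng.uniform(-1.0, 1.0)
        ly = rng.uniform(-1.0, 1.0)
        # The ray from (0, 0, 0) to (lx, ly, 2) crosses z = 1 at (lx / 2, ly / 2).
        hx, hy = lx / 2.0, ly / 2.0
        if not (abs(hx) <= 0.25 and abs(hy) <= 0.25):
            unblocked += 1
    return unblocked / num_samples


if __name__ == "__main__":
    rng = random.Random(42)
    for n in (16, 256, 4096, 65536):
        print(n, soft_shadow_visibility(n, rng))
```

For this geometry the occluder hides a quarter of the light, so the exact answer is 0.75; the estimate approaches it as more rays are traced, with error shrinking roughly as one over the square root of the sample count. That slow, steady convergence is what a pathtracer trades against interactivity.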

Users can trade the pathtracer's interactive response time against photorealism. ParaView, for example, now provides a Light Inspector panel that lets users control this capability through the GUI.

More specifically, the OSPRay pathtracer lets users control the progressive refinement of the Monte Carlo method, so they can work interactively on whatever computational resources are available to them. This greatly simplifies provisioning visualization workloads in the data center, and cloud-based remote visualization eliminates the need to provide dedicated local visualization resources at all. Further, progressive refinement lets researchers minimize cloud costs or data-center computational overhead while they decide which image sequences need to be generated with full photorealism.
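Progressive refinement can be sketched as a per-pixel running average: every frame adds one more Monte Carlo sample per pixel, the displayed image is the mean so far, and refinement stops either when a frame budget runs out (the interactive case) or when the estimated noise falls below a tolerance (the final-quality case). This is a hypothetical Python illustration; the class and function names are not OSPRay's API, and random noise around a known value stands in for tracing real light paths.

```python
import math
import random


class ProgressiveAccumulator:
    """Per-pixel running statistics, refined by one Monte Carlo sample per frame.

    Hypothetical illustration of progressive refinement, not OSPRay's API.
    """

    def __init__(self, num_pixels: int):
        self.total = [0.0] * num_pixels
        self.total_sq = [0.0] * num_pixels
        self.frames = 0

    def add_frame(self, samples) -> None:
        # Earlier frames are kept, so every new frame improves on the last one.
        for i, s in enumerate(samples):
            self.total[i] += s
            self.total_sq[i] += s * s
        self.frames += 1

    def image(self):
        # The displayed image is the mean of all samples accumulated so far.
        return [t / self.frames for t in self.total]

    def max_std_error(self) -> float:
        # Worst per-pixel standard error of the mean; shrinks like 1/sqrt(frames).
        n = self.frames
        if n < 2:
            return math.inf
        worst = 0.0
        for t, tsq in zip(self.total, self.total_sq):
            var = max((tsq - t * t / n) / (n - 1), 0.0)
            worst = max(worst, math.sqrt(var / n))
        return worst


def render_progressively(true_image, noise, tolerance, max_frames, rng, min_frames=16):
    """Refine until noise drops below `tolerance` or the `max_frames` budget is spent.

    `min_frames` avoids trusting the very noisy variance estimate of the first few frames.
    """
    acc = ProgressiveAccumulator(len(true_image))
    while acc.frames < max_frames:
        # Stand-in for tracing one path per pixel: true value plus zero-mean noise.
        acc.add_frame([v + rng.uniform(-noise, noise) for v in true_image])
        if acc.frames >= min_frames and acc.max_std_error() < tolerance:
            break
    return acc


if __name__ == "__main__":
    truth = [0.2, 0.5, 0.8]
    quick = render_progressively(truth, 0.5, 0.01, 8, random.Random(1))
    final = render_progressively(truth, 0.5, 0.01, 100_000, random.Random(1))
    print("interactive:", quick.frames, "frames", quick.image())
    print("converged:  ", final.frames, "frames", final.image())
```

The same loop serves both uses: a small `max_frames` keeps each view responsive on modest hardware, while a tight `tolerance` lets the final images converge on larger resources.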

The following images show the improvement in image quality from OpenGL to the OSPRay pathtracer, and how the pathtracer preserves full ParaView functionality in a raytracing environment.


Figure 2: Comparative images showing OpenGL vs. OSPRay pathtracer renderings in ParaView. (Images courtesy Kitware)

The full potential of CPU-based SDVis can better be seen in the following image, which uses the OSPRay sci-vis renderer. 


Figure 3: A complex raytracing example using the OSPRay sci-vis renderer in ParaView 5.0 through 5.4+. (Image courtesy Kitware)

Comparative GPU performance
The University of Stuttgart team performed a number of comparative runs on different systems using sphere glyphs that are raycast on either a CPU or a GPU. The GPU renderer uses straightforward streaming of the data through a statically mapped shader storage buffer object (SSBO), while the CPU renderer is OSPRay using the P-k-d tree data structure. The team notes that, on the GPU, vertex and fragment processing costs grow as the data set scales up. Meanwhile, the OSPRay P-k-d tree, whose traversal cost grows only logarithmically with data size, culls data that cannot contribute to the image and so preserves performance. A 1280x720 viewport was used in all cases.


Figure 4: Comparative performance on a number of systems. (Results courtesy University of Stuttgart)
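The culling argument can be illustrated with a simplified, pointer-free k-d tree in the spirit of a P-k-d tree: the particle array itself is reordered so the tree is implied by index ranges, and a query skips every subtree that cannot intersect the region of interest. This is a hypothetical Python sketch, not OSPRay's implementation; OSPRay traverses rays through its tree, whereas this example uses an axis-aligned box query as a stand-in for view-dependent culling, and the names `build_kd` and `query_box` are illustrative.

```python
import random


def build_kd(points, lo=0, hi=None, depth=0):
    """Reorder points in place into an implicit (pointer-free) balanced k-d tree.

    The node for index range [lo, hi) is the median at mid = (lo + hi) // 2,
    split on axis depth % 3; its subtrees occupy [lo, mid) and [mid + 1, hi).
    """
    if hi is None:
        hi = len(points)
    if hi - lo <= 1:
        return
    axis = depth % 3
    # A full sort keeps the sketch short; a linear-time selection would be faster.
    points[lo:hi] = sorted(points[lo:hi], key=lambda p: p[axis])
    mid = (lo + hi) // 2
    build_kd(points, lo, mid, depth + 1)
    build_kd(points, mid + 1, hi, depth + 1)


def query_box(points, box_min, box_max):
    """Return (points inside the box, number of tree nodes visited)."""
    found = []
    visited = 0

    def visit(lo, hi, depth):
        nonlocal visited
        if lo >= hi:
            return
        visited += 1
        axis = depth % 3
        mid = (lo + hi) // 2
        p = points[mid]
        if all(box_min[k] <= p[k] <= box_max[k] for k in range(3)):
            found.append(p)
        # Everything in [lo, mid) has p[axis] <= split: skip if the box starts past it.
        if box_min[axis] <= p[axis]:
            visit(lo, mid, depth + 1)
        # Everything in [mid + 1, hi) has p[axis] >= split: skip if the box ends before it.
        if box_max[axis] >= p[axis]:
            visit(mid + 1, hi, depth + 1)

    visit(0, len(points), 0)
    return found, visited


if __name__ == "__main__":
    rng = random.Random(7)
    pts = [(rng.random(), rng.random(), rng.random()) for _ in range(10_000)]
    build_kd(pts)
    found, visited = query_box(pts, (0.4, 0.4, 0.4), (0.5, 0.5, 0.5))
    print(f"{len(found)} particles in box, {visited} of {len(pts)} nodes visited")
```

A small query region typically touches only a small fraction of the nodes, whereas a straightforward streaming renderer processes every particle each frame. That difference is why the tree-based approach holds up as the data grows.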

High frame rates are seen when rendering a series of 400x400 images of a 30 million triangle (30M tris) data set on both Intel Xeon and Intel Xeon Phi processors.

The table below lists the ParaView benchmark frame times, in seconds per frame (lower is better), on Intel Xeon and Intel Xeon Phi processors.

Renderer (date)        Intel Xeon, codenamed Haswell (s)    Intel Xeon Phi, codenamed Knights Landing (s)
llvmpipe (5/10/17)     5.291861                             18.894011
OpenSWR (5/11/17)      0.085990                             0.308664
OSPRay (5/11/17)       0.006520                             0.042851

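Because the sentence introducing these benchmark values describes them as frame times, they can be checked against the 10x to 80x OpenSWR speedup over llvmpipe that DeMarle reports later in this article. A minimal sketch, assuming the values are seconds per frame; the dictionary layout and function names are hypothetical.

```python
# Benchmark values from the ParaView table, interpreted as seconds per frame.
FRAME_TIMES = {
    "llvmpipe": {"Haswell": 5.291861, "Knights Landing": 18.894011},
    "OpenSWR": {"Haswell": 0.085990, "Knights Landing": 0.308664},
    "OSPRay": {"Haswell": 0.006520, "Knights Landing": 0.042851},
}


def speedup(baseline: str, candidate: str, cpu: str) -> float:
    """How many times faster `candidate` produces a frame than `baseline` on `cpu`."""
    return FRAME_TIMES[baseline][cpu] / FRAME_TIMES[candidate][cpu]


def frames_per_second(renderer: str, cpu: str) -> float:
    """Convert a per-frame time in seconds into a frame rate."""
    return 1.0 / FRAME_TIMES[renderer][cpu]


if __name__ == "__main__":
    for cpu in ("Haswell", "Knights Landing"):
        print(f"{cpu}: OpenSWR is {speedup('llvmpipe', 'OpenSWR', cpu):.1f}x faster "
              f"than llvmpipe; OSPRay runs at {frames_per_second('OSPRay', cpu):.1f} FPS")
```

Under this reading, OpenSWR comes out about 61x faster than llvmpipe on both processors, inside the reported 10x to 80x range. Read as frames per second instead, the same numbers would make OpenSWR roughly 60x slower than llvmpipe, which contradicts that claim.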
Intel Xeon Scalable processors local and remote performance
Jim Jeffers, director, principal engineer, Visualization Engineering at Intel, highlighted both local and remote CPU-based SDVis performance using the new Intel Xeon Scalable Processors at the 2017 Intel HPC Developer Conference. Kitware performed a similar demo at the 2017 Supercomputing conference.

In the first part of the demonstration, Jeffers interactively rendered a complex scene using raytracing on an eight-node Intel Xeon Scalable processor 8180 cluster located in the conference room. “This demonstration,” Jeffers points out, “highlights how a user can work on small local machines and compute clusters using OSPRay.”

The second part of the demonstration showed that, without any code changes, a user can scale up to interactively render the same scene as high-resolution, very high sample rate photorealistic images on a remote supercomputer (and presumably in the cloud as well).

For this demonstration, the OSPRay kernel ran remotely on 128 Intel Xeon Scalable processor 8160 nodes at the Texas Advanced Computing Center (TACC). “The TACC 128-node images achieved photorealistic quality faster than we’ve seen before,” Jeffers notes, “because the internal Monte Carlo sampling methods converged in real time; faster than the local screen update rate of the remote desktop connection.”

In terms of performance, Jeffers reported that the Monte Carlo sampling took roughly 5 to 10 seconds to converge on the eight-node system, versus less than a second on the 128 remote TACC compute nodes. Jeffers points out that these results clearly show the scalability of the SDVis approach and the OSPRay package (see the author's note at the end of this article).
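Jeffers' figures can be turned into a rough scaling estimate. The sketch below assumes convergence time is inversely proportional to aggregate sample throughput and treats "less than a second" as exactly one second, so it yields lower bounds only. It also ignores that the local and TACC clusters used different Xeon models (8180 vs. 8160). The function names are hypothetical.

```python
def min_speedup(local_seconds: float, remote_upper_seconds: float) -> float:
    """Guaranteed speedup when the remote run finished in under remote_upper_seconds."""
    return local_seconds / remote_upper_seconds


def parallel_efficiency(speedup: float, nodes_before: int, nodes_after: int) -> float:
    """Achieved speedup as a fraction of the ideal, linear speedup."""
    return speedup / (nodes_after / nodes_before)


if __name__ == "__main__":
    for local in (5.0, 10.0):  # ends of the reported 5 to 10 second range
        s = min_speedup(local, 1.0)
        eff = parallel_efficiency(s, 8, 128)
        print(f"local {local:.0f} s -> speedup of at least {s:.0f}x, "
              f"efficiency of at least {eff:.0%}")
```

Going from 8 to 128 nodes is a 16x increase, so even the conservative bounds correspond to at least roughly 30 to 60 percent of linear scaling, consistent with Jeffers' point about scalability.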

Visually, the rendered images from both parts of the demonstration were photorealistic, and the fact that they were rendered interactively using remote CPUs was stunning.

OpenSWR benchmarks
The photorealism of raytraced images is clearly compelling, especially for expensive product design and “look” decisions (for instance, in automotive design) or when creating publication-quality images. Still, SDVis remains important for rasterized OpenGL applications. Such applications can link against the well-known open-source Mesa OpenGL library, which includes Intel’s OpenSWR rendering library.
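As a how-to sketch for the Mesa route, an application already linked against Mesa can be pointed at OpenSWR at launch time through Mesa's environment variables. The example assumes a Mesa build that includes the swr Gallium driver (recent Mesa releases have removed it, so this applies to Mesa versions of this article's era). `GALLIUM_DRIVER` and `LIBGL_ALWAYS_SOFTWARE` are Mesa environment variables; the helper names are hypothetical.

```python
import os
import subprocess


def mesa_software_env(driver: str = "swr") -> dict:
    """Return a copy of the current environment that selects a Mesa software driver.

    Assumes the application is linked against a Mesa build that includes the
    requested Gallium driver ("swr" for OpenSWR, "llvmpipe" for Mesa's default
    software renderer).
    """
    env = dict(os.environ)
    # Force Mesa's software rendering path even if a hardware driver is present.
    env["LIBGL_ALWAYS_SOFTWARE"] = "1"
    # Choose which Gallium software rasterizer Mesa loads.
    env["GALLIUM_DRIVER"] = driver
    return env


def run_with_mesa_driver(cmd, driver: str = "swr") -> subprocess.CompletedProcess:
    """Launch an OpenGL program under the chosen Mesa driver; raises if it fails."""
    return subprocess.run(cmd, env=mesa_software_env(driver), check=True)
```

For example, on a system with an X display, `run_with_mesa_driver(["glxinfo", "-B"])` should report which renderer Mesa selected. Passing `driver="llvmpipe"` gives a like-for-like software baseline for comparisons with OpenSWR.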

The SDVis benefits of performance, scalability, and the ability to run anywhere are the motivation for using Mesa with OpenSWR, as opposed to a GPU-based OpenGL solution. “Eliminating the need to transfer data to the GPU is the reason why OpenSWR can compete so effectively against GPU-accelerated libraries. Scalability is another reason to consider OpenSWR,” DeMarle explains. 

Benchmark results make this point concrete. In his 2017 Intel HPC Developer Conference presentation, DeMarle reports that OpenSWR exploits the threading and vector (SIMD) capabilities of modern CPUs. As a result, OpenSWR runs between 10x and 80x faster than llvmpipe, Mesa's default software renderer. Tests on the Trinity supercomputer confirm that OpenSWR scales to a thousand nodes when rendering a mesh containing 1.1 trillion triangles. He notes in his presentation that the full machine might be able to render meshes of 10 trillion to 20 trillion triangles.


Figure 5: OpenSWR scaling results to 1024 Intel Xeon (Haswell) nodes on the Trinity supercomputer. (Results courtesy Kitware)
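DeMarle's full-machine estimate can be related to the measured run with simple arithmetic. The sketch below is an illustration under stated assumptions, not Kitware's method: it takes the 1.1 trillion triangle run, assumes (per DeMarle's remark) that the benchmark used about 1/19th of Trinity, and scales it up under an assumed parallel efficiency. The function name `extrapolate_triangles` and the efficiency values are hypothetical.

```python
def extrapolate_triangles(measured: float, fraction_of_machine: float,
                          efficiency: float = 1.0) -> float:
    """Scale a measured triangle count to the full machine.

    efficiency = 1.0 assumes perfectly linear scaling; smaller values model the
    losses (communication, compositing, load imbalance) that usually appear at
    scale. The efficiency values used below are assumptions.
    """
    if not 0.0 < fraction_of_machine <= 1.0:
        raise ValueError("fraction_of_machine must be in (0, 1]")
    return measured / fraction_of_machine * efficiency


if __name__ == "__main__":
    measured = 1.1e12      # triangles in the Trinity benchmark mesh
    fraction = 1.0 / 19.0  # share of the machine used, per DeMarle
    for eff in (1.0, 0.75, 0.5):
        estimate = extrapolate_triangles(measured, fraction, eff)
        print(f"efficiency {eff:.2f}: ~{estimate / 1e12:.1f} trillion triangles")
```

Perfect scaling gives about 20.9 trillion triangles and 50 percent efficiency gives roughly 10.5 trillion, which brackets the 10 trillion to 20 trillion range DeMarle quotes.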

In terms of compliance and stability, DeMarle points out that OpenSWR is a production-quality OpenGL library that is regularly tested and passes 100 percent of VTK’s 1800+ validation tests. 

Summary
The benefits of CPU-based visualization have been recognized for decades, but performant CPU-based visualization software has been rare or targeted at narrow uses. Both benchmarks and demonstrations show that this is no longer the case.

Eliminating the need for specialized visualization hardware redefines visualization workflows. It can also drastically reduce, or entirely eliminate, the need to provision separate hardware for visualization. This represents a significant simplification and cost savings.

Further, benchmarks and comparative images show that users can conveniently and comfortably move from raster-based OpenGL rendering to visually compelling, photorealistic raytraced images using only free, production-quality open-source software.

Rob Farber is a global technology consultant and author with an extensive background in HPC and in developing machine learning technology that he applies at national labs and commercial organizations. Rob can be reached at info@techenablement.com.

Author’s note: Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors (www.intel.com/benchmarks). Intel does not control or audit third-party benchmark data referenced in this article.