Moore’s law is the basic enabler that gives us transistor improvements, but that’s not all there is to it. The processors themselves get new features and functions with every generation, so it’s a triple improvement – speed, performance, and power reduction.
When Intel x86 processors were first deployed in a workstation with Windows back in 1997, one of the salient features was an integrated floating-point processor. Since then, expanded memory managers, security, and communications were added, and it went from one 32-bit core to 28 64-bit cores, plus a 512-bit SIMD processor and transcoder engines.
Intel has been leading the workstation CPU market for years, and the previous-generation processor, the Broadwell, was a performance leader that delighted millions of workstation users. Intel continued with its processor developments and introduced the next generation of Intel Xeon processors for workstations – the Intel Xeon Scalable processor (dual-socket capable) and the Intel Xeon W processor (single-socket capable). Both processors are built on the Skylake architecture, with more cores, higher frequency, more cache, expanded memory management, and more PCIe lanes.
To differentiate the workstation-class Xeon processors from the server-class processors, Intel has added the “W” designation to the name.
Intel designs its CPU architecture and gives it a code name; in this case it is Skylake. That design can find its way into many different form factors, from laptops to supercomputers. The processor is also named, such as an Intel Xeon Platinum processor. And those processors get used in a platform that is specifically designed for them. The platform includes supporting chips to provide USB 3.1 Type-A and Type-C, Thunderbolt 3, gigabit Ethernet, SATA, and other ports.
In regard to the Skylake platform, the supporting chip for the Xeon Scalable processor is known as the Lewisburg PCH, and for the Xeon W processor, it is the Kaby Lake WS PCH.
It can get confusing at times because various people use the different names interchangeably. It gets even more complicated because the processor, supporting chip, and platform all have arcane part numbers too, which among other things denote frequency, core count, and other esoteric elements. And, just to make you even crazier, things are expressed in acronyms.
How Far We’ve Come
Recently, Intel launched two new Intel Xeon processors. The E5-1600 Product Family processor is now the new Intel Xeon W processor (single-socket); this is the first product designed with a “W” for workstations in the product name. The other, the E5-2600 Product Family processor, is now the new Intel Xeon Scalable processor (dual-socket).
We’ve come a long way from that first Windows-Intel-based workstation in 1997. It came with a 266 MHz Pentium II processor and could, on a good day, hit 48.3 MFLOPS. The top-of-the-line workstations of the day had a 300 MHz Intel Pentium II processor that could deliver 62.1 MFLOPS.
By comparison, in June 1997, the fastest supercomputer in the world was the ASCI Red at Sandia National Laboratories, US Nuclear Arsenal, and it could do 1.3 TFLOPS. It took up 1,600 square feet, filled 104 racks that held 9,298 CPUs with 1.2 TB of RAM, needed 850 kilowatts of power, and cost $46 million ($67 million today).
Now you can have a supercomputer small enough to sit under your desk that beats it. For example, Dell just announced the Precision 7920 Tower with dual Intel Xeon Platinum processors, each with up to 28 physical cores, running at up to 3.8 GHz Turbo frequency, for a combined 4 TFLOPS and 112 threads. That is roughly three times faster than the fastest supercomputer of just 20 years ago, for less than one ten-thousandth the cost and less than one-thousandth the power. Not only that, almost anyone can use it; conversely, very few people could use that magnificent ASCI Red – which, by the way, is still working, giving the US taxpayers a pretty good ROI.
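The comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (1.3 TFLOPS across 9,298 CPUs for the 1997 machine, 4 TFLOPS across two processors for the workstation). The variable and function names are illustrative, and since the two figures may not come from the same benchmark, the ratios should be read as rough.

```python
# Figures quoted in the article; the names are illustrative.
ASCI_RED_TFLOPS = 1.3
ASCI_RED_CPUS = 9_298
WORKSTATION_TFLOPS = 4.0
WORKSTATION_CPUS = 2  # dual-socket system


def throughput_ratio(new_tflops: float, old_tflops: float) -> float:
    """Return how many times more throughput the new system delivers."""
    if old_tflops <= 0:
        raise ValueError("old_tflops must be positive")
    return new_tflops / old_tflops


# Whole machine against whole machine.
system_ratio = throughput_ratio(WORKSTATION_TFLOPS, ASCI_RED_TFLOPS)

# One workstation processor against one 1997 CPU.
per_cpu_ratio = throughput_ratio(
    WORKSTATION_TFLOPS / WORKSTATION_CPUS,
    ASCI_RED_TFLOPS / ASCI_RED_CPUS,
)

print(f"Whole system:  {system_ratio:.1f}x")
print(f"Per processor: {per_cpu_ratio:,.0f}x")
```

On these numbers the workstation delivers roughly three times the throughput overall, and each of its processors does the work of about 14,000 of the 1997 machine's CPUs.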
We made a comparison of professional workstation workloads. Compared to a four-year-old workstation based on the 3.9 GHz E5-1680 v2, the new Skylake-based Xeon W provides an average of 87 percent more performance. That kind of improvement would be fine if all you wanted to do was render faster, or maybe load files faster, but the real payoff comes in being able to do what you couldn’t do before.
Famous computer graphics scientist Jim Blinn has an adage: “As technology advances, rendering time remains constant.” The point being, artists and directors, for example, aren’t trying to reduce the time to produce a movie (although their bosses tell them that should be their goal), but rather, they want to make the most beautiful movie they can. The same is true for engineers who are running simulations on ever more complex parts.
Each time Intel raises the performance and the number of threads that can be processed simultaneously, sim users celebrate. Why? Because they can make their model more fine-grained. Finer granularity in an FEA simulation makes for a more reliable, more efficient, and more fine-tuned final product. Consider what that means when designing an airplane wing strut: a lighter and stronger airplane that is not only more durable, but also more fuel-efficient.
The Age of Threads
It’s taken a long time, but the benefit of parallel processing is undeniable – performing processes simultaneously provides huge gains in productivity and accuracy. The problem has been the legacy apps that simply couldn’t be threaded and recompiled. Slowly, the industry has built new apps with threading as an integral part. And ironically, the ISVs doing that haven’t made any fanfare about it; it has just been a kind of given that any new app naturally would be multi-threaded.
The good news/bad news is that if a user is stuck with old-fashioned, single-threaded legacy software, the gains from a new processor and GPU are going to be slight, and they will come primarily from faster clock speeds (CPU, GPU, and memory) rather than from the extra cores. But few users are only using one application, and most apps (except some developed in-house) have been upgraded and/or replaced completely with new versions.
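One common way to put numbers on this is Amdahl's law, which the article does not name. The sketch below is an illustration only: the parallel fractions are assumed values, not measurements of any application, and it treats each of the 56 hardware threads as a full worker, which is optimistic.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when only part of the runtime can be spread across workers.

    parallel_fraction: share of the original runtime that can run in parallel (0..1).
    workers: number of threads or cores available to that parallel share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumed fractions: a single-threaded legacy app, a half-threaded app,
# and a heavily threaded app, each on one 56-thread processor.
for fraction in (0.0, 0.5, 0.95):
    speedup = amdahl_speedup(fraction, workers=56)
    print(f"{fraction:.0%} parallel on 56 threads: {speedup:.1f}x")
```

With nothing parallelized, the extra threads buy nothing; at 50 percent the ceiling stays under 2x; only a heavily threaded application comes close to using the hardware.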
When you consider the new Intel Xeon SP (Scalable Performance) Platinum series Skylake processors with 28 physical cores, running at up to 3.8 GHz Turbo frequency and capable of 2 TFLOPS across 56 threads, and then double that in a dual-socket system, you have 112 threads at 3.8 GHz approaching 4 TFLOPS. It’s almost unbelievable. Drop a modern add-in board into the system, such as a GPU designed for compute, and you have a theoretical 16 TFLOPS in a system that can fit under your desk, uses conventional wall socket power, and doesn’t require any extra air-conditioning. Oh, and the whole thing would cost under $15,000.
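The thread and throughput totals follow from simple multiplication, sketched below. The core count, thread count, and 2 TFLOPS per socket are the article's figures. The 32 floating-point operations per core per clock cycle is an assumption made for illustration (a plausible value for a wide-SIMD core, not a number from the article), so the clock it implies is only indicative.

```python
CORES_PER_SOCKET = 28
THREADS_PER_CORE = 2       # implied by the article: 28 cores -> 56 threads
SOCKETS = 2
TFLOPS_PER_SOCKET = 2.0    # quoted in the article

# ASSUMPTION, not from the article: operations each core completes per cycle.
ASSUMED_FLOPS_PER_CYCLE = 32


def total_threads(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Hardware threads available across the whole system."""
    return sockets * cores_per_socket * threads_per_core


def peak_tflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: cores x clock (GHz) x operations per cycle, in TFLOPS."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


def implied_clock_ghz(tflops: float, cores: int, flops_per_cycle: int) -> float:
    """Clock speed at which the given core count would reach the given TFLOPS."""
    return tflops * 1000.0 / (cores * flops_per_cycle)


threads = total_threads(SOCKETS, CORES_PER_SOCKET, THREADS_PER_CORE)
system_tflops = SOCKETS * TFLOPS_PER_SOCKET
clock = implied_clock_ghz(TFLOPS_PER_SOCKET, CORES_PER_SOCKET, ASSUMED_FLOPS_PER_CYCLE)

print(f"Threads: {threads}, CPU throughput: {system_tflops:.0f} TFLOPS")
print(f"Clock implied by 2 TFLOPS per socket: {clock:.2f} GHz")
```

If the assumption holds, 2 TFLOPS per socket corresponds to a sustained clock of about 2.2 GHz rather than the 3.8 GHz Turbo peak, which suggests the quoted throughput and the quoted clock describe different operating points.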
Performance at Hand
In terms of performance, benchmarks tell part of the story. Intel will tell you one can get a 300 percent performance improvement over a machine that is four years old (based on best-published two-socket SPECfp_rate_base2006 result submitted to/published at www.spec.org/cpu2006/results/ as of July 11, 2017), or an 80 percent improvement from the last generation to this one, based on the same data. And that’s all true. It just may not apply to you.
Every user has his or her own workload, so the best that benchmarks can do is give an indication of what a person might achieve. However, over the years, I have yet to hear people saying they didn’t get their money’s worth by getting a new workstation. The math is simple: Do more, or better work, in the same time, and calculate that against the cost of an engineer doing the work.
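That calculation can be sketched in a few lines. The 87 percent gain and the $15,000 system price are figures quoted elsewhere in the article; the engineer's annual cost and the share of the working day spent waiting on the machine are hypothetical placeholders that each organization would replace with its own numbers.

```python
def annual_value_of_time_saved(
    engineer_cost_per_year: float,
    compute_bound_fraction: float,
    speedup: float,
) -> float:
    """Yearly value of the engineer time freed up by a faster workstation.

    compute_bound_fraction: share of working time spent waiting on the machine.
    speedup: new performance relative to old (1.87 means 87 percent faster).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if not 0.0 <= compute_bound_fraction <= 1.0:
        raise ValueError("compute_bound_fraction must be between 0 and 1")
    time_saved_fraction = compute_bound_fraction * (1.0 - 1.0 / speedup)
    return engineer_cost_per_year * time_saved_fraction


def payback_years(workstation_cost: float, annual_value: float) -> float:
    """Years until the time saved pays for the workstation."""
    if annual_value <= 0:
        raise ValueError("annual_value must be positive")
    return workstation_cost / annual_value


value = annual_value_of_time_saved(
    engineer_cost_per_year=150_000,  # hypothetical fully loaded cost
    compute_bound_fraction=0.20,     # hypothetical share of time spent waiting
    speedup=1.87,                    # 87 percent more performance, per the article
)
payback = payback_years(workstation_cost=15_000, annual_value=value)

print(f"Value of time saved per year: ${value:,.0f}")
print(f"Payback period: {payback:.1f} years")
```

With these placeholder inputs the machine pays for itself in a little over a year, and the result is most sensitive to how much of the engineer's time is actually spent waiting on the computer.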
Furthermore, the generational differences are impressive and illustrate what you can do when you make billions of tiny transistors available to computer architects. However, as mentioned above, it’s the application of all those speedy little transistors that is the real magic and primary benefit to users and organizations.
Workstations don’t break and aren’t cheap, so they don’t get replaced every year, or even every other year. In fact, they seldom get replaced more often than every three or four years, and only then if there is a significant improvement in an application and/or the hardware.
Although Moore’s law has been fairly predictable over the last 40 years, with the move to 14 nm processes, there is more being accomplished than just clock speedups. With a smaller feature size, more transistors can be stuffed in a chip. When that is done, more functions and faster, wider communications are realized, as well as specialized capabilities such as security, AI, and power management.
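To get a feel for the scale involved, the sketch below applies the usual statement of Moore's law as a doubling of transistor count roughly every two years. That doubling period is an assumption, as is the idealized rule that density grows with the square of the shrink in feature size. The 22 nm starting point is an example input rather than a figure from the article, and process node names are only loosely tied to physical dimensions.

```python
def transistor_growth(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth factor in transistor count after `years`, assuming steady doubling."""
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    return 2.0 ** (years / doubling_period_years)


def ideal_density_gain(old_feature_nm: float, new_feature_nm: float) -> float:
    """Idealized density gain from a shrink: area scales with feature size squared."""
    if old_feature_nm <= 0 or new_feature_nm <= 0:
        raise ValueError("feature sizes must be positive")
    return (old_feature_nm / new_feature_nm) ** 2


# 40 years at one doubling every two years is 20 doublings.
print(f"Growth over 40 years: {transistor_growth(40):,.0f}x")

# Example shrink from a hypothetical 22 nm process to 14 nm.
print(f"Ideal density gain, 22 nm to 14 nm: {ideal_density_gain(22, 14):.2f}x")
```

Twenty doublings is a factor of about a million, which is why architects can spend the transistor budget on new functions instead of on clock speed alone.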
Intel has always been a leader in process technology and, therefore, in a perfect place to recognize and exploit the inherent opportunities of compute density and throughput. The Skylake processor is the latest instantiation of that skill, and the users are the beneficiaries.
Jon Peddie (firstname.lastname@example.org) is president of Jon Peddie Research, a Tiburon, CA-based consultancy specializing in graphics and multimedia that also publishes JPR’s “TechWatch.”