AMD Naples shows off how crazily Zen can scale

Over the past few months, AMD has made plenty of waves with the preview of the Zen architecture followed by the full-blown launch of the Ryzen 7 processors. While the Ryzen 7 CPUs aren't perfect, they're a huge improvement over AMD's previous high-end CPU offerings, the FX-series. But those are all desktop parts, and the real star of AMD's Zen architecture might just prove to be the server processors, codenamed Naples.

One of the key design elements of AMD's Zen architecture is that it's supposed to be highly scalable. Instead of splitting its lineup between lower-performance, low-power 'cat' cores (Bobcat, Jaguar, Puma) and higher-performance, high-power 'heavy equipment' parts (Bulldozer, Piledriver, Steamroller, Excavator), AMD is using a single architecture to cover the whole spectrum of computing performance. From ultraportable devices through laptops, and on to desktops and servers, Zen is the intended solution. To make all of this work properly, AMD reworked many of the fundamental building blocks for Zen, with the Infinity Fabric being one of the major changes.

We've known about AMD's Naples CPU for a while, and AMD has publicly demonstrated working silicon on a few occasions. Last month, we got the chance to see exactly what Naples brings to the table in terms of features and performance, and if you were impressed by Ryzen, just feast your eyes on Naples:

Zen features a 4-core/8-thread module as its primary building block, combining 512KB of L2 cache per core with a shared 8MB L3 cache. Naples takes eight of these modules and puts them together in a single package (four 8-core dies tightly coupled together), yielding a 32-core/64-thread part. Assuming nothing else has changed, Naples should also have 64MB of L3 cache (eight modules × 8MB), though AMD didn't specifically state the L3 size. What it did reveal is that each Naples CPU will have a massive 8-channel DDR4 memory controller, supporting up to 16 DIMMs, with 128 PCIe Gen3 lanes of external connectivity. Naples is also a full SoC (System on a Chip), so there's no need for an external chipset. One Naples CPU is all that's required to support memory, IO, graphics, and more.
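To make the building-block math concrete, here's a small sketch that multiplies out the per-module figures above into per-package totals. The constant and function names are hypothetical, chosen for illustration, and the L3 total is an inference from the module layout rather than a number AMD confirmed.

```python
# Hypothetical sketch: Naples per-package totals built up from the Zen
# 4-core module. The L3 total is inferred (AMD did not confirm it).

CORES_PER_MODULE = 4      # one 4-core/8-thread Zen module
THREADS_PER_CORE = 2      # SMT: two threads per core
L2_KB_PER_CORE = 512      # private L2 per core
L3_MB_PER_MODULE = 8      # L3 shared by the cores in one module
MODULES_PER_PACKAGE = 8   # eight modules per Naples package
MODULES_PER_DIE = 2       # an 8-core die holds two 4-core modules


def naples_package_totals() -> dict:
    cores = CORES_PER_MODULE * MODULES_PER_PACKAGE
    return {
        "dies": MODULES_PER_PACKAGE // MODULES_PER_DIE,
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "l2_mb": cores * L2_KB_PER_CORE // 1024,
        "l3_mb": L3_MB_PER_MODULE * MODULES_PER_PACKAGE,  # inferred, not confirmed
    }


if __name__ == "__main__":
    for name, value in naples_package_totals().items():
        print(f"{name}: {value}")
```

That works out to four dies, 32 cores, 64 threads, 16MB of total L2, and (if the L3 layout carries over unchanged) 64MB of L3.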

But of course, this is a server part, so it's not destined to stop with a single CPU socket. AMD had a working dual-socket 2U server on display, which bumps the totals to 64 cores and 128 threads. The two sockets use the Infinity Fabric to talk to each other; in this case, 64 PCIe lanes from each socket are repurposed for socket-to-socket communication. That still leaves 128 total PCIe lanes for external devices, but more importantly it doubles the number of memory channels to 16.
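Here's a rough sketch of how the single-socket figures add up in a 2P system, assuming (as described above) that each socket hands 64 of its 128 PCIe lanes to the socket-to-socket link. The `Socket` class and `platform_totals` function are hypothetical names for illustration, and only 1P and 2P are modeled since AMD wouldn't comment on larger configurations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Socket:
    """Per-socket Naples figures as described in the article."""
    cores: int = 32
    threads: int = 64
    memory_channels: int = 8
    dimm_slots: int = 16
    pcie_lanes: int = 128


def platform_totals(sockets: int, lanes_for_interconnect: int = 64) -> dict:
    """Hypothetical helper: system-wide totals for a 1P or 2P Naples box."""
    if sockets not in (1, 2):
        # 4S and larger setups are not modeled (AMD gave no details).
        raise ValueError("only 1P and 2P configurations are modeled")
    s = Socket()
    # A single socket needs no inter-socket link, so all lanes stay external.
    reserved = lanes_for_interconnect if sockets > 1 else 0
    return {
        "cores": s.cores * sockets,
        "threads": s.threads * sockets,
        "memory_channels": s.memory_channels * sockets,
        "dimm_slots": s.dimm_slots * sockets,
        "external_pcie_lanes": (s.pcie_lanes - reserved) * sockets,
    }


print(platform_totals(1))
print(platform_totals(2))
```

Note that the external lane count is 128 in both cases: a second socket adds cores, memory channels, and DIMM slots, but the lanes it contributes are offset by the ones spent on the Infinity Fabric link.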

Many server workloads are heavily bound by memory bandwidth, and AMD's Naples delivers more than twice the memory bandwidth of Intel's top dual-socket servers. What does this mean for actual server performance? AMD put together a head-to-head battle between its Naples server and a standard Dell server equipped with Intel's fastest current Xeon part, the E5-2699A v4. As with all such benchmarks, there are many other factors that could be manipulated, but Ryzen proved quite capable as a heavily threaded processor, and Naples should build on that.

In a seismic analysis workload built on computationally intensive 3D wave equations, which taxes the entire system (cores, memory, and IO), Naples basically destroyed Intel's Xeon server. AMD provided two comparison points: one using an equal number of cores and the same DDR4-1866 memory speed (Broadwell-EP Xeon doesn't support higher speed memory when all DIMM slots are filled), and the second using a fully armed and operational battle station. With 64 vs. 44 cores and DDR4-2400 vs. DDR4-1866, it's not even close. With equal cores and memory speeds, Naples is twice as fast as the Xeon server, and with all cores enabled and running DDR4-2400, Naples is 2.5 times faster.
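For a sense of the raw memory gap in the fully populated configurations, here's a back-of-the-envelope sketch of theoretical peak DRAM bandwidth: channels × transfer rate × 8 bytes, since each DDR4 channel is 64 bits wide. The function name is hypothetical, DDR4-1866 is rounded to 1866 MT/s, and these are paper peaks only. They don't reproduce AMD's benchmark, whose speedup depends on far more than bandwidth.

```python
BYTES_PER_TRANSFER = 8  # a DDR4 channel is 64 bits (8 bytes) wide


def peak_bandwidth_gbs(channels: int, mt_per_s: int) -> float:
    """Theoretical peak in GB/s (10^9 bytes/s); measured bandwidth is lower."""
    return channels * mt_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


# Dual-socket Naples: 2 x 8 channels at DDR4-2400.
naples_2p = peak_bandwidth_gbs(channels=16, mt_per_s=2400)
# Dual-socket Xeon E5-2699A v4, all DIMMs filled: 2 x 4 channels at DDR4-1866.
xeon_2p = peak_bandwidth_gbs(channels=8, mt_per_s=1866)

print(f"Naples 2P: {naples_2p:.1f} GB/s")
print(f"Xeon 2P:   {xeon_2p:.1f} GB/s")
print(f"Ratio:     {naples_2p / xeon_2p:.2f}x")
```

On paper that's roughly 307 GB/s vs. 119 GB/s. At matched memory speeds, the 16 vs. 8 channel count alone gives Naples exactly twice the theoretical peak.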

That raises the question of how realistic AMD's seismic benchmark is. That's difficult to say, but in the high performance computing scenarios where AMD wants Naples to compete, running software that's optimized for the specific machine architecture is relatively common practice. So unlike in gaming, Naples should post very impressive results right out of the gate, and with plenty of extra cores and memory bandwidth there's every reason to expect AMD to come out ahead in many server workloads. Virtualization is potentially an even bigger market, but AMD didn't show any virtualization performance figures.

It's also important to note that dual-socket servers aren't the only option available for Intel Xeon, though they're likely the lion's share of the market. 4S and even 8S server solutions exist as well. I specifically asked AMD about Naples's ability to scale to 4S or higher configurations, and was basically given a "no comment" response. I don't see any reason why it wouldn't support such setups, but of course it would require motherboard support and that's likely not available yet.


Overall, Zen and AMD's approach to driving server performance upward is very impressive. Intel hasn't had any serious competition in the server space for a while now, and AMD says the result has been a lot of 'incrementalism.' The Broadwell-based Xeon parts have up to 24 cores, the Haswell Xeons had up to 18 cores, Ivy Bridge models included up to 15 cores, and Westmere had up to 10 cores. In six years, Intel has increased core counts by 2.4X, which isn't terrible but also isn't really driving the market forward. More critically, however, from Westmere-EX through Broadwell, memory support has remained at quad-channel per socket, and prices for Intel's top Xeon parts have basically doubled. AMD's Naples could really shake things up in those areas.
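To put that 2.4X into per-year terms, here's a quick sketch using the maximum core counts and the six-year span quoted above. The `growth` helper is a hypothetical name, and the annual figure is simply the compound rate implied by those endpoints, not a number from AMD or Intel.

```python
# Maximum Xeon core counts per generation, as quoted in the article.
max_cores = {"Westmere": 10, "Ivy Bridge": 15, "Haswell": 18, "Broadwell": 24}


def growth(first: int, last: int, years: float) -> tuple:
    """Return (total multiple, compound annual growth rate)."""
    multiple = last / first
    annual = multiple ** (1 / years) - 1
    return multiple, annual


multiple, annual = growth(max_cores["Westmere"], max_cores["Broadwell"], years=6)
print(f"{multiple:.1f}x overall, about {annual:.1%} per year")
```

That comes out to roughly 16 percent more cores per year, compared with the single-generation doubling from 16 to 32 cores per socket that Naples represents for AMD.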

Part of the reason AMD is able to jump from a maximum of 16 cores per socket to 32 cores is that its previous Piledriver-based Opteron 6300 series (aka Abu Dhabi) hasn't seen any major updates since late 2012. And that's due in large part to the shortcomings of the Bulldozer architecture and its derivatives. Zen is a brand new day for AMD processors, allowing for much greater performance and scalability.

While we may not be champing at the bit to run games on 32-core/64-thread processors for our desktops, rest assured there are game servers that can make great use of such capabilities. And some day, perhaps even our home computers will be running hundreds of CPU cores—paired up with tens of thousands of GPU cores, naturally.

More pertinently, Naples shows that if AMD needs to go beyond the current 8-core Ryzen 7 designs (say, on the off chance that games start using more than eight cores), it definitely has the capability. There's also clear potential to add many more PCIe lanes than the current Ryzen implementation supports, though that would necessitate a change in CPU sockets. AM4 is intended to be the one-stop solution for AMD desktop processors for at least several years, but long-term, with technologies like M.2, USB 3.1, and multi-GPU, it would be great to see a future Zen desktop platform with 48 or even 64 PCIe lanes.

The various Zen-derived APUs are another item that could prove really interesting. With a significantly improved CPU architecture, combined with a Vega-derived GPU, we might actually have integrated graphics that doesn't completely suck, without compromising on the CPU side of things. And Zen, thanks to the Infinity Fabric, can take us there.

Naples is slated to launch in Q2 of 2017, though AMD hasn't revealed pricing or the various models yet. AMD will be partnering with other companies to bring the servers to market, and if this preview of performance is anything to go by, Naples should garner a lot of interest. Intel will also have new Xeon parts later this year, based on the Skylake/Kaby Lake architecture, which may close the gap. I also wouldn't expect to see a rapid overhaul of existing IT infrastructures, as companies tend to be a lot more cautious in that market, but it will be good to have a competitive AMD server solution again.

Jarred Walton

Jarred's love of computers dates back to the dark ages when his dad brought home a DOS 2.3 PC and he left his C-64 behind. He eventually built his first custom PC in 1990 with a 286 12MHz, only to discover it was already woefully outdated when Wing Commander was released a few months later. He holds a BS in Computer Science from Brigham Young University and has been working as a tech journalist since 2004, writing for AnandTech, Maximum PC, and PC Gamer. From the first S3 Virge '3D decelerators' to today's GPUs, Jarred keeps up with all the latest graphics trends and is the one to ask about game performance.