Doom benchmarks return: Vulkan vs. OpenGL

When Doom showed up back in May, I ran a bunch of benchmarks to see how it performed. At the time, we were promised that a patch with support for the Vulkan API would show up "soon after launch," which apparently meant around two months. The good news is that after all this waiting, the public Vulkan patch went live last week; there's also a FAQ on Doom's Vulkan support that has additional information.

So what is Vulkan and why should anyone care? The short summary is that Vulkan is the cross-platform low-level API put out by the Khronos Group, the same group that handles the cross-platform OpenGL API. Alternatively, Vulkan is to OpenGL as DirectX 12 is to DirectX 11. The key element we want to discuss is what it means to be 'low-level' and how that changes the game engine. This gets technical, but it will help set the stage for what we see in the benchmarks.

"What do you mean I can't have low-level hardware access!?"

So, what is an API?

Software developers typically use programming libraries to help make their jobs easier. Imagine you want to create a game for Windows; there's a lot of work involved in doing so, but many common tasks can be handled by a programming library—an API, or Application Programming Interface. Rather than reinventing the wheel for each new program, tasks like graphics, audio, window resizing and positioning, reading and writing to storage, and more can simply use an existing library to make life easier.

Focusing specifically on the realm of graphics, the API helps handle things like texturing, lighting, and creating all the amazing visuals we see in modern games. But using a library also involves some level of abstraction, and in the world of graphics it means you have a driver that supports the set of functions in the library and maps those functions to the actual hardware. AMD and Nvidia GPUs, as an example, might be similar in many areas, but dig deep enough and there are plenty of differences.
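To make the driver idea a bit more concrete, here's a minimal sketch in Python. Everything in it is hypothetical: the `GraphicsDriver` interface, the two vendor classes, and the batch sizes are made up for illustration and don't correspond to any real API or GPU. The point is only that the game calls one function and each driver maps it to its own hardware in its own way.

```python
from abc import ABC, abstractmethod


class GraphicsDriver(ABC):
    """Hypothetical interface: the set of functions the graphics API exposes."""

    @abstractmethod
    def draw_triangles(self, count: int) -> str:
        """Submit `count` triangles and describe how the hardware received them."""


class VendorADriver(GraphicsDriver):
    BATCH = 64  # assumed batch size, not a real hardware figure

    def draw_triangles(self, count: int) -> str:
        batches = -(-count // self.BATCH)  # ceiling division
        return f"vendor A: {batches} batch(es) of up to {self.BATCH} triangles"


class VendorBDriver(GraphicsDriver):
    BATCH = 32  # assumed batch size, not a real hardware figure

    def draw_triangles(self, count: int) -> str:
        batches = -(-count // self.BATCH)  # ceiling division
        return f"vendor B: {batches} batch(es) of up to {self.BATCH} triangles"


def render_frame(driver: GraphicsDriver) -> str:
    # The game only talks to the API; it never sees the batch size.
    return driver.draw_triangles(100)


if __name__ == "__main__":
    for driver in (VendorADriver(), VendorBDriver()):
        print(render_frame(driver))
```

The same `render_frame` call ends up as two batches on one made-up GPU and four on the other. That hidden mapping is the abstraction, and it's also where a driver can lose (or gain) performance without the game knowing about it.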

In the past decade or two, most graphics APIs have been 'high-level,' meaning there's a larger amount of abstraction. This generally makes the job of the programmers easier, at the cost of some performance optimizations. Some developers have wanted ways to extract more performance from the hardware, however, and they've basically asked for 'low-level' access to the hardware. That means more work in some cases, but it can also improve performance if you know what you're doing. And that brings us to Vulkan and DirectX 12.

There are plenty of differences between Vulkan and DirectX 12, just as there are differences between OpenGL and DirectX 11. Microsoft is in charge of the DirectX world, and it only works on Windows platforms; Khronos Group handles OpenGL/Vulkan, and they support multiple platforms, including most smartphones and tablets via OpenGL ES. Where Microsoft had pressure from game developers and hardware companies to create DirectX 12, Khronos Group took a different route and leveraged much of AMD's work on their own low-level Mantle API to create Vulkan. Ultimately, the end goal is the same: allow developers to extract more performance from the hardware (if they want to put in the effort).

There's some politics involved with the low-level API discussion as well. The biggest item is that AMD supports a feature called asynchronous compute, which is basically the ability to mix and match graphics and compute instructions in the execution units. AMD has had their Asynchronous Compute Engine (ACE) as part of their core graphics hardware since the very first GCN GPUs (the HD 7970 and 7950 launched in January 2012), but until Mantle, DirectX 12, and Vulkan came around it didn't actually do a lot. That's because DirectX 11 and OpenGL didn't really have a good way to leverage the ACE, but low-level access changes things.

Last generation's king, the GTX 980 Ti.

Spec sheets don't tell the whole story

Fundamentally, AMD and Nvidia architectures aren't the same, and this is why we run benchmarks. Otherwise we could just look at the specifications and say, "Oh, it looks like AMD's Fury X does 8601 GFLOPS and has 512GB/s of bandwidth; the GTX 980 Ti has 6054 GFLOPS and 336GB/s bandwidth. That makes the Fury X 40-50 percent faster." The reality is that all processors have a theoretical performance, but actually getting close to that figure can be difficult, and the specifics of the architecture help determine how close the real world is to the theoretical world. And this is where ACE can help AMD quite a bit.
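For anyone who wants to check the spec-sheet math, here's a quick sketch in Python using only the four figures quoted above. The helper name `percent_advantage` is my own; nothing here measures real performance.

```python
# Spec-sheet figures quoted in the text (theoretical peaks, not benchmarks).
fury_x = {"gflops": 8601, "bandwidth_gbs": 512}
gtx_980_ti = {"gflops": 6054, "bandwidth_gbs": 336}


def percent_advantage(a: float, b: float) -> float:
    """Return how much larger `a` is than `b`, in percent."""
    return (a / b - 1.0) * 100.0


compute_lead = percent_advantage(fury_x["gflops"], gtx_980_ti["gflops"])
bandwidth_lead = percent_advantage(
    fury_x["bandwidth_gbs"], gtx_980_ti["bandwidth_gbs"]
)

print(f"Fury X compute lead on paper:   {compute_lead:.1f}%")    # 42.1%
print(f"Fury X bandwidth lead on paper: {bandwidth_lead:.1f}%")  # 52.4%
```

That works out to roughly 42 percent on compute and 52 percent on bandwidth, which is where the "40-50 percent faster" paper estimate comes from.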

I picked the Fury X and GTX 980 Ti for a good reason: on paper the Fury X should be substantially faster, but in practice the 980 Ti ends up with a small lead of around five percent (the Fury X does lead by five percent at 4K, however). That's about a 40 percent difference from the theoretical performance, due to the ways in which Nvidia's Maxwell and AMD's 3rd generation GCN differ. With a low-level API, AMD's ACE has the potential to help better utilize certain resources, particularly if the developers spend some effort to better optimize their code. Instead of utilizing 60-70 percent of the available execution units, they might be able to get to 80-90 percent utilization, and that could make a big difference when it comes to the end user's experience.
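As a rough illustration of why utilization matters, the sketch below multiplies the Fury X's theoretical 8601 GFLOPS by the utilization ranges mentioned above. The 60-70 and 80-90 percent figures are illustrative guesses from the text, not measurements, and the function name `effective_gflops` is mine. Real games are limited by far more than execution unit occupancy, so treat this as back-of-the-envelope math only.

```python
PEAK_GFLOPS = 8601  # Fury X theoretical peak quoted earlier


def effective_gflops(peak: float, utilization: float) -> float:
    """Throughput actually delivered if only `utilization` of the units are busy."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak * utilization


# Illustrative utilization ranges from the text, not measured values.
for label, low, high in (("high-level API", 0.60, 0.70),
                         ("low-level API", 0.80, 0.90)):
    lo_val = effective_gflops(PEAK_GFLOPS, low)
    hi_val = effective_gflops(PEAK_GFLOPS, high)
    print(f"{label}: {lo_val:.0f}-{hi_val:.0f} GFLOPS effective")

# Midpoint-to-midpoint gain (65% busy versus 85% busy).
gain = effective_gflops(PEAK_GFLOPS, 0.85) / effective_gflops(PEAK_GFLOPS, 0.65) - 1
print(f"Midpoint gain: {gain * 100:.0f}%")
```

If utilization were the only factor, going from the middle of the first range to the middle of the second would be worth about 31 percent more throughput, which is the kind of headroom a low-level API could in theory unlock.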

AMD's R9 Fury X stumbled at the gate but has been picking up steam ever since.

Now combine the above discussion with AMD's presence in all of the current-generation consoles, and AMD has a vested interest in finding ways to improve the performance of their hardware. This is arguably why they created Mantle, and why they're so interested in DirectX 12. But couldn't AMD accomplish something similar by simply spending a lot of resources to optimize their DirectX 11 drivers? Probably not to the same degree, simply because the API isn't designed for things like their ACE.

What about Nvidia? Don't they have features and hardware elements that are handicapped by high-level APIs? Probably, though Nvidia for their part has been more interested in creating other gaming libraries rather than specifically focusing on low-level APIs. Certainly developers can do stuff with the PhysX API that they can't do using plain DirectX 11, though much of PhysX could almost certainly be done in other ways. Nvidia supports Vulkan and DirectX 12 too, and helped id Software with the Vulkan port of Doom.

With all of the above out of the way, let's talk expectations. In my view, the goal of any developer using a low-level API should always be to beat the performance they can get via a high-level API; if performance is worse (than OpenGL or DirectX 11), then it represents a lot of wasted effort.

Imagine someone coming to you with a customized sports car; they brag about replacing the engine, tweaking the transmission, and doing all sorts of other work. Then you ask them how much better it performs compared to the stock car. If all their changes resulted in less horsepower, worse handling, a lower top speed, and reduced acceleration, you'd probably think they had lost their mind. On the other hand, they might tune for one or two specific areas at the cost of others (a higher top speed, say, or better acceleration), which we also see in software development.

On the next page: Doom Vulkan performance numbers and charts galore.