Evaluating VR performance and latency with Futuremark's VRMark

VR is the talk of the town right now, with all the major players set to ship devices at the end of the month. Wandering around GDC, you couldn't swing a cat without smacking someone wearing a VR headset in the noggin. With multiple VR options showing up, one of the questions we inevitably have to ask is: Which VR headset is the best? We already chimed in with some thoughts on that topic, but most of our evaluation of VR kits so far has been based on subjective measurements. What about objective measurements—which headset actually performs best in terms of latency and head tracking, and what sort of performance can we expect from the upcoming VR games and experiences?

Futuremark has long been a player in the performance analysis space, with their 3DMark test suites going back as far as 1998. There's been plenty of controversy over the years about whether or not 3DMark results are meaningful, since they're not actual games, but if you wanted to get some idea of how hardware would perform with a future API, Futuremark has been quite useful. They provided some of the first graphics applications to utilize every version of DirectX, from DX6 up through the current DX12, and it's only natural to expect VR to enter the picture.

Futuremark's GDC booth showing some of the VRMark team

We've talked about VRMark before, the new test for measuring how your system will handle VR games. Futuremark had a session at GDC titled, "Exploring the performance limits of VR systems," so naturally we went to check things out. We're very interested in objectively evaluating the performance of the various VR headsets, and it's frankly a huge can of worms. To understand why, you have to look at what's going on in the domain of VR.

At the low end, we have VR video content where you're able to look around a scene that was filmed with multiple cameras. The hardware required to process this sort of VR isn't particularly demanding, which is why things like Google's Cardboard and the Samsung GearVR can do all of this on a smartphone SoC. Stepping up a notch in difficulty, rendering real-time environments like Minecraft in VR is far more demanding. We were able to go hang out with Oculus and talk with John Carmack about getting Minecraft running on the GearVR, and there was plenty of optimization work involved (and still ongoing), but the end result is quite good. But even the GearVR is nothing compared to what's happening on the PC.

Both the Oculus Rift and HTC Vive are targeting far more demanding scenarios, with high fidelity graphics and true 3D positional audio. Early development kits for Oculus paved the way, but many had serious problems using them—nausea and discomfort after a few minutes of using DK1 were quite common. As research continued, Oculus eventually determined that the best way to overcome these problems was by improving display quality and reducing latency, and the later DK2 and Crescent Bay versions of the Rift were far better. Both the initial consumer release (CV1) and the Vive now include a 90Hz low-persistence OLED display, and there's the rub.

If you want content on those 90Hz displays to feel smooth, you need to be able to render a game at more than 90 fps—or potentially, 90 fps for each eye, which is almost like 180 fps. Look at the minimum hardware specs, however, and you'll see a big problem in the graphics department. GTX 970 and R9 290 are both capable gaming cards, but there are many games where they fall far short of 90 fps, even at lower quality settings. The direction we seem to be going is that VR content developers are the ones tasked with ensuring things tick along at 90 fps or more, which means they're going to need to do a lot of optimizations.
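The arithmetic behind that target is worth spelling out, since the frame-time budget shrinks quickly as refresh rate climbs. A minimal sketch in plain Python (nothing VR-specific, just the math):

```python
# Frame-time budget: how many milliseconds the renderer has per frame
# at a given display refresh rate.
def frame_budget_ms(refresh_hz: float) -> float:
    """Milliseconds available to render each frame."""
    return 1000.0 / refresh_hz

for hz in (60, 75, 90, 144):
    print(f"{hz:>3} Hz -> {frame_budget_ms(hz):.1f} ms per frame")
```

At 90Hz that works out to roughly 11.1ms per frame, compared to 16.7ms at 60Hz—and a missed deadline means a dropped or reprojected frame, which is far more jarring in a headset than on a desktop monitor.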

One look at the PC gaming landscape is enough to make us worried that many games are going to fall short of that mark, but we'll know more in the coming months. One possibility is that games will have to do dynamic quality scaling to stay above 90 fps, like the way id Software's Rage dynamically adjusted quality to run at 60 fps. But even the best algorithms sometimes run into hiccups, so we expect to see some variance in VR performance depending on your system configuration. Much like today's non-VR games, the ability to run at higher frame rates will be desirable, as it means you're less likely to fall below the target refresh rate and begin stuttering.
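To illustrate how that kind of dynamic quality scaling could work, here's a hypothetical sketch of a render-scale controller: it nudges resolution down when a frame misses its budget and back up when there's headroom. This is our own illustration, not how Rage or any VR runtime actually implements it.

```python
# Hypothetical dynamic-resolution controller (illustrative only).
# Nudges the render scale based on how the last frame time compares
# to the frame budget for a 90Hz display.
def adjust_render_scale(scale: float, last_frame_ms: float,
                        budget_ms: float = 1000.0 / 90,
                        step: float = 0.05,
                        lo: float = 0.5, hi: float = 1.0) -> float:
    if last_frame_ms > budget_ms:
        scale -= step          # missed the budget: render fewer pixels
    elif last_frame_ms < 0.8 * budget_ms:
        scale += step          # comfortable headroom: add detail back
    return max(lo, min(hi, scale))
```

A controller like this trades image sharpness for consistent frame delivery, which is generally the right call in VR, where a dropped frame is felt immediately.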

VRMark is Futuremark's take on evaluating the performance of your system in VR scenarios. But unlike 3DMark, it's not just about running a few graphics and physics tests and calling it a day. There will be a performance evaluation of your hardware, but while the VRMark preview allows you to walk around the test environment, so far we haven't seen any actual scores. What Futuremark focused on for the purposes of their GDC presentation was the other aspect of VR performance: measuring latency.

Latency testing in progress on DK2

To do this, Futuremark has created some pretty cool tools and software, things that we can use to evaluate displays as well as VR kits. We've seen plenty of low latency "gaming" displays over the years, but despite claimed response times falling from 16ms to 8ms and ultimately to 1-2ms, it's often difficult to quantify precisely what the displays are doing. The display latency kit for VRMark addresses this problem, using an oscilloscope sampling 25,000 times per second to measure screen latency and brightness, and with the right tools it can also measure the latency of VR headset tracking. The right tools for headset tracking are a bit more involved—you need some form of motorized movement that you can control with high precision—but looking at just the display can be illuminating.

An early proof-of-concept prototype for measuring motion to photo latency; the final hardware looks far more professional.

The test involves sending alternating light gray and dark gray frames to the display, and then a single white frame gets sent in the middle. (This white frame is sent simultaneously with the command to move the headset, when Futuremark is testing tracking latency.) Using a photodiode, oscilloscope, and software, it's possible to determine the latency between sending the white frame and the frame showing up on the display. Calculating the Tr+Tf (rise time and fall time) for the display is also possible, giving us better insight into any display's actual performance and latency.
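The analysis on the oscilloscope side amounts to timing threshold crossings in the photodiode trace. Here's a rough sketch of how latency and rise time might be derived from a stream of brightness samples at 25,000 samples per second; the function names are our own, and VRMark's actual processing is not public.

```python
# Hypothetical analysis of a photodiode trace (illustrative sketch).
SAMPLE_RATE_HZ = 25_000
MS_PER_SAMPLE = 1000.0 / SAMPLE_RATE_HZ  # 0.04 ms per sample

def display_latency_ms(samples, send_index, threshold):
    """Milliseconds between sending the white frame (at send_index)
    and the photodiode first reading at or above `threshold`."""
    for i in range(send_index, len(samples)):
        if samples[i] >= threshold:
            return (i - send_index) * MS_PER_SAMPLE
    return None  # the white frame never showed up in this capture

def rise_time_ms(samples, low, high):
    """Milliseconds from first crossing 10% to first crossing 90%
    of the low-to-high brightness swing (the 'Tr' in Tr+Tf)."""
    t10 = low + 0.1 * (high - low)
    t90 = low + 0.9 * (high - low)
    i10 = next(i for i, s in enumerate(samples) if s >= t10)
    i90 = next(i for i, s in enumerate(samples) if s >= t90)
    return (i90 - i10) * MS_PER_SAMPLE
```

Fall time (Tf) works the same way in reverse, watching the trace drop from 90% back to 10% after the white frame passes.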

For their presentation, Futuremark showed data from several displays: a higher quality 144Hz gaming monitor from AOC, two "generic" 60Hz LCDs, Oculus DK1, and Oculus DK2. Wait, what about Crescent Bay, CV1, Vive-Pre, and the final version of the Vive? Now that Futuremark is owned by UL, they're refraining from publishing any figures for hardware that has not publicly launched, which means we have to wait and see how the latest VR headsets work, but as an introduction to VRMark's display latency testing, we have enough to get started. Here's what the results look like for four different displays:

Generic 60Hz LCD Latency

AOC G2460 144Hz Latency

Starting with two computer LCDs, the general charts look about the same, but you have to pay attention to the time scale. On the 60Hz display, each vertical line for screen refresh (the blue lines) is spaced 16.7ms apart, while on the 144Hz panel the lines are only 6.9ms apart. The result is that the total draw latency is over twice as high on the 60Hz display, even though the lower chart doesn't look that different.

Note that response time on the 60Hz panel is 13ms vs. 11ms on the 144Hz panel; the gap is smaller than you might expect because the 144Hz display doesn't "fall" as quickly as it rises, and in fact the fall time takes an entire frame to occur. That's not great, but we'll inevitably see far worse once we start doing our own testing of displays using these tools. We'll also be able to determine if things like ULMB (Ultra-Low Motion Blur) and the various overdrive settings actually help or not.

Oculus DK1 Latency

Oculus DK2 Latency

Moving to the VR headsets, the layout is slightly different this time, as Futuremark moves frame persistence to the top-left and total draw latency to the top-right. Forget those for a moment and just look at the DK1 result. Don't worry too much about the saw pattern, as there's apparently a bit of backlight flickering being picked up; instead, just focus on the response time. The DK1 display takes nearly a frame to rise to the maximum value, and it takes nearly two frames to fall to the minimum value. In fact, it never fully settles at the target light gray value after the white frame, because it's still transitioning to dark gray when the next frame arrives. Response time is a pretty abysmal 40ms, and frame persistence is 48ms. If you ever tried DK1 and didn't enjoy the experience, the above helps explain some of the shortcomings of the initial release.

DK2 in contrast has a completely different result from the other displays. Oculus moved to a low-persistence OLED for DK2, and it transitions to black between each frame. This is what causes the wave pattern, and you might think this would result in a poor experience. In fact, the opposite is true: Our eyes generally detect the peak brightness, and by transitioning to black between the frames, any "blurring" between frames is eliminated. DK2 also has a 75Hz display instead of a 60Hz panel, so each frame is 13.3ms instead of 16.7ms. DK2's Tr+Tf response time isn't better than the 144Hz gaming LCD's (14ms vs. 11ms), but frame persistence is basically the same (14ms vs. 15ms), and it's less than a third the frame persistence of DK1.

A few disclaimers are in order, before we wrap up. Futuremark tells us there's still a bit of inconsistency in measuring results on monitors, and they're working to nail that down before the public release. Second, we don't have access to the VR performance testing yet, but that should hopefully come soon. The position of the photodiode also matters, as left and right eyes in VR kits will show different latency, and measuring at the top vs. bottom of a computer display can also skew the results, but we can account for those factors in testing. Then there are the elephants over in the corner: the final launch hardware for HTC Vive and Rift CV1. Given the improvements shown between DK1 and DK2, we're very interested in seeing how CV1 compares, as well as how the Rift and Vive stack up against each other—not to mention other headsets like OSVR.

There are other limitations with VRMark right now. The biggest is that it uses the PC's display output, which means until/unless Futuremark creates a version for iOS/Android, there's no way to run these same tests on things like GearVR—or any of the hundred or more VR headsets being created over in China. Once we have the final software with the performance testing, it's also a return to the age-old problem with 3DMark: What does that score actually mean in practice? We've seen instances where 3DMark may generate a similar score for two graphics solutions, but in real-world gaming tests one of the GPUs is substantially faster. Futuremark will also be looking at ways to evaluate the various APIs (Nvidia VR Works, AMD LiquidVR, the SteamVR SDK, etc.), but those are not yet in place.

Note that VRMark isn't the only VR benchmark in the works; Basemark is working on their VR Score test.

Even with all these caveats, however, VRMark is shaping up to be a useful tool that we can add to our testing suite. It may not answer every question we have when it comes to evaluating VR kits, but more data is never a bad thing. Once we have our own testing kit in hand, stay tuned for a collection of computer display testing results. Are the ultra-low advertised response times of some displays truly better, or is it just marketing hyperbole? We aim to find out.

ABOUT THE AUTHOR

Jarred got his start with computers on a Commodore 64, where he has fond memories of playing the early AD&D Gold Box games from SSI. After spending time studying computer science and working in the IT industry, he discovered a knack for journalism, which he’s been doing for more than a decade. This enables him to play with (aka “test”) all of the latest and greatest hardware; it’s a tough life, but someone has to do it. For science.