What is ray tracing, and how does Nvidia's GeForce RTX handle the technology?

Nvidia's Sol demo makes extensive use of real-time ray tracing.

What is ray tracing? That question just became far more relevant for PC gamers, as the new Nvidia GeForce RTX 2080, GeForce RTX 2080 Ti, and GeForce RTX 2070 are adding dedicated hardware to accelerate ray tracing. All of those graphics cards use Nvidia's new Turing architecture, which promises to be the most revolutionary jump in GPUs that we've seen in a long time—perhaps ever. Will these be the new best graphics cards when they become available, or are they priced too high?

We'll find out soon enough, as the GeForce RTX 2080 and 2080 Ti go on sale September 20. Nvidia has provided in-depth information on all the technologies going into the cards in advance of the launch. Armed with that knowledge, here's an overview of ray tracing, rasterization, hybrid graphics, and how Nvidia's GeForce RTX cards are set to change what we can expect from our GPUs.

A short primer on computer graphics and rasterization

Creating a virtual simulation of the world around you that looks and behaves properly is an incredibly complex task—so complex, in fact, that we've never really attempted to do it fully. Forget about things like gravity and physics for a moment and just think about how we see the world. An effectively infinite number of photons (particles of light) zip around, reflecting off surfaces and passing through objects, all based on the material properties of each object. Trying to simulate 'infinity' with a finite resource like a computer's processing power is a recipe for disaster. We need clever approximations, and that's how modern graphics currently works.

We call this process rasterization, and instead of dealing with infinite objects, surfaces, and photons, it starts with polygons. Early attempts might have only used hundreds of polygons at a time, but that number has been steadily increasing as our graphics cards and processors have become faster. Games now push millions of polygons per frame, but how do you turn all those polygons into an image? Rasterization.

It involves a lot of mathematics, but the short version is that a viewport (the screen) is defined and then a 2D rendition of the 3D world gets created. Converting a polygon into a 2D image on a screen involves determining what portion of the display the object covers. Up close, a single triangle might cover the entire screen, while if it's further away and viewed at an angle it might only cover a few pixels. Once the pixels are determined, things like textures and lighting need to be applied as well.

Rasterization triangle
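To make that concrete, here's a minimal Python sketch of how a single point gets projected onto the viewport. The camera values and resolution are made up for illustration; real engines do this with 4x4 matrices on the GPU, but the principle is the same.

    # Minimal sketch of the projection step in rasterization (illustrative only).
    # A point in camera space is divided by its depth, then mapped to pixels.
    def project_to_screen(x, y, z, focal_length=1.0, width=1920, height=1080):
        # Perspective divide: objects farther away (larger z) shrink on screen.
        ndc_x = (focal_length * x) / z
        ndc_y = (focal_length * y) / z
        # Map from normalized device coordinates (-1..1) to pixel coordinates.
        px = int((ndc_x + 1.0) * 0.5 * width)
        py = int((1.0 - (ndc_y + 1.0) * 0.5) * height)  # flip y for screen space
        return px, py

    # A triangle is rasterized by projecting its three vertices this way, then
    # filling in the pixels that fall inside the resulting 2D triangle.
    print(project_to_screen(0.5, 0.25, 2.0))  # -> (1200, 472)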

Doing this for every polygon for every frame ends up being wasteful, as many polygons might not be visible. Various techniques like the Z-buffer (a secondary buffer that keeps track of the depth of each pixel) and Z-culling (discarding objects that are blocked from view) help speed up the process. In the end, a game engine will take the millions of potentially visible polygons, sort them, and then attempt to process them as efficiently as possible.
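As a rough sketch of the Z-buffer idea (a toy example, not how any particular engine implements it), each pixel keeps the depth of the closest surface drawn so far, and a new fragment only wins if it's closer:

    import math

    width, height = 4, 4
    z_buffer = [[math.inf] * width for _ in range(height)]       # nearest depth so far
    color_buffer = [[(0, 0, 0)] * width for _ in range(height)]  # final pixel colors

    def draw_fragment(x, y, depth, color):
        # Z-test: keep the fragment only if it's nearer than what's already stored.
        if depth < z_buffer[y][x]:
            z_buffer[y][x] = depth
            color_buffer[y][x] = color

    draw_fragment(1, 1, 5.0, (255, 0, 0))  # red surface at depth 5
    draw_fragment(1, 1, 2.0, (0, 255, 0))  # green surface in front at depth 2
    draw_fragment(1, 1, 8.0, (0, 0, 255))  # blue surface behind, discarded
    print(color_buffer[1][1])              # (0, 255, 0) -- the nearest surface wins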

That's no small task, and over the past couple of decades we've gone from primitive polygons with 'faked' light sources (eg, the original Quake), to more complex environments with shadow maps, soft shadows, ambient occlusion, tessellation, screen space reflections, and other techniques attempting to create a better approximation of the way things should look. This can require millions or even billions of calculations for each frame, but with modern GPUs capable of teraflops of throughput (trillions of calculations per second), it's a tractable problem.

What is ray tracing?

Ray tracing is a different approach, one that has theoretically been around for nearly 50 years, though it's closer to 40 years of practical application. Turner Whitted wrote a paper in 1979 titled "An Improved Illumination Model for Shaded Display," which outlined how to recursively calculate ray tracing to end up with an impressive image that includes shadows, reflections, and more. (Not coincidentally, Turner Whitted now works for Nvidia's research division.) The problem is that doing this requires even more complex calculations than rasterization.

Ray tracing involves tracing the path of a ray (a beam of light) backward into a 3D world. The simplest implementation would trace one ray per pixel. Figure out what polygon that ray hits first, then calculate light sources that could reach that spot on the polygon (more rays), plus calculate additional rays based on the properties of the polygon (is it highly reflective or partially reflective, what color is the material, is it a flat or curved surface, etc.).

To determine the amount of light falling on a single pixel, the ray tracing formula needs to know how far away the light is, how bright it is, and the angle of the reflecting surface relative to the angle of the light source, before calculating how bright the reflected ray should be. The process is then repeated for every other light source, including indirect illumination from light bouncing off other objects in the scene. Material properties factor in as well, determined by their level of diffuse or specular reflectivity (or both). Transparent and semi-transparent surfaces, such as glass or water, refract rays, adding further rendering headaches, and everything necessarily has an artificial limit on the number of bounces, because without one rays could be traced practically forever.
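Put together, a bare-bones Whitted-style ray tracer looks something like the Python sketch below. It's purely illustrative (spheres only, one light, invented brightness values) and is not how Nvidia or any game engine implements it, but it shows the shadow ray, the cosine falloff, and the capped recursion for reflections described above.

    import math

    MAX_BOUNCES = 3  # the artificial limit that keeps rays from bouncing forever

    def dot(a, b): return sum(x * y for x, y in zip(a, b))
    def sub(a, b): return tuple(x - y for x, y in zip(a, b))
    def add(a, b): return tuple(x + y for x, y in zip(a, b))
    def scale(a, s): return tuple(x * s for x in a)
    def normalize(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))

    def hit_sphere(origin, direction, center, radius):
        # Solve the ray/sphere quadratic; return the nearest positive hit distance.
        oc = sub(origin, center)
        b = 2.0 * dot(oc, direction)
        c = dot(oc, oc) - radius * radius
        disc = b * b - 4.0 * c
        if disc < 0:
            return None
        t = (-b - math.sqrt(disc)) / 2.0
        return t if t > 1e-4 else None

    # Scene: (center, radius, brightness, reflectivity) -- values are made up.
    spheres = [((0, 0, -3), 1.0, 0.8, 0.3), ((0, -101, -3), 100.0, 0.5, 0.0)]
    light_pos = (5, 5, 0)

    def trace(origin, direction, depth=0):
        nearest_t, nearest = None, None
        for sphere in spheres:
            t = hit_sphere(origin, direction, sphere[0], sphere[1])
            if t is not None and (nearest_t is None or t < nearest_t):
                nearest_t, nearest = t, sphere
        if nearest is None:
            return 0.1  # background brightness
        center, radius, brightness, reflectivity = nearest
        point = add(origin, scale(direction, nearest_t))
        normal = normalize(sub(point, center))
        to_light = normalize(sub(light_pos, point))
        # Shadow ray: anything between the point and the light blocks it
        # (the distance-to-light check is omitted to keep the sketch short).
        shadowed = any(hit_sphere(point, to_light, c, r) for c, r, _, _ in spheres)
        # Cosine falloff: brightness depends on the angle between surface and light.
        color = 0.0 if shadowed else brightness * max(0.0, dot(normal, to_light))
        # Recursive reflection, capped by MAX_BOUNCES.
        if reflectivity > 0 and depth < MAX_BOUNCES:
            refl_dir = sub(direction, scale(normal, 2.0 * dot(direction, normal)))
            color += reflectivity * trace(point, refl_dir, depth + 1)
        return color

    print(trace((0, 0, 0), (0, 0, -1)))  # brightness seen through one pixel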

The most commonly used ray tracing algorithm, according to Nvidia, is BVH Traversal: Bounding Volume Hierarchy Traversal. That's a big name for a complex process, but the idea is to optimize the ray/triangle intersection computations. Take a scene with hundreds of objects, each with potentially millions of polygons, and then try to figure out which polygons a ray intersects. It's a search problem and would take a very long time to brute force. BVH speeds this up by creating a tree of objects, where each object is enclosed by a box.

Nvidia presented an example of a ray intersecting a bunny model. At the top level, a BVH (box) contains the entire bunny, and a calculation determines that the ray intersects this box—if it didn't, no more work would be required on that box/object/BVH. Next, the BVH algorithm gets a collection of smaller boxes for the intersected object—in this case, it determines the ray in question has hit the bunny object in the head. Additional BVH traversals occur until eventually the algorithm gets a short list of actual polygons, which it can then check to determine how the ray interacts with the bunny.
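Here's a toy Python illustration of that traversal. The box layout and triangle names are invented, and real implementations (including whatever Nvidia's RT cores do internally) are far more sophisticated, but the principle holds: a ray that misses a box skips everything inside it, and only the leaves it does hit hand back a short list of triangles for exact testing.

    def intersects_box(ray_origin, ray_inv_dir, box_min, box_max):
        # Standard "slab" test for a ray against an axis-aligned bounding box.
        tmin, tmax = -float("inf"), float("inf")
        for o, inv, lo, hi in zip(ray_origin, ray_inv_dir, box_min, box_max):
            t1, t2 = (lo - o) * inv, (hi - o) * inv
            tmin, tmax = max(tmin, min(t1, t2)), min(tmax, max(t1, t2))
        return tmax >= max(tmin, 0.0)

    def traverse(node, ray_origin, ray_inv_dir, hits):
        if not intersects_box(ray_origin, ray_inv_dir, node["min"], node["max"]):
            return  # the ray misses this box, so the entire subtree is skipped
        if "triangles" in node:              # leaf: a short list of actual polygons
            hits.extend(node["triangles"])   # these get exact ray/triangle tests
        else:
            for child in node["children"]:
                traverse(child, ray_origin, ray_inv_dir, hits)

    # A tiny two-level hierarchy: the "bunny" box contains a "head" and a "body".
    bunny = {"min": (0, 0, 0), "max": (4, 4, 4), "children": [
        {"min": (0, 3, 0), "max": (1, 4, 1), "triangles": ["head_tri_1", "head_tri_2"]},
        {"min": (1, 0, 1), "max": (4, 3, 4), "triangles": ["body_tri_1"]},
    ]}
    candidates = []
    # A ray pointing straight down at the head. The traversal uses 1/direction;
    # 1e9 stands in for division by zero on the axes the ray doesn't move along.
    traverse(bunny, (0.5, 10.0, 0.5), (1e9, -1.0, 1e9), candidates)
    print(candidates)  # ['head_tri_1', 'head_tri_2'] -- the body box was never opened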

All of this can be done using software running on either a CPU or GPU, but it can take thousands of instruction slots per ray. The RT cores are presented as a black box that takes the BVH structure and a ray, cycles through all the dirty work, and spits out the desired result. It's important to note that this is a non-deterministic operation, meaning it's not possible to say precisely how many rays the RT cores can compute per second—that depends on the BVH structure. The Giga Rays per second figure in that sense is more of an approximation, but in practice the RT cores can run the BVH algorithm about ten times faster than CUDA cores.

Using a single ray per pixel can still result in dozens or even hundreds of ray calculations, and better results are achieved by starting with more rays per pixel, with the results of all those rays aggregated to determine a final color for the pixel. How many rays per pixel are 'enough'? The best answer is that it varies—if the first surface is completely non-reflective, a few rays might suffice. If the rays bounce around between highly reflective surfaces (eg, a hall of mirrors effect), hundreds or even thousands of rays might be necessary.
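Here's a sketch of that aggregation step, with a stand-in trace function (the real one would be a full ray tracer like the earlier sketch) and a made-up camera helper:

    import random

    def shade_pixel(px, py, samples_per_pixel, trace_fn, camera_ray_fn):
        total = 0.0
        for _ in range(samples_per_pixel):
            # Jitter each sample inside the pixel so the rays don't all overlap.
            origin, direction = camera_ray_fn(px + random.random(), py + random.random())
            total += trace_fn(origin, direction)
        # The pixel's final color is the average of where every ray ended up.
        return total / samples_per_pixel

    # Toy usage: a fake trace that returns noisy values around a true answer of 0.5.
    fake_trace = lambda origin, direction: 0.5 + random.uniform(-0.5, 0.5)
    fake_camera = lambda x, y: ((0, 0, 0), (0, 0, -1))
    print(shade_pixel(10, 20, 1, fake_trace, fake_camera))    # very noisy
    print(shade_pixel(10, 20, 256, fake_trace, fake_camera))  # much closer to 0.5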

Companies like Pixar—and really, just about every major film these days—use ray tracing (or path tracing, which is similar except it tends to use even more rays per pixel) to generate highly detailed computer images. In the case of Pixar, a 90-minute movie at 60fps would require 324,000 images, with each image potentially taking hours of computational time. How is Nvidia hoping to do that in real-time on a single GPU? The answer is that Nvidia isn't planning to do that. At least not at the resolution and quality you might see in a Hollywood film.

Enter hybrid rendering

Computer graphics hardware has been focused on doing rasterization faster for more than 20 years, and game designers and artists are very good at producing impressive results. But certain things still present problems, like proper lighting, shadows, and reflections.

Screen space reflections use the results of what's visible on the screen to fake reflections—but what if you're looking into a mirror? You could do a second projection from the mirror into the game world, but there are limits to how many projections you can do in a single frame (since each projection requires a lot of rasterization work from a new angle). Shadow maps are commonly used in games, but they require lots of memory to get high quality results, plus time spent by artists trying to place lights in just the right spot to create the desired effect, and they're still not entirely accurate.
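For reference, here's roughly what a shadow-map lookup boils down to, as a hypothetical, engine-agnostic sketch: depth is rendered once from the light's point of view, and when shading a pixel you check whether something sits closer to the light than that pixel does. The map's resolution and the bias fudge factor are exactly where the memory cost and the inaccuracies come from.

    def is_in_shadow(shadow_map, light_x, light_y, depth_from_light, bias=0.005):
        # shadow_map is a grid of depths as seen from the light's position.
        stored_depth = shadow_map[light_y][light_x]
        # If something in the map is closer to the light, this point is occluded.
        return depth_from_light - bias > stored_depth

    # A 4x4 shadow map: the light sees an occluder at depth 2.0 in one texel.
    shadow_map = [[10.0] * 4 for _ in range(4)]
    shadow_map[1][1] = 2.0
    print(is_in_shadow(shadow_map, 1, 1, 5.0))  # True: blocked by the occluder
    print(is_in_shadow(shadow_map, 2, 2, 5.0))  # False: nothing closer to the light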

Another lighting problem is ambient occlusion, the soft shadowing that forms in creases and corners, like where walls intersect. SSAO (screen space ambient occlusion) is an approximation that helps, but again it's quite inaccurate. EA's SEED group created the Pica Pica demo using DXR (DirectX Raytracing), and at one point it shows the difference between SSAO and RTAO (ray traced ambient occlusion). It's not that SSAO looks bad, but RTAO looks better.

Hybrid rendering uses traditional rasterization technologies to render all the polygons in a frame, and then combines the result with ray-traced shadows, reflections, and/or refractions. The ray tracing ends up being less complex, allowing for higher framerates, though there's still a balancing act between quality and performance. Casting more rays for a scene can improve the overall result at the cost of framerates, and vice versa.
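Conceptually, a hybrid frame looks something like the Python sketch below. Every function here is an invented stub rather than a real engine API; the point is simply the split: rasterize the whole scene, spend a limited ray budget on the effects rasterization struggles with, and composite the results.

    def rasterize(scene, camera):
        # Stand-in for a full rasterization pass over every polygon in the frame.
        return {"base_image": "rasterized geometry, textures, and direct lighting"}

    def trace_effect(scene, effect, rays):
        # Stand-in for casting shadow, reflection, or AO rays against the scene's BVH.
        return {effect: f"traced with a budget of {rays:,} rays"}

    def render_frame(scene, camera, ray_budget):
        frame = rasterize(scene, camera)
        # Spend the ray budget only where rasterization falls short, then composite.
        frame.update(trace_effect(scene, "reflections", int(ray_budget * 0.5)))
        frame.update(trace_effect(scene, "shadows", int(ray_budget * 0.5)))
        return frame

    print(render_frame(scene=None, camera=None, ray_budget=2_000_000))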

Nvidia had various game developers show their ray tracing efforts at Gamescom, but everything so far is a work in progress. More importantly, we haven't had a chance to do any performance testing or adjust the settings in any way. And all the demonstrations ran on RTX 2080 Ti cards, which can do >10 Giga Rays per second (GR/s)—but what happens if you 'only' have an RTX 2080 with 8 GR/s, or the RTX 2070 and 6 GR/s? Either games that use ray tracing effects will run 20 percent and 40 percent slower on those cards, respectively, or the games will offer settings that can be adjusted to strike a balance between quality and performance—just like any other graphics setting.

Taking the 2080 Ti and its 10 GR/s as a baseline, if we're rendering a game at 1080p, that's about 2 million pixels, and 60fps means roughly 120 million pixels per second. Doing the math, a game could do around 80 rays per pixel at 1080p and 60fps, if the GPU is doing nothing else—and at 4K and 60fps it would be limited to about 20 rays per pixel. But games aren't doing pure ray tracing, as they still use rasterization for a lot of the environment. This brings us to an interesting dilemma: how many rays per frame are enough?
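Spelled out, that back-of-the-envelope math looks like this (it assumes the GPU spends its entire ray budget on tracing and nothing else, which no real game would do):

    giga_rays_per_second = 10e9  # the RTX 2080 Ti's quoted ~10 GR/s
    for name, width, height in [("1080p", 1920, 1080), ("4K", 3840, 2160)]:
        pixels_per_second = width * height * 60  # at 60 frames per second
        rays_per_pixel = giga_rays_per_second / pixels_per_second
        print(f"{name} at 60fps: ~{rays_per_pixel:.0f} rays per pixel")
    # 1080p at 60fps: ~80 rays per pixel
    # 4K at 60fps: ~20 rays per pixel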

Nvidia's OptiX denoising algorithm at work

Denoising and AI to the rescue

Here's where Nvidia's Turing architecture really gets clever. As if the RT cores and enhanced CUDA cores aren't enough, Turing has Tensor cores that can dramatically accelerate machine learning calculations. In FP16 workloads, the RTX 2080 Ti FE's Tensor cores work at 114 TFLOPS, compared to just 14.2 TFLOPS of FP32 on the CUDA cores. That's basically like ten GTX 1080 Ti cards waiting to crunch numbers.

But why do the Tensor cores even matter for ray tracing? The answer is that AI and machine learning are becoming increasingly powerful, and quite a few algorithms have been developed and trained on deep learning networks to improve graphics. Nvidia's DLSS (Deep Learning Super Sampling) allows games to render at lower resolutions without AA, and then the Tensor cores can run the trained network to change each frame into a higher resolution anti-aliased image. Denoising can be a similarly potent tool for ray tracing work.

Pixar has been at the forefront of using computer generated graphics to create movies, and its earlier efforts largely relied on hybrid rendering models—more complex models perhaps than what RTX / DXR games are planning to run, but they weren't fully ray traced or path traced. The reason: it simply took too long. This is where denoising comes into play.

Many path tracing applications can provide a coarse level of detail very fast—a quick and dirty view of the rendered output—and then once the viewport stops moving around, additional passes can enhance the preview to deliver something that's closer to the final intended output. The initial coarse renderings are 'noisy,' and Pixar and other companies have researched ways to denoise such scenes.
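The idea is easy to see with a running average (a toy Python sketch, not any renderer's actual code): one sample per pixel gives the noisy preview, and folding in more samples slowly converges on the true value. A denoiser's job is to get from the one-sample look to the many-sample look without paying for all the extra samples.

    import random

    def noisy_sample():
        # Stand-in for tracing one light path; the true value here is 0.5.
        return 0.5 + random.uniform(-0.4, 0.4)

    estimate = 0.0
    for n in range(1, 1025):
        estimate += (noisy_sample() - estimate) / n  # running average of all samples so far
        if n in (1, 16, 1024):
            print(f"after {n:4d} samples: {estimate:.3f}")
    # The 1-sample estimate is the noisy 'quick and dirty' view; by 1024 samples
    # the estimate has settled very close to 0.5.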

Pixar did research into using a deep learning convolutional neural network (CNN), training it with millions of frames from Finding Dory. Once trained, Pixar was able to use the same network to denoise other scenes. Denoising allowed Pixar to reportedly achieve an order of magnitude speedup in rendering time. This allowed Pixar to do fully path traced rendering for its latest movies, without requiring potentially years of render farm time, and both Cars 3 and Coco made extensive use of denoising.

If the algorithms are good enough for Pixar's latest movies, what about using them in games? And more importantly, what about using denoising algorithms on just the lighting, shadows, and reflections in a hybrid rendering model? If you look at the quality of shadows generated using current shadow mapping techniques, lower resolution shadow maps can look extremely blocky, but they're often necessary to reach acceptable performance on slower GPUs—and most gamers are okay with the compromise.

Take those same concepts and apply them to RTX ray tracing. All the demonstrations we've seen so far have used some form of denoising, but as with all deep learning algorithms, additional training of the model can improve the results. We don't know if Battlefield V, Metro Exodus, and Shadow of the Tomb Raider are casting the maximum number of rays possible right now, but further tuning is certainly possible.

Imagine using just 1-2 GR/s instead of the full 10 GR/s of the GeForce RTX 2080 Ti, and letting denoising make up the difference. There would be a loss in quality, but it should make it viable to implement real-time ray tracing effects even on lower tier hardware.

If you look at the above image of the goblets, the approximated result on the right still looks pretty blocky, but if that only impacted the quality of shadows, reflections, and refractions, how much detail and accuracy do we really need? And since the RT cores in Turing are apparently able to run in parallel with the CUDA cores, it's not unreasonable to think we can get a clear improvement in visual fidelity without killing framerates.

Welcome to the future of graphics

Big names in rendering have jumped on board the ray tracing bandwagon, including Epic and its Unreal Engine, Unity 3D, and EA’s Frostbite. Microsoft has created an entirely new DirectX Ray Tracing API as well. Ray tracing of some form has always been a desired goal of real-time computer graphics. The RTX 20-series GPUs are the first implementation of ray tracing acceleration in consumer hardware, and future Nvidia GPUs could easily double or quadruple the number of RT cores per SM. With increasing core counts, today's 10 GR/s performance might end up looking incredibly pathetic. But look at where GPUs have come from in the past decade.

The first Nvidia GPUs with CUDA cores were the 8800 GTX cards, which topped out at 128 CUDA cores back in late 2006. 12 years later, we have GPUs with up to 40 times as many CUDA cores (Titan V), and modest hardware like the GTX 1070 still has 15 times as many cores—plus higher clockspeeds. Full real-time ray tracing for every pixel might not be possible on the RTX 2080 Ti today, but we've clearly embarked on that journey. If it takes another five or ten years before it becomes practical on mainstream hardware, I can wait. And by then we'll be looking toward the next jump in computer graphics.