Nvidia just unveiled its next-generation Turing architecture at SIGGRAPH, a conference for graphics professionals. It also spilled the beans on the upcoming line of Quadro RTX graphics cards for workstation users. This comes as a bit of a surprise, as most have been expecting Turing to first show up in the RTX 2080 and other GPUs in the near future. Those GeForce GPUs are still likely to go on sale first, but we now have some clear ideas of what to expect.
First, Turing is the name of the architecture, and it's the next evolution of Nvidia GPUs following Volta. Nvidia refers to Turing as its "eighth-generation GPU architecture," but I'm not sure how it arrives at that number. Working backward, there's Volta, Pascal, Maxwell, Kepler, Fermi, and Tesla (six generations). Prior to naming architectures after famous scientists, however, Nvidia had six additional generations of GeForce architectures. I guess everything prior to the introduction of the CUDA cores in 2006 is considered as one 'architecture'? It's no worse than the way Intel does CPU generations, though, so let's move on.
The big surprise with Turing is that it will include Tensor Cores, which were first available in the Volta GV100, along with new RT Cores designed specifically to accelerate ray-tracing. The goal appears to be real-time ray-tracing, something previously demonstrated at GDC earlier this year on Volta GPUs. The RT Cores enable Turing-based GPUs to "simulate the physical world at up to six times the speed of the previous Pascal generation" GPUs. That's for the final rendering output, but the RT Cores can be up to 25 times faster than Pascal for ray-tracing operations, and "more than 30 times the speed of CPU nodes."
The Tensor Cores are more of a known quantity. These provide a dense cluster of computational units that can be used to accelerate machine learning. The Volta GV100 includes 640 Tensor Cores with a peak computational speed of up to 110 TFLOPS (trillions of floating point operations per second) for FP16 (16-bit floating point) workloads. With Turing, Nvidia says it can do "up to 500 trillion tensor operations per second," though these are INT4 (4-bit integer) operations rather than FP16 operations. Nvidia says Turing processors will have up to 576 Tensor Cores, a step down from Volta, but Turing processors should still prove incredibly adept at deep learning training and inference.
Besides these new features, Turing will include traditional graphics support with Nvidia CUDA cores. Many have guessed—incorrectly—at the number of CUDA cores we'd see in Turing, but Nvidia has now provided at least two numbers. The new Turing GPUs will initially top out at a maximum of 4,608 CUDA cores, a small step down from the maximum 5,120 seen in the Volta GV100. A lower tier product has 3,072 CUDA cores, which would be a big upgrade for midrange GPUs. The Turing SM (streaming multiprocessor) has also been reworked, with a new ability to issue floating-point and integer operations in parallel. That gives Turing a maximum speed of 16 TFLOPS for floating-point operations (presumably FP32), and 16 TOPS of integer operations.
The 16 TFLOPS figure also gives us a realistic target for turbo clocks on the Turing GPUs. 4,608 cores doing FP32 FMA operations (two FLOPS per core per clock) would require a clockspeed of 1,736MHz to hit 16 TFLOPS. So forget the rumors of 5,120 cores at 1.5GHz, or 3,840 at 2.5GHz. All indications are that Turing will have clockspeeds similar to Pascal, only with 20 percent more CUDA cores. And you can expect Nvidia to push hard on the RT Cores and RTX branding.
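For anyone who wants to check the clockspeed math, here's a minimal sketch. It assumes each CUDA core retires one FP32 FMA per clock, counted as two FLOPS; the function names are my own and purely illustrative.

```python
# Back-of-the-envelope check of the clockspeed math above.
# Assumption: each CUDA core retires one FP32 FMA per clock, counted as 2 FLOPS.
FLOPS_PER_FMA = 2


def required_clock_mhz(target_tflops, cuda_cores):
    """Clock (MHz) needed for `cuda_cores` to reach `target_tflops` of FP32."""
    flops_per_clock = cuda_cores * FLOPS_PER_FMA
    return target_tflops * 1e12 / flops_per_clock / 1e6


def peak_tflops(cuda_cores, clock_mhz):
    """Peak FP32 TFLOPS for a given core count and clock (MHz)."""
    return cuda_cores * FLOPS_PER_FMA * clock_mhz * 1e6 / 1e12


if __name__ == "__main__":
    print(f"4,608 cores need {required_clock_mhz(16, 4608):.0f} MHz for 16 TFLOPS")
    # The two rumored configurations, for comparison with the 16 TFLOPS figure.
    for cores, mhz in [(5120, 1500), (3840, 2500)]:
        print(f"{cores} cores at {mhz} MHz: {peak_tflops(cores, mhz):.2f} TFLOPS")
```

Under the same assumption, the two rumored configurations come out at 15.36 and 19.2 TFLOPS respectively, so neither lands on Nvidia's 16 TFLOPS number.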
Along with discussing the Turing architecture, Nvidia also talked about upcoming Quadro RTX cards planned for launch in Q4 of this year. These are professional GPUs designed for workstations used in film and video content creation, CAD/CAM, and scientific workloads. The new naming scheme highlights the importance Nvidia is placing on the RT Cores, and it seems likely that the future GeForce cards will carry the same RTX branding. As discussed on Reddit last week, Nvidia filed for a number of new trademarks, including Quadro RTX and GeForce RTX. So those upcoming 1180 cards? They'll probably be GeForce RTX 1180 and so on. (Update: they'll likely be RTX 2080.)
Let's finish with some hard specs from the Quadro parts. Nvidia has details on three models: the Quadro RTX 8000, 6000, and 5000.
The above specs look impressive, and elsewhere it was noted that the new cards will be using 14GT/s GDDR6 memory. We don't know the memory bus width, but the 24/48GB cards almost certainly have a 384-bit bus, with the 16GB card using a 256-bit bus. That translates to 672GB/s of memory bandwidth for the high-end models, and 448GB/s for the lower-spec card. The Quadro cards also support 100GB/s NVLink connectors, which might make their way onto high-end GeForce cards for 2-way SLI.
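The bandwidth figures follow directly from bus width and data rate. Here's a short sketch of that arithmetic; keep in mind the 384-bit and 256-bit bus widths are my guesses rather than confirmed specs, and the helper name is just for illustration.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gtps):
    """Peak memory bandwidth in GB/s.

    Each transfer moves bus_width_bits / 8 bytes, and the memory performs
    data_rate_gtps billion transfers per second.
    """
    return bus_width_bits / 8 * data_rate_gtps


if __name__ == "__main__":
    GDDR6_RATE = 14  # GT/s, as quoted for the Quadro RTX cards
    # Bus widths below are assumptions, not confirmed specs.
    for label, bus in [("24/48GB cards", 384), ("16GB card", 256)]:
        gbs = memory_bandwidth_gbs(bus, GDDR6_RATE)
        print(f"{label}: {bus}-bit bus -> {gbs:.0f} GB/s")
```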
Historically, Nvidia has never created a GPU that was specific to Quadro cards. Instead, Quadro cards have used the same GPUs as GeForce cards, but with modified drivers tuned for professional workloads, along with other minor differences. I don't expect Turing to change this pattern, which means the above GPUs are almost certain to show up in GeForce cards in the near future. Nvidia might disable the Tensor Cores on GeForce (to make the Quadro cards more desirable), but the RT Cores will almost certainly stick around. The question is: what cards will get the various GPUs?
For the high-end cards, we could get a GeForce RTX 1180 (or 2080, or some other number) that otherwise looks a lot like the Quadro RTX 6000, maybe with half the VRAM and perhaps with a few SMs disabled. Another option is that the top model will be the new GeForce RTX 1180 Ti instead, with trimmed down versions for the 1180 and 1170. I think the latter is more likely, for several reasons.
There's a large gap in core counts between the Quadro RTX 5000 and 6000, which implies a different base GPU (e.g., GT100 and GT104). While the GeForce RTX 1160 could be a major upgrade over the current GTX 1060, that seems unlikely at best. More probable is that the Quadro RTX 5000 hardware will be similar to the GeForce RTX 1170, with a lower core count model for the GeForce RTX 1160.
The other reason I think the Quadro RTX 6000/8000 will match up with an 1180 Ti card (or something similar in the product stack priced at around $1,000-$1,500) is that the die size of the top Turing GPUs is really large. Nvidia lists the transistor count at 18.6 billion, with a die size of 754mm². That's one of the largest GPUs Nvidia has ever produced, so there's not really any room for a larger variant of Turing. The Volta GV100 for reference is 21 billion transistors and 815mm², while the Pascal GP102 inside the GTX 1080 Ti is 11.8 billion transistors and is a comparatively small 471mm².
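To put those die sizes in perspective, here's a quick sketch that turns the quoted transistor counts and areas into a rough density figure. The numbers are only the ones cited above, and the dictionary labels and function name are mine.

```python
# Transistor counts (billions) and die sizes (mm^2) as quoted in the text.
GPUS = {
    "Turing (top Quadro RTX die)": (18.6, 754),
    "Volta GV100": (21.0, 815),
    "Pascal GP102": (11.8, 471),
}


def density_mtr_per_mm2(transistors_billion, die_mm2):
    """Average density in millions of transistors per square millimeter."""
    return transistors_billion * 1000 / die_mm2


if __name__ == "__main__":
    for name, (transistors, area) in GPUS.items():
        density = density_mtr_per_mm2(transistors, area)
        print(f"{name}: {density:.1f}M transistors/mm^2")
```

All three work out to roughly 25 million transistors per mm², which suggests the big Turing die is large because it packs in far more transistors than GP102, not because it is less dense.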
There are other possibilities as well, but whatever the names and prices, we'll likely find out more next week during Nvidia's GeForce gaming celebration at Gamescom.