The Nvidia Ampere graphics card architecture has finally been unveiled and detailed. For years we had rumours, yet no real proof, that the next-gen GPU tech would bear the French physicist's surname, but with CEO Jen-Hsun Huang taking us through the key features of at least the server-based version, from his kitchen, we know it's real.
The sequel to the Volta and Turing GPUs should be the architecture to take real-time ray tracing further than we've yet seen and maybe even deliver it at 4K without turning your games into a slideshow. If all the rumours are true it could dominate the best graphics cards lists for the next 12 months at least. Mind you, if all the rumours are true then it's going to be a million times faster than Turing and cure death.
As expected, the delayed GTC keynote only showed us what the Nvidia Ampere GPUs will be doing in the next generation of the green team's high-performance and professional server graphics cards. GTC has historically been a show aimed squarely at the pros, with Nvidia either hosting standalone events, or piggybacking on gaming shows, for its GeForce line of consumer GPUs. So it wasn't a surprise that nothing GeForce-related was teased at the event itself.
During a pre-briefing, however, Jen-Hsun did explain that Ampere would definitely be the architecture to power both its server and gaming GPUs going forward "with a single platform that streamlines Nvidia's GPU lineup." He also added that "there's great overlap in the architecture, but not in configuration."
At a glance…
Nvidia Ampere release date
The first Ampere-powered devices have already gone out to researchers and cloud-service providers, but those are just pro/server parts and realistically the gaming cards won't see a launch until September at the earliest.
Nvidia Ampere specs
The GA100 GPU inside the first A100 cards is a bit of a monster. With 6,912 FP32 CUDA cores and 3,456 FP64 CUDA cores arrayed across 108 Streaming Multiprocessors (SMs), it packs in 54 billion transistors. The full GA100 chip has 8,192 FP32 CUDA cores and 128 SMs, with 3rd Gen Tensor Cores offering 4x the deep learning performance. It also uses TSMC's 7nm CoWoS 3D chip-stacking technology. It's a BIG chip.
Nvidia Ampere performance
The perf rumours have all surrounded the improved ray tracing performance, which is suggested to be four times that of Turing. The expectation being that Turing was a testing ground, a development kit, for ray traced graphics, and that Ampere will be able to offer the advanced visual features with little of the performance hit Turing suffers from. So far there's been nothing about the traditional rasterised performance of the Ampere architecture, however.
Nvidia Ampere price
The pricing of Nvidia's next-gen is going to be one of the most fascinating parts of the Ampere equation. With increased competition from AMD's RDNA cards and the potential Big Navi GPUs, there will be more pressure on the new GeForce cards to be priced aggressively. You can bag a DGX A100 for around $200K now though.
As expected, despite Nvidia CEO Jen-Hsun Huang beaming in from his kitchen on May 14, there was no announcement of any new GeForce gaming GPUs. The rescheduled GTC keynote introduced the new Ampere architecture, but only from the perspective of the pros.
With Ampere's GTC unveiling being all about the server side you'll have a lot of trouble getting anything out of Nvidia that you could build an enthusiast gaming PC around, even if you could afford the few hundred grand that the DGX A100 system costs.
But that has long been the way of things for Nvidia, with its server tech getting launched ahead of any new gaming GPUs. We're expecting, then, to see something in GeForce livery revealed some time in late summer or early fall, matching the original Turing announcement around Gamescom in 2018 and its September release.
The red team is also set to launch new gaming graphics cards towards the end of the year, with AMD RDNA 2 and the potential Big Navi GPUs due to land in a gaming PC near you before 2020 is done, so there is going to be some serious competition in graphics cards for a change. And you can bet that Nvidia is going to want to set its stall out early and not give the red team a chance to launch something that might topple the current might of the RTX 2080 Ti before Ampere cards come out.
Aside from the monstrous GA100 Ampere GPU detailed in the Ampere Whitepaper (pdf warning), we are firmly in rumour and speculation corner now, where nothing is true and everything is to be treated with season one Scully levels of scepticism. Every tech YouTuber worth their clickbait headlines and gurning thumbnail images is trying to uncover an engineer or leaker to be trusted, though Nvidia has historically been very good at keeping things under control and under wraps until the very last minute.
Maybe everyone's scared of Jen-Hsun, and that's not beyond the realms of possibility. "He knows where my kids go to school, man..."
That hasn't stopped the rumour mill from grinding away on what we can pretend is a complete Ampere GPU list. Sweet. The only one that's had anything close to an official announcement is the GA100 GPU.
There is some differing of opinion on what's going to happen to the actual makeup of the gaming-focused Ampere GPUs, with some claiming that it will be set up in an alternate configuration to Turing, with a whole load more RT cores, and others suggesting a similar layout but with more effective silicon inside it.
That top-end GPU, likely only ever to find a home in servers and HPC, houses 8,192 FP32 CUDA cores, 4,096 FP64 CUDA cores, 512 next-gen Tensor cores, across 128 SMs. It's all supported by high-bandwidth memory (HBM) arrayed across an aggregated 6,144-bit memory bus. Those are some hefty numbers... but aren't really going to mean much to us gaming folk.
What might are the ones attached to the GA102 GPU, the Ampere graphics card silicon that could potentially find its way into the Nvidia RTX 3080 Ti. If the green team does decide to carry on with another large numerical jump in nomenclature, anyway. I'm yet to be convinced, because from a purely marketing perspective the 30-series sounds rubbish, and a 3080 Ti or RTX 3090 simply doesn't float my metaphorical boat.
With a total of 84 streaming multiprocessors (SMs), those clusters of CUDA cores, the full GA102 GPU will offer a total of 5,376 of those little graphical execution units. Compared with the TU102, with 72 SMs and 4,608 CUDA cores, that represents a jump of just 17%. That doesn't actually sound like a lot considering what a performance leap this next generation of Nvidia GPUs is supposed to offer. Which means the underlying architecture has to offer more than just a 7nm die shrink of Turing.
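For anyone who wants to check the sums, here is a minimal sketch of that core-count comparison. It assumes 64 FP32 CUDA cores per SM, which is simply what the TU102's 72 SMs and 4,608 cores work out to; the function names are made up for illustration, and the 84 SM figure remains a rumour.

```python
# Assumption: 64 FP32 CUDA cores per SM, inferred from TU102 (4,608 cores / 72 SMs).
CORES_PER_SM = 64


def cuda_cores(sm_count, cores_per_sm=CORES_PER_SM):
    """Total CUDA cores for a GPU with the given number of SMs."""
    return sm_count * cores_per_sm


def percent_increase(old, new):
    """Relative increase from old to new, as a percentage."""
    return (new - old) / old * 100


tu102_cores = cuda_cores(72)   # Turing flagship
ga102_cores = cuda_cores(84)   # rumoured Ampere gaming flagship

print(f"TU102: {tu102_cores} cores, GA102: {ga102_cores} cores")
print(f"Increase: {percent_increase(tu102_cores, ga102_cores):.1f}%")
```

The result rounds to the 17% quoted above, so the rumoured core count on its own accounts for only a modest slice of any generational leap.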
And that's precisely what is being rumoured: more than just a die shrink, though I doubt the consumer versions will use the same TSMC 7nm CoWoS design as the GA100. There is meant to be more L2 cache inside the GPU and though there seems to be half the number of Tensor Cores—those AI-specific bits of silicon—they seem to perform far better than the previous generation, which should all help when it comes to ray tracing.
There is still more speculation about whose 7nm process Nvidia is going to be using, with both TSMC and Samsung's node being thrown into the mix. Jen-Hsun has confirmed that Samsung will be manufacturing a small number of its graphics chips, with TSMC still set to remain the manufacturer of the vast majority of Ampere silicon.
I'd suggest that maybe Samsung's EUV node would be used for the larger, though smaller volume, professional dies, with the high-volume gaming chips likely to filter out of TSMC's established fabrication facilities, following on from the stacked GA100 chip the Taiwanese company has created for Nvidia.
The latest rumours (via TechPowerUp) have pegged the current engineering sample of the GA102-based card to be operating above the 2,200MHz mark, which is mighty impressive. Even more impressive are the claims that the lower-tier GPUs could potentially run at around 2,500MHz.
Given the rumoured 5,376 core count, and the 2.2GHz speculation, that would suggest a raw GPU performance of over 23TFLOPS. Which is rather spectacular... and would be some 10TFLOPS above the RTX 2080 Ti. So spectacular in fact that it feels quite a stretch for such a potentially chunky GPU.
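That figure comes from the usual back-of-the-envelope formula for peak FP32 throughput: cores, times two operations per clock (one fused multiply-add), times clock speed. The sketch below is a rough illustration rather than a benchmark. The GA102 inputs are the rumoured numbers from the text, while the RTX 2080 Ti inputs (4,352 CUDA cores, 1,545MHz reference boost clock) are that card's published reference specs and are an assumption about which baseline the comparison uses.

```python
def fp32_tflops(cuda_cores, clock_mhz, ops_per_clock=2):
    """Theoretical peak FP32 throughput in TFLOPS.

    ops_per_clock=2 assumes one fused multiply-add per core per clock.
    """
    return cuda_cores * ops_per_clock * clock_mhz / 1_000_000


# Rumoured GA102 figures from the article.
ga102 = fp32_tflops(5376, 2200)

# Assumption: RTX 2080 Ti reference specs (4,352 cores, 1,545MHz boost).
rtx_2080_ti = fp32_tflops(4352, 1545)

print(f"GA102 (rumoured): {ga102:.1f} TFLOPS")
print(f"RTX 2080 Ti:      {rtx_2080_ti:.1f} TFLOPS")
print(f"Difference:       {ga102 - rtx_2080_ti:.1f} TFLOPS")
```

Peak TFLOPS says nothing about how well a game actually keeps those cores fed, so treat it as a ceiling and not a frame rate prediction.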
On the memory side the gaming cards are expected to again house GDDR6 memory, but this time at even higher clock speeds. There have been some suggestions that the top Ampere graphics card is running with 18Gbps memory for a total of 864GB/s of memory bandwidth.
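Memory bandwidth is just the per-pin data rate multiplied by the bus width, divided by eight to turn bits into bytes. The rumour doesn't state a bus width, so the sketch below works backwards from the two numbers it does give; the resulting 384-bit bus is an inference, not a leaked spec, and the function names are made up for illustration.

```python
def bandwidth_gb_per_s(data_rate_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s from per-pin data rate and bus width."""
    return data_rate_gbps * bus_width_bits / 8


def implied_bus_width_bits(bandwidth_gb_s, data_rate_gbps):
    """Bus width (in bits) needed to hit a given bandwidth at a given data rate."""
    return bandwidth_gb_s * 8 / data_rate_gbps


# Rumoured figures from the article: 18Gbps GDDR6 and 864GB/s total.
bus_width = implied_bus_width_bits(864, 18)
print(f"Implied bus width: {bus_width:.0f}-bit")
print(f"Check: {bandwidth_gb_per_s(18, bus_width):.0f} GB/s")
```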
At the base level the current rumoured expectation is that Ampere will offer around a 10% instructions per clock (IPC) increase over Turing. Were it not for the fact that the die shrink will allow for a hike in GPU clock speeds, that might not have heralded much of a gaming performance boost for the new graphics architecture.
Boosting up to 2,500MHz, however, will really highlight that potential IPC enhancement and any architectural improvements around that would only help. One recent rumour has suggested that we'll see at least a 40% increase in traditional rasterised performance from the Ampere GPUs, and a 4x improvement on the ray tracing side.
That's likely where a lot of the noise will be made around Ampere, with the suggestion being that Ampere will come close to culling the performance hit that enabling ray tracing weighed Turing down with. If Ampere really can offer ray tracing processing that's four times faster than Turing, with its new Ampere RT cores, then real-time ray tracing no longer becomes an expensive luxury, but a genuine weapon for Nvidia's new graphics cards.
The expectation, in ray traced games at least, is that the new Ampere GPUs will make the out-going Turing generation look positively geriatric by comparison. Which isn't going to be welcome news to anyone who spent $1,200 on an RTX 2080 Ti hoping for a little future-proofing from their sizeable silicon investment.
This is going to be one of the most interesting parts of the whole Nvidia Ampere release: how much the green team thinks it can charge for the new GeForce GPUs.
Turing's range of graphics cards was priced almost with impunity as there was no significant competition from rival AMD. This time around, however, it's going to be different. The lower, mid-tier cards from Nvidia have found an AMD Navi-based equivalent, with similar performance often at a more tempting price.
The promise of having AMD RDNA 2 cards, and the Big Navi GPU, set to find a home in high-end gaming PCs before the end of the year means that the red team should have cards capable of competing with the best that Nvidia can currently offer. At least that's been the suggestion anyway.
Whether AMD will release cards able to take on the existing GPU king, the RTX 2080 Ti, is one thing, but whether that competitive performance will carry on to the Ampere generation will likely dictate how much Nvidia feels it can charge for the new gaming cards. If Nvidia's next-gen GPUs are going to have to fight it out on a more level playing field then chances are it's going to have to be more aggressive, because we know from past experience that AMD will be.
You only have to look at the 'jebaiting' that went on with the RX 5000-series cards last year to see what shenanigans the Radeon red team will happily engage in to gain an advantage over the GeForce green team.
But if the top-end of the Ampere is able to make the RTX 2080 Ti look like a mid-range card of yesteryear when it comes to gaming then we could see the continued high pricing of graphics cards.
The third option is that, even if there is more competition, prices remain sky-high. If AMD decides to match Nvidia tier-for-tier in terms of price then we're going to have to choose whether we want to spend $1,200 for an RTX 3080 Ti or $1,200 for an RX 6900 XTX.