Why Nvidia's GTX 970 slows down when using more than 3.5GB VRAM


Last week, commenters on Nvidia’s forums, Reddit, Guru3D and elsewhere started digging into what looked like a concerning problem: the GeForce GTX 970 only seems to use 3.5GB of its 4GB of VRAM. Few games can really use 4GB of VRAM, but some commenters noted a serious drop in performance or stuttering when pushing the GTX 970 over the 3.5GB threshold. The same problem did not appear to affect the GTX 980.

Nvidia responded on Friday with this statement (and chart):

“The GeForce GTX 970 is equipped with 4GB of dedicated graphics memory. However the 970 has a different configuration of SMs than the 980, and fewer crossbar resources to the memory system. To optimally manage memory traffic in this configuration, we segment graphics memory into a 3.5GB section and a 0.5GB section. The GPU has higher priority access to the 3.5GB section. When a game needs less than 3.5GB of video memory per draw command then it will only access the first partition, and 3rd party applications that measure memory usage will report 3.5GB of memory in use on GTX 970, but may report more for GTX 980 if there is more memory used by other commands. When a game requires more than 3.5GB of memory then we use both segments.

The best way to test that is to look at game performance. Compare a GTX 980 to a 970 on a game that uses less than 3.5GB. Then turn up the settings so the game needs more than 3.5GB and compare 980 and 970 performance again.

Here’s an example of some performance data:

Shadows of Mordor
<3.5GB setting: GTX 980 72fps, GTX 970 60fps
>3.5GB setting = 3456x1944: GTX 980 55fps (-24%), GTX 970 45fps (-25%)

Battlefield 4
<3.5GB setting: GTX 980 36fps, GTX 970 30fps
>3.5GB setting = 3840x2160 135% res: GTX 980 19fps (-47%), GTX 970 15fps (-50%)

Call of Duty: Advanced Warfare
<3.5GB setting: GTX 980 82fps, GTX 970 71fps
>3.5GB setting = 3840x2160 FSMAA T2x, Supersampling on: GTX 980 48fps (-41%), GTX 970 40fps (-44%)

On GTX 980, Shadows of Mordor drops about 24% on GTX 980 and 25% on GTX 970, a 1% difference. On Battlefield 4, the drop is 47% on GTX 980 and 50% on GTX 970, a 3% difference. On CoD: AW, the drop is 41% on GTX 980 and 44% on GTX 970, a 3% difference. As you can see, there is very little change in the performance of the GTX 970 relative to GTX 980 on these games when it is using the 0.5GB segment.”

The numbers above make it hard to isolate the effect of pushing either card past 3.5GB of VRAM, because running a game at a higher resolution or with heavier AA lowers the framerate regardless. Nvidia’s point is that the GTX 970 behaves much like the 980: its percentage drop is only about 1 to 3 points larger than the 980’s.
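To make that comparison concrete, here is a minimal Python sketch that recomputes the percentage drops from the framerates in Nvidia's statement. The names `RESULTS`, `percent_drop` and `drop_gap` are our own, chosen for illustration; only the fps figures come from Nvidia.

```python
# Average framerates (fps) quoted in Nvidia's statement, as
# (below the 3.5GB threshold, above the 3.5GB threshold).
RESULTS = {
    "Shadows of Mordor": {"GTX 980": (72, 55), "GTX 970": (60, 45)},
    "Battlefield 4": {"GTX 980": (36, 19), "GTX 970": (30, 15)},
    "Call of Duty: Advanced Warfare": {"GTX 980": (82, 48), "GTX 970": (71, 40)},
}


def percent_drop(before, after):
    """Return the framerate loss as a percentage of the starting framerate."""
    return 100.0 * (before - after) / before


def drop_gap(game):
    """Return how many percentage points more the GTX 970 loses than the GTX 980."""
    cards = RESULTS[game]
    return percent_drop(*cards["GTX 970"]) - percent_drop(*cards["GTX 980"])


for game, cards in RESULTS.items():
    drops = ", ".join(
        f"{card} -{percent_drop(*fps):.1f}%" for card, fps in cards.items()
    )
    print(f"{game}: {drops} (gap {drop_gap(game):.1f} points)")
```

The gap is measured in percentage points between two relative drops, so it says nothing about frame-to-frame stutter, only about averages.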

Those figures are also average framerates, which don’t address the problem some commenters have pointed out: dramatic stutter at the moment the GTX 970 starts using its final 500MB of VRAM. A user-created tool, Nai’s Benchmark, claims to show that the GTX 970’s memory bandwidth drops sharply when accessing that last 500MB, while the GTX 980 is unaffected. These numbers look bad, though we can’t vouch for the veracity of the benchmark’s data.

Image via LazyGamer

On Sunday, Nvidia Senior VP of GPU Engineering Jonah Alben spoke to PC Perspective about the issue, and we finally have clarification on where that discrepancy comes from. PCPer writes:

“The most important part here is the memory system... connected to the SMMs through a crossbar interface. That interface has 8 total ports to connect to collections of L2 cache and memory controllers, all of which are utilized in a GTX 980. With a GTX 970 though, only 7 of those ports are enabled, taking one of the combination L2 cache/ROP units along with it. However, the 32-bit memory controller segment remains.

“You should take two things away from that simple description. First, despite initial reviews and information from NVIDIA, the GTX 970 actually has fewer ROPs and less L2 cache than the GTX 980. NVIDIA says this was an error in the reviewer’s guide and a misunderstanding between the engineering team and the technical PR team on how the architecture itself functioned. That means the GTX 970 has 56 ROPs and 1792 KB of L2 cache compared to 64 ROPs and 2048 KB of L2 cache for the GTX 980. Before people complain about the ROP count difference as a performance bottleneck, keep in mind that the 13 SMMs in the GTX 970 can only output 52 pixels/clock and the seven segments of 8 ROPs each (56 total) can handle 56 pixels/clock. The SMMs are the bottleneck, not the ROPs.”
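The arithmetic in that passage can be checked with a short Python sketch. The per-partition and per-SMM values below are not stated directly by PCPer; we derive them from the quoted totals (2048 KB over 8 partitions, 52 pixels/clock over 13 SMMs), and every name is ours.

```python
# Figures quoted by PCPer.
ROPS_PER_PARTITION = 8            # "seven segments of 8 ROPs each"
GTX_970_PARTITIONS = 7            # enabled L2/ROP partitions on the GTX 970
GTX_980_PARTITIONS = 8            # all partitions enabled on the GTX 980
GTX_970_SMMS = 13

# Derived assumptions: totals from the quote divided by the unit counts.
L2_KB_PER_PARTITION = 2048 // GTX_980_PARTITIONS   # 256 KB per partition
PIXELS_PER_CLOCK_PER_SMM = 52 // GTX_970_SMMS      # 4 pixels/clock per SMM


def partition_totals(partitions):
    """Return (ROP count, L2 cache in KB) for a number of enabled partitions."""
    return partitions * ROPS_PER_PARTITION, partitions * L2_KB_PER_PARTITION


rops_970, l2_970 = partition_totals(GTX_970_PARTITIONS)
rops_980, l2_980 = partition_totals(GTX_980_PARTITIONS)
smm_output_970 = GTX_970_SMMS * PIXELS_PER_CLOCK_PER_SMM

# The slower of the two stages caps pixel throughput per clock.
bottleneck_970 = "SMMs" if smm_output_970 < rops_970 else "ROPs"

print(f"GTX 970: {rops_970} ROPs, {l2_970} KB L2, SMM output {smm_output_970} px/clock")
print(f"GTX 980: {rops_980} ROPs, {l2_980} KB L2")
print(f"GTX 970 bottleneck: {bottleneck_970}")
```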

If you don’t speak graphics card, PCPer helps break down the architecture of the GTX 970. The card still has eight memory controllers but only seven enabled ports linking them to the L2 cache, so one port has to serve two memory controllers. If all 4GB were treated as a single pool, that port would always be burdened with twice as many requests as the others.

PCPer explains: “if the 7th port is fully busy, and is getting twice as many requests as the other port, then the other six must be only half busy, to match with the 2:1 ratio. So the overall bandwidth would be roughly half of peak. This would cause dramatic underutilization and would prevent optimal performance and efficiency for the GPU.”
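PCPer's 2:1 reasoning can be expressed as a toy model. The Python sketch below is our own simplification, not Nvidia's or PCPer's method: it assumes every port has the same peak throughput and that traffic is spread in fixed proportions, so the most heavily loaded port saturates first and the rest idle in proportion. The function name `aggregate_utilization` is hypothetical.

```python
def aggregate_utilization(request_weights):
    """Return the fraction of total port capacity in use once the busiest port saturates.

    request_weights holds the relative share of requests each port receives.
    The busiest port runs at 100%; every other port is busy in proportion
    to its share relative to that port.
    """
    busiest = max(request_weights)
    busy_fractions = [weight / busiest for weight in request_weights]
    return sum(busy_fractions) / len(busy_fractions)


# Single 4GB pool on a GTX 970: six ports with normal load, one with double load.
single_pool = aggregate_utilization([1, 1, 1, 1, 1, 1, 2])

# Evenly loaded ports, as on a GTX 980 with all eight ports enabled.
even_load = aggregate_utilization([1] * 8)

print(f"Single-pool GTX 970 utilization: {single_pool:.0%}")
print(f"Evenly loaded utilization: {even_load:.0%}")
```

Under these assumptions the single-pool case works out to 4/7 of the seven ports' capacity, which is in line with PCPer's "roughly half of peak."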

Nvidia avoided that problem by dividing the memory into a 3.5GB pool and a 0.5GB pool. Few games (currently) require more than 3.5GB of VRAM, so the primary pool can be accessed at maximum bandwidth.

PCPer writes: “Let's be blunt here: access to the 0.5GB of memory, on its own and in a vacuum, would occur at 1/7th of the speed of the 3.5GB pool of memory. If you look at the Nai benchmarks (pictured above) floating around, this is what you are seeing.”
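The 1/7th figure follows from how many memory controllers back each pool. Here is a rough Python sketch of that split. One number is an assumption not taken from this article: the 224GB/s peak bandwidth is Nvidia's advertised figure for the card. We also assume capacity and bandwidth divide evenly across the eight 32-bit controllers; all names are ours.

```python
TOTAL_VRAM_GB = 4.0
TOTAL_CONTROLLERS = 8             # eight 32-bit memory controller segments
ASSUMED_PEAK_GBPS = 224.0         # assumption: Nvidia's advertised peak bandwidth

GB_PER_CONTROLLER = TOTAL_VRAM_GB / TOTAL_CONTROLLERS
GBPS_PER_CONTROLLER = ASSUMED_PEAK_GBPS / TOTAL_CONTROLLERS


def pool(controllers):
    """Return (capacity in GB, peak bandwidth in GB/s) for a pool of controllers."""
    return controllers * GB_PER_CONTROLLER, controllers * GBPS_PER_CONTROLLER


fast_size, fast_bandwidth = pool(7)   # the 3.5GB section
slow_size, slow_bandwidth = pool(1)   # the 0.5GB section

print(f"Fast pool: {fast_size}GB at up to {fast_bandwidth:.0f}GB/s")
print(f"Slow pool: {slow_size}GB at up to {slow_bandwidth:.0f}GB/s")
print(f"Slow/fast bandwidth ratio: 1/{fast_bandwidth / slow_bandwidth:.0f}")
```

These are theoretical peaks for each pool accessed on its own; they are not measurements and do not predict in-game behavior.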

Accessing that last 500MB of VRAM is absolutely slower than accessing the first 3.5GB. What we don’t know is exactly how much that matters for gaming. PCPer points out that the last chunk of VRAM is still four times faster than system RAM (your DDR3) accessed via PCIe. The GTX 970 does have 4GB of VRAM, and it can use all of it, but accessing that last 500MB will decrease performance.

We’re doing our own testing to see if we can determine how much impact using the last chunk of VRAM has on gaming. Having used the GTX 970 extensively, we can still say that it’s a fantastic card for the price and an overclocking beast. But that doesn't excuse Nvidia’s omission, intentional or accidental, as PCPer highlights: “at the very least, the company did not fully disclose the missing L2 and ROP partition on the GTX 970, even if it was due to miscommunication internally.”

We’ll have more on this issue as we continue testing. Thanks to PC Perspective for their excellent reporting.
