Nvidia allegedly greenlit the use of pirated books from illegal sources to train its AI models, according to an expanded class-action lawsuit
The lawsuit from 2024 is back, this time with bigger allegations.
The capabilities of AI models, such as GPT-5, Gemini, Claude, and Grok, lie in the size and scope of the dataset used to train them. This has also been the source of multiple lawsuits, claiming that the companies performing the training had no right to freely use the data. In an expanded class-action case against Nvidia, however, the accusation goes one step further, with claims that the GPU giant willingly used an illegal source of pirated books to train its models.
As reported by TorrentFreak, an amended complaint filed last week at the district court in Oakland, California, specifically claims that staff at Nvidia contacted a so-called 'shadow library' known as Anna's Archive, a repository of pirated books and other documents.
The plaintiffs cite internal Nvidia communications as evidence, with the filed document purporting to show someone from the data strategy team at Nvidia writing, "we are exploring including Anna's Archive in pre-training data for our LLMs."
It continues with "We are figuring out internally whether we are willing to accept the risk of using this data, but would like to speak with your team to get a better understanding of LLM-related work you have done."
While Anna's Archive appears not to host any content directly itself, it does act as a 'search engine' for alleged pirate libraries. These third-party hosts aren't exclusively providing access to copyrighted materials, but that content is what they are most infamous for.
The original complaint against Nvidia was filed back in 2024, and as TorrentFreak reported at the time, Nvidia's response was essentially to claim that AI training on such material is not the same as owning an illegally obtained book, or even using it as a human does. "Training measures statistical correlations in the aggregate, across a vast body of data, and encodes them into the parameters of a model," it wrote.
In essence, Nvidia is saying that the use of such datasets falls under fair use. Given that the original complaint involved data garnered from another pirated source (Books3), Nvidia may well choose to use the same counterargument it made in 2024.
Similar claims have been filed against Anthropic and Meta in the past, and in the case of the former, the judge ruled that while training on the data did fall under fair use, "Anthropic had no entitlement to use pirated copies for its central library." How the case against Nvidia will fare, well, we'll just have to wait and see.


Nick, gaming, and computers all first met in the early 1980s. After leaving university, he became a physics and IT teacher and started writing about tech in the late 1990s. That resulted in him working with MadOnion to write the help files for 3DMark and PCMark. After a short stint working at Beyond3D.com, Nick joined Futuremark (MadOnion rebranded) full-time, as editor-in-chief for its PC gaming section, YouGamers. After the site shut down, he became an engineering and computing lecturer for many years, but missed writing. Cue four years at TechSpot.com covering everything and anything to do with tech and PCs. He freely admits to being far too obsessed with GPUs and open-world grindy RPGs, but who isn't these days?

