Stand-up comedian and actor Sarah Silverman joins trio suing OpenAI and Meta over claims their AI models 'ingested and used' copyrighted work without permission

OpenAI logo on some cash.
(Image credit: Phil Ashley)

On July 7, stand-up comedian Sarah Silverman, also known for voicing Vanellope in the Wreck-It Ralph movies, joined authors Christopher Golden and Richard Kadrey in twin lawsuits against OpenAI and Meta.

As reported by The Verge earlier this week, the suits concern the authors' written work, with all three claiming that both ChatGPT and LLaMA (Meta's own large language model) had been trained on data harvested from "shadow library" sites such as "Bibliotik, Library Genesis, Z-Library, and others."

The OpenAI suit offers a trio of exhibits, which demonstrate the model's ability to summarise copyrighted books with very few mistakes. These include The Bedwetter, a memoir by Silverman, Ararat, a horror-thriller by Christopher Golden, and Sandman Slim, a supernatural fantasy noir thriller by Richard Kadrey. 

In short, the books had been caught in the program's net at some point, which the suit claims is an infringement of copyright: "Defendants, by and through the use of ChatGPT, benefit commercially and profit richly from the use of Plaintiffs’ and Class members’ copyrighted materials."

Meanwhile, the suit against Meta alleges that those same books, as well as several others, were found in the datasets used to train LLaMA. The complaint mentions The Pile in particular, a dataset assembled by a research organisation named EleutherAI.

The suit quotes EleutherAI's own description of its dataset as using Bibliotik, one of several "shadow libraries" the suit condemns: "Bibliotik consists of a mix of fiction and nonfiction books [...] We included Bibliotik because books are invaluable for long-range context modelling research and coherent storytelling."

The suit then explains: "These shadow libraries have long been of interest to the AI-training community because of the large quantity of copyrighted material they host. For that reason, these shadow libraries are also flagrantly illegal."

The authors' representatives, lawyers Matthew Butterick and Joseph Saveri, write on their litigation website: "Much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works—including books written by Plaintiffs—that were copied by OpenAI and Meta without consent, without credit, and without compensation."

These three authors join a growing furore around the use of AI. Earlier this year, a class-action lawsuit was filed against Stability AI, Midjourney, and DeviantArt. Just this month, I've reported on the growing concerns in the voice acting and modding communities about the use of AI in pornographic voice mods, as well as Unity's unpopular new AI tools.

While this tech might have a use in game development, it's clear that the law's scrambling to catch up. It'll be interesting to see the result of this suit—as well as the others that are sure to follow—as AI becomes more and more of a large language elephant in the room across multiple industries.

Harvey Randall
Staff Writer