On July 7 Sarah Silverman, a stand-up comedian—also known for her acting work as the voice of Vanellope in the Wreck-It Ralph movies—joined authors Christopher Golden and Richard Kadrey in twin lawsuits against OpenAI and Meta.
As reported by The Verge earlier this week, the suits concern Silverman's written work, with all three claiming that both ChatGPT and LLaMA (Meta's own large language model) had been trained on data harvested from "shadow library" sites such as "Bibliotik, Library Genesis, Z-Library, and others."
The OpenAI suit offers a trio of exhibits, which demonstrate the model's ability to summarise copyrighted books with very few mistakes. These include The Bedwetter, a memoir by Silverman; Ararat, a horror-thriller by Christopher Golden; and Sandman Slim, a supernatural fantasy noir thriller by Richard Kadrey.
In short, the books had been caught in the program's net at some point, which the suit claims is an infringement of copyright: "Defendants, by and through the use of ChatGPT, benefit commercially and profit richly from the use of Plaintiffs’ and Class members’ copyrighted materials."
Meanwhile, the suit against Meta alleges that those same books, as well as several others, were found in the datasets used to train LLaMA. The complaint mentions The Pile in particular, a dataset created by a research organisation named EleutherAI.
The suit quotes EleutherAI's own description of its dataset as using Bibliotik, one of several "shadow libraries" the suit condemns: "Bibliotik consists of a mix of fiction and nonfiction books [...] We included Bibliotik because books are invaluable for long-range context modelling research and coherent storytelling."
The suit then explains: "These shadow libraries have long been of interest to the AI-training community because of the large quantity of copyrighted material they host. For that reason, these shadow libraries are also flagrantly illegal."
The authors' representatives, lawyers Matthew Butterick and Joseph Saveri, write on their litigation website: "Much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works—including books written by Plaintiffs—that were copied by OpenAI and Meta without consent, without credit, and without compensation."
These three authors join a growing furore around the use of AI. Earlier this year, a class-action lawsuit was filed against Stability AI, Midjourney, and DeviantArt. Just this month, I've reported on the growing concerns in the voice acting and modding communities about the use of AI in pornographic voice mods, as well as Unity's unpopular new AI tools.
While this tech might have a use in game development, it's clear that the law's scrambling to catch up. It'll be interesting to see the result of this suit—as well as the others that are sure to follow—as AI becomes more and more of a large language elephant in the room across multiple industries.