NY Times lawsuit holds OpenAI and Microsoft 'responsible for the billions of dollars they owe for the unlawful copying and use of The Times's uniquely valuable works'

A lawsuit filed in Manhattan federal court last week by the New York Times claims that the defendants, Microsoft and OpenAI, have used millions of its articles to train and create their large language models (LLMs) and other products. The Times is seeking damages in the realm of billions of dollars, though it doesn't give a specific number.

But yeah, it's going to be looking for a pretty large payout if it does win.

"The law does not permit the kind of systematic and competitive infringement that Defendants have committed," reads the official complaint (pdf warning). "This action seeks to hold them responsible for the billions of dollars in statutory and actual damages that they owe for the unlawful copying and use of The Times's uniquely valuable works."

The lawsuit states that the New York Times had been in negotiations with the defendants "for months" and that it was looking to reach an agreement "in accordance with its history of working productively with large technology platforms to permit the use of its content in new digital products." The idea put forward in the court document is that its goal was both to get fair value out of its contribution to the training, because of the weighting The Times' content was given during training, and to "facilitate the continuation of a healthy news ecosystem, and help develop GenAI technology in a responsible way that benefits society and supports a well-informed public."

For its part, OpenAI spokesperson Lindsey Held is quoted in The New York Times' own article as saying the company thought that negotiations had been constructive and was "surprised and disappointed" by the lawsuit.

"We're hopeful that we will find a mutually beneficial way to work together," they are quoted as saying, "as we are doing with many other publishers."

One of the most intriguing parts of the lawsuit, and arguably the part that has got The Times' hackles up, is that it seems like OpenAI has given particular weight to the publisher's content during the training of its LLMs.

The lawsuit states that, during the training of GPT-3 specifically, one of the key datasets (one weighted as a high-quality set) used nearly 210k unique New York Times URLs, which amounted to 1.23% of all the sources in the dataset.

The largest and most heavily weighted dataset used to train GPT-3, however, includes "at least 16 million unique records of content from The Times across News, Cooking, Wirecutter, and The Athletic."

It then goes on to state that OpenAI itself has said the datasets it sees as the highest quality are sampled more frequently during the training of a model. "By OpenAI’s own admission," reads the court document, "high-quality content, including content from The Times, was more important and valuable for training the GPT models as compared to content taken from other, lower-quality sources."

This isn't the first lawsuit against OpenAI for copyright infringement in the training of its LLMs. As The Times notes, there has also been a lawsuit brought by 17 authors, including George RR Martin and John Grisham, against the company for "systematic theft on a mass scale," and one from Getty against Stability AI, the creator of the generative AI image maker Stable Diffusion, over the use of its images in the training of its model.

And it's unlikely to be the last lawsuit against AI makers, either. But given the seeming reluctance of AI companies to tackle the issues of copyright infringement and fair compensation for the training of their multi-billion dollar products themselves, it's looking like legal proceedings might be one of the few ways to keep them in check.

Dave James
Editor-in-Chief, Hardware

