Several lawsuits have been filed against OpenAI — the developer behind ChatGPT — and Microsoft for their allegedly unlawful use of copyrighted work to train generative artificial intelligence (AI).
Generative AI is a technology that is able to produce content such as prose, images, and audio files. To do this, generative AI programs rely upon large-language models (LLMs) that are trained using vast amounts of existing data until it is able to correctly predict what comes next in a given piece of source material with a high rate of accuracy.
Whether or not the use of copyrighted materials in this process constitutes a violation of the law has been the subject of much debate in recent years as this technology has rapidly advanced and is now at the center of several high-profile lawsuits that have been filed against OpenAI — the developer of generative AI giant ChatGPT — and others who have created similar programs.
Mid-September, a group of bestselling fiction authors — including names such as David Baldacci, John Grisham, George R.R. Martin, and Jodi Picoult — sued OpenAI for alleged “flagrant and harmful infringements” of the plaintiffs’ copyrighted works by using them to train LLMs.
“These algorithms are at the heart of [OpenAI]’s massive commercial enterprise,” the complaint says. “And at the heart of these algorithms is systematic theft on a mass scale.”
The lawsuit argues that these programs “endanger fiction writers’ ability to make a living, in that the LLMs allow anyone to generate — automatically and freely (or very
cheaply) — texts that they would otherwise pay writers to create.”
“Moreover, [OpenAI]’s LLMs can spit out derivative works: material that is based on, mimics, summarizes, or paraphrases Plaintiffs’ works, and harms the market for them,” the complaint states.
“Defendants could have ‘trained’ their LLMs on works in the public domain,” the lawsuit argues. “They could have paid a reasonable licensing fee to use copyrighted works.”
“What Defendants could not do was evade the Copyright Act altogether to power their lucrative commercial endeavor, taking whatever datasets of relatively recent books they could get their hands on without authorization,” the complaint contends. “There is nothing fair about this.”
“Defendants’ unauthorized use of Plaintiffs’ copyrighted works thus presents a straightforward infringement case applying well-established law to well-recognized copyright harms,” the filing argues.
As a result of their lawsuit, these authors hope to obtain an injunction prohibiting OpenAI from continuing to use their copyrighted works to train its LLMs without “express authorization.”
Mid-December, this lawsuit was followed up with a similar filing by a group of non-fiction authors accusing OpenAI, as well as Microsoft, of building “a business valued into the tens of billions of dollars by taking the combined works of humanity without permission.”
Collectively, the plaintiffs in this case have won three Pulitzer Prizes and written more than thirty New York Times bestsellers.
“Rather than pay for intellectual property,” the lawsuit argues, “they pretend as if the laws protecting copyright do not exist.”
“Nonfiction authors often spend years conceiving, researching, and writing their creations,” the filing states. “While OpenAI and Microsoft refuse to pay nonfiction authors, their AI platform is worth a fortune.”
“The basis of the OpenAI platform is nothing less than the rampant theft of copyrighted works,” the complaint suggests.
“The only way that Defendants’ models could be trained to generate text output that resembles human expression is to copy and analyze a large, diverse corpus of text written by humans,” the lawsuit states.
“In training their models, [OpenAI and Microsoft] reproduced copyrighted texts to exploit precisely what the Copyright Act was designed to protect: the elements of protectable expression within them, like the choice and order of words and sentences, syntax, flow, themes, and paragraph and story structure, as well as the arrangement and organization of facts,” the authors allege.
Similar to the fiction authors who filed the earlier lawsuit, these non-fiction writers sought to have the court permanently enjoin Microsoft and OpenAI from continuing to engage in the behavior of which they are accused in the complaint.
At the end of December, The New York Times (NYT) alleged that OpenAI and Microsoft illegally used the outlet’s work “to create artificial intelligence products that compete” with the paper and “seek[ing] to free-ride on The [NYT]’s massive investment in its journalism by using it to build substitutive products without permission or payment.”
The NYT has taken issue with the use of its content by Microsoft and OpenAI in its LLM training programs, alleging that “while [they] engaged in widescale copying from many sources, they gave [NYT] content particular emphasis when building their LLMs — revealing a preference that recognizes the value of those works.”
“The Constitution and the Copyright Act recognize the critical importance of giving creators exclusive rights over their works,” the NYT’s complaint reads. “Since our nation’s founding, strong copyright protection has empowered those who gather and report news to secure the fruits of their labor and investment.”
“Copyright law protects the [NYT]’s expressive, original journalism, including, but not limited to, its millions of articles that have registered copyrights,” the filing states.
According to the NYT, the generative AI tools developed by Microsoft and OpenAI can allegedly “generate output that recites [NYT] content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples.”
“By providing [NYT] content without the [NYT]’s permission or authorization, [these] tools undermine and damage the [NYT]’s relationship with its readers and deprive the [NYT] of subscription, licensing, advertising, and affiliate revenue,” the complaint claims.
The filing then goes into great detail providing examples of the generative AI tools in question allegedly providing word-for-word reproductions of content from the NYT resulting in the dissemination of content that is otherwise behind a paywall and driving web traffic away from the NYT’s website.
As a result of their lawsuit, the NYT hopes to see Microsoft and OpenAI be permanently enjoined from continuing the alleged practices for training their generative AI tools and ordered to destroy all “LLM models and training sets” using data from the NYT.
In addition to seeking injunctions against Microsoft and OpenAI, those who have brought these complaints are also asking the court to provide various forms of monetary compensation for the damages that have allegedly been done as a result of the defendants’ conduct.
A jury trial has been requested by the plaintiffs in each of these lawsuits.