Judge Rules AI Firms Can Use Certain Copyrighted Materials as Training Data

This week, a federal judge handed AI companies a major win, potentially setting a legal precedent for the industry to plunder copyrighted materials to train their large language models.

Anthropic, the large AI company backed by Amazon, has been in a pitched legal battle with a group of writers and journalists who sued the company last summer and accused it of illegally using their works to train the company’s flagship chatbot, Claude. The legality of the AI industry’s entire business model has long depended on the question of whether it is kosher to hoover up large amounts of copyrighted data from all over the web and then feed it into an algorithm to produce “original” text. Anthropic has maintained that its use of the writers’ work falls under fair use and is therefore legal. This week, the federal judge presiding over the case, William Alsup, partially agreed.

In his ruling, Alsup held that, by training its LLM without the authors’ permission, Anthropic did not infringe their copyrights because the work the model produced was, in his eyes, original. He wrote that the company’s algorithms have…

“…not reproduced to the public a given work’s creative elements, nor even one author’s identifiable expressive style…Yes, Claude has outputted grammar, composition, and style that the underlying LLM distilled from thousands of works. But if someone were to read all the modern-day classics because of their exceptional expression, memorize them, and then emulate a blend of their best writing, would that violate the Copyright Act? Of course not.”

Alsup’s ruling departs sharply from the framing of the writers’ lawsuit, which accused Anthropic of “strip-mining” human expression and ingenuity for the sake of corporate profits. This ruling is just one judge’s opinion, but critics fear it could easily set a precedent for other legal decisions across the country. AI companies have been sued dozens of times by creatives on similar grounds.

While Alsup’s decision may signal broader victories for the AI industry, it isn’t exactly what you would call a win for Anthropic. That’s because Alsup also ruled that the specific way in which Anthropic nabbed some of the copyrighted materials for its LLM—by downloading over 7 million pirated books—could be illegal, and would require a separate trial. “We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages,” Alsup wrote. “That Anthropic later bought a copy of a book [that] it earlier stole off the internet will not absolve it of liability for theft, but it may affect the extent of statutory damages.”

When reached for comment by Gizmodo, Anthropic provided the following statement: “We are pleased that the Court recognized that using ‘works to train LLMs was transformative — spectacularly so.’ Consistent with copyright’s purpose in enabling creativity and fostering scientific progress, ‘Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different.’”

Alsup has presided over several prominent cases involving large tech companies, including Uber, DoorDash, and Waymo. More recently, Alsup ordered the Trump administration to reinstate thousands of fired probationary workers who were pushed out by Elon Musk’s DOGE initiative.
