US judge allows use of ‘pirated books’ to train AI

By Hash learning
July 22, 2025

In a decision that could reshape artificial intelligence and copyright law, a U.S. federal judge has ruled that AI models can legally be trained on copyrighted books, including pirated copies. The ruling has drawn sharp reactions from the creative and tech industries alike: authors call it a betrayal of intellectual property rights, while AI developers hail it as a win for innovation.

The debate centers on whether machine learning models trained on massive datasets, including thousands of books scraped from the internet, violate copyright law or instead engage in what the court has called “transformative use.”

BACKGROUND: THE BATTLE BETWEEN AI AND AUTHORS

As large language models (LLMs) like OpenAI’s ChatGPT, Meta’s LLaMA, and Google’s Gemini have become more powerful, they have relied on vast amounts of text for training. These datasets include everything from web pages and academic papers to, more controversially, books, many of which were never licensed for such use.

A group of high-profile authors, including Sarah Silverman, Paul Tremblay, and Mona Awad, sued AI companies including OpenAI and Meta, claiming that their copyrighted works were used without permission or compensation. Central to these lawsuits was a dataset called Books3, a massive, unauthorized collection of over 190,000 books used to train several AI models.

The question before the court: does using these copyrighted works as training data violate the law?

THE COURT’S VERDICT: FAIR USE PREVAILS

In June 2025, the court ruled that the use of copyrighted (and even pirated) books to train AI systems constitutes “fair use” under U.S. copyright law. The judge emphasized that the books were not being reproduced, distributed, or sold in their original form. Instead, they were being processed and transformed into statistical language patterns to help AI understand and generate text.

“The AI does not retain or reproduce the copyrighted material in a recognizable form,” the judge stated. “The usage is transformative and serves a different purpose than the original work.”

This argument, that training an AI model is a transformative use, was pivotal to the finding that no copyright infringement had occurred.

AUTHORS PUSH BACK: A BLOW TO CREATIVE OWNERSHIP

Authors and advocacy groups like the Authors Guild have condemned the ruling, calling it a dangerous precedent that undermines the creative industry.

“If a company can absorb my book into its system, train its AI to mimic my voice, and then generate content that competes with me—without asking or paying—that’s not innovation, it’s theft,” said author Douglas Preston.

Writers argue that this ruling allows tech giants to benefit from years of creative labor without offering any form of recognition or royalties to the original creators.

WHAT THIS MEANS FOR AI DEVELOPERS

For AI companies and developers, the decision offers a significant legal green light. Training LLMs is a resource-intensive process that requires massive datasets. If all training data had to be licensed, development could become prohibitively expensive, especially for startups and academic institutions.

By treating this kind of data use as fair and transformative, the ruling could accelerate AI development in the U.S. and lower legal risks for tech firms.

“This ruling validates a practical approach to building intelligent systems,” said AI researcher Jennifer Stiles. “It’s not about copying books—it’s about learning language.”

UNDERSTANDING THE LEGAL BASIS: FAIR USE DOCTRINE

The court’s decision hinged on the four statutory factors of fair use:

1. Purpose and Character of Use

The AI companies used the material for research and model training, not to reproduce or republish it.

2. Nature of the Work

Books are creative and original, but their use for model training was transformative.

3. Amount and Substantiality

Entire books were processed, but their text was not quoted or shared directly; only statistical patterns were used.

4. Effect on the Market

The court found no concrete evidence that AI model training harmed book sales or market performance.

GLOBAL IMPLICATIONS: A LEGAL DIVIDE

The U.S. ruling is significant, but not universal. Other countries are taking different approaches:

European Union: Under the EU AI Act, providers of general-purpose AI models must adopt a policy to comply with EU copyright law, including honoring rights holders’ opt-outs from text and data mining.
Canada, Australia, India: Legal frameworks are still evolving, but they tend to lean more toward protecting copyright holders.

AI companies looking to expand internationally may still face serious legal hurdles.

WHAT HAPPENS NEXT?

The battle isn’t over. The authors involved are expected to appeal, and the case could eventually reach the U.S. Supreme Court.

In the meantime, we may see:

More legal action from authors and publishers.
Tech companies offering licensing deals to avoid future conflict.
Congressional action to update outdated copyright laws for the AI age.

CONCLUSION: A DEFINING MOMENT FOR AI AND CREATIVITY

This ruling marks a pivotal moment in the ongoing clash between AI advancement and intellectual property rights.

While developers see it as a win for innovation, authors fear it paves the way for the unchecked exploitation of creative content. Whether this decision holds up in higher courts—or inspires global policy shifts—remains to be seen.

What’s certain is that the rules of creativity, ownership, and AI are being rewritten in real time.
