In a groundbreaking decision that could reshape the future of artificial intelligence and copyright law, a U.S. federal judge has ruled that AI models can legally be trained using copyrighted or even pirated books. This ruling has sent shockwaves through the creative and tech industries alike, with authors calling it a betrayal of intellectual rights and AI developers hailing it as a win for innovation.
The debate centers on whether training machine learning models on massive datasets, including thousands of books scraped from the internet, violates copyright law or amounts to what the court has called "transformative use."
As large language models (LLMs) like OpenAI's ChatGPT, Meta's LLaMA, and Google's Gemini have become more powerful, they have been trained on ever larger volumes of text. These datasets include everything from web pages and academic papers to, more controversially, books, many of which were never licensed for such use.
A group of high-profile authors, including Sarah Silverman, Paul Tremblay, and Mona Awad, sued AI companies like OpenAI and Meta, claiming that their copyrighted works were being used without permission or compensation. Central to these lawsuits was a dataset called Books3, a massive, unauthorized collection of over 190,000 books used to train several AI models.
The question before the court: does using these copyrighted works as training data violate the law?
In June 2025, the court ruled that the use of copyrighted (and even pirated) books to train AI systems constitutes “fair use” under U.S. copyright law. The judge emphasized that the books were not being reproduced, distributed, or sold in their original form. Instead, they were being processed and transformed into statistical language patterns to help AI understand and generate text.
“The AI does not retain or reproduce the copyrighted material in a recognizable form,” the judge stated. “The usage is transformative and serves a different purpose than the original work.”
This key argument—that AI usage is transformative—was pivotal in determining that no copyright infringement had occurred.
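To make the idea of "statistical language patterns" concrete, here is a deliberately simplified sketch in Python. It builds a toy bigram model that stores only how often one word follows another, then samples new text from those counts. Real LLMs learn far richer patterns with neural networks, and the tiny corpus string below is purely a hypothetical stand-in, but the principle the court pointed to is the same: what the trained system retains is frequency information about language, not a readable copy of any book.

```python
from collections import defaultdict, Counter
import random

def train_bigram_model(text: str) -> dict[str, Counter]:
    """Count how often each word is followed by each other word."""
    words = text.split()
    model = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        model[current_word][next_word] += 1
    return model

def generate(model: dict[str, Counter], start: str, length: int = 20) -> str:
    """Sample new text from the learned follow-word frequencies."""
    word, output = start, [start]
    for _ in range(length):
        if word not in model:
            break
        candidates = list(model[word].keys())
        weights = list(model[word].values())
        word = random.choices(candidates, weights=weights)[0]
        output.append(word)
    return " ".join(output)

# Hypothetical corpus stand-in; any plain-text source would work here.
corpus = "the court ruled that the use of the books was transformative"
model = train_bigram_model(corpus)
print(generate(model, start="the"))
```

In this toy version, the stored model holds nothing but word-to-word counts, so the original text generally cannot be reconstructed from it. That is the distinction between learning patterns and copying that the ruling leans on.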
Authors and advocacy groups like the Authors Guild have condemned the ruling, calling it a dangerous precedent that undermines the creative industry.
“If a company can absorb my book into its system, train its AI to mimic my voice, and then generate content that competes with me—without asking or paying—that’s not innovation, it’s theft,” said author Douglas Preston.
Writers argue that this ruling allows tech giants to benefit from years of creative labor without offering any form of recognition or royalties to the original creators.
For AI companies and developers, the decision offers a significant legal green light. Training LLMs is a resource-intensive process requiring massive datasets. If all training data had to be licensed, it could make development prohibitively expensive—especially for startups and academic institutions.
By defining this kind of data use as fair and transformative, the ruling potentially accelerates AI development in the U.S., while lowering legal risks for tech firms.
“This ruling validates a practical approach to building intelligent systems,” said AI researcher Jennifer Stiles. “It’s not about copying books—it’s about learning language.”
The court’s decision hinged on the four key factors of fair use, which are:
Purpose and character of the use: The AI companies used the material for research and model training, not to reproduce or republish it.
Nature of the copyrighted work: Books are creative and original, but their use for model training was transformative.
Amount and substantiality of the portion used: Entire books were processed, but nothing was quoted or shared directly; only statistical patterns were extracted.
Effect on the potential market: The court found no concrete evidence that AI model training harmed book sales or market performance.
The U.S. ruling is significant, but not universal. Other countries are taking different approaches:
European Union: Under the EU AI Act, AI models must respect copyright law and allow authors to opt out of training.
Canada, Australia, India: Legal frameworks are still evolving, but they tend to lean more toward protecting copyright holders.
AI companies looking to expand internationally may still face serious legal hurdles.
The battle isn’t over. The authors involved are expected to appeal, and the case could eventually reach the U.S. Supreme Court.
In the meantime, we may see:
More legal action from authors and publishers.
Tech companies offering licensing deals to avoid future conflict.
Congressional action to update outdated copyright laws for the AI age.
This ruling marks a pivotal moment in the ongoing clash between AI advancement and intellectual property rights.
While developers see it as a win for innovation, authors fear it paves the way for the unchecked exploitation of creative content. Whether this decision holds up in higher courts—or inspires global policy shifts—remains to be seen.
What’s certain is that the rules of creativity, ownership, and AI are being rewritten in real time.