AI-book piracy drama

A social thread from @cb_doge accuses Anthropic of training Claude on millions of pirated books from LibGen and cites a record $1.5 billion copyright settlement. The thread has ignited heated online debate about AI training datasets and publishing rights. That conversation matters because it affects how publishers, authors, and AI companies negotiate training data and compensation going forward. (x.com)

# AI-book piracy drama

A viral social post turned a dense copyright case into a simple claim: Anthropic trained Claude on millions of pirated books from Library Genesis, known as LibGen, and then paid a record $1.5 billion to settle with authors. That basic outline is close to the public record, but the details are more important than the post makes clear. The case did not end with a court ruling that all artificial intelligence training on books is illegal. It split the issue in two: training on lawfully obtained books, and copying books from pirate libraries. (courtlistener.com, Ars Technica PDF via court order, cnbc.com)

The lawsuit was filed on August 19, 2024, in federal court in Northern California by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson. They accused Anthropic, the company behind Claude, of building its book corpus by downloading works from pirate sources including LibGen and Pirate Library Mirror, often shortened to PiLiMi. (courtlistener.com, Authors Guild, Publishers Weekly)

LibGen is not a normal bookstore or library. It is a shadow library, meaning a giant unauthorized archive of digital books and papers that users can download for free even when the works are still under copyright. That matters because a company can make one legal argument about analyzing books it bought, and a very different argument about mass-downloading books from a pirate source. (The Atlantic, Authors Alliance, Authors Guild)

That distinction became the center of the case in June 2025. Judge William Alsup ruled that using lawfully purchased books for large language model training was fair use, but he refused to extend that protection to pirated copies downloaded from LibGen and similar sources. In the order, the court described the training use as highly transformative, while treating the piracy question as ordinary copyright infringement. (Ars Technica PDF via court order, Authors Alliance, Authors Guild)

That ruling changed the shape of the fight.
Before June 2025, the case looked like a broad test of whether artificial intelligence companies could train on copyrighted books at all. After June 2025, the surviving high-risk claim was much narrower and much blunter: whether Anthropic had illegally copied huge numbers of books from pirate libraries and what damages it owed for doing that. (Ars Technica PDF via court order, Publishers.org FAQ PDF, Writer Beware)

The $1.5 billion figure is real, and it does not rest on the social post alone: it came later, in a proposed settlement filed in September 2025. Multiple outlets reported that Anthropic agreed to pay at least $1.5 billion to resolve the class action, and the filing was described as the largest publicly reported copyright recovery of its kind. A federal judge then granted preliminary approval on September 25, 2025. (cnbc.com, CBS News, Authors Guild)

The settlement also appears to have covered a very large class, but not every author in the world. According to author and publisher groups tracking the case, the class was tied to books Anthropic allegedly downloaded during specific periods in 2021 and 2022, and eligibility depended on factors including copyright registration. In other words, “millions of pirated books” describes the scale of the alleged copying ecosystem, but the compensation process focused on a defined list of works and rightsholders, not a vague internet-wide pool. (Authors Guild FAQ, Authors Guild, Publishers.org FAQ PDF)

This is why the online debate keeps getting tangled. Many people hear “Anthropic settled” and assume a court found that training artificial intelligence on copyrighted books is categorically unlawful. That is not what happened in this case. The court’s June 23, 2025 order drew a line between training on legally acquired books and building a central library from pirated files. The settlement then resolved the piracy-centered claims without a final trial verdict on damages.
(Ars Technica PDF via court order, Authors Alliance, cnbc.com)

The case also landed in the middle of a larger publishing fight. Authors and publishers have spent the past two years trying to answer a basic question: if an artificial intelligence company uses books to make a commercial model, who gets paid, and on what terms? One camp argues that training is like studying every book in a library and should often count as fair use. Another camp argues that mass copying at industrial scale, especially from pirate sources, looks less like reading and more like taking inventory without paying the
