Elsevier sues over AI training data
- Elsevier did not file a solo case. On May 5, Elsevier, Cengage, Hachette, Macmillan, McGraw Hill, and Scott Turow sued Meta in Manhattan.
- The complaint says Meta copied millions of books and journal articles to build Llama, and asks for damages plus destruction of infringing copies.
- This pushes science publishing into the AI copyright fight and raises the cost of training on unlicensed text.

The new thing here is not “a publisher is mad at AI.” That fight has been running for years. The new thing is that Elsevier, the biggest name in scientific publishing, has now joined a broad copyright suit against Meta over Llama, and it did so alongside four other major publishers plus author Scott Turow. The case landed in the Southern District of New York on May 5, 2026, which means the argument over AI training data just got a lot more institutional.

### Who is actually suing whom?

The plaintiffs are Elsevier, Cengage, Hachette, Macmillan, McGraw Hill, and Turow’s company S.C.R.I.B.E. The defendants are Meta Platforms and Mark Zuckerberg personally. That matters because this is not a niche author complaint or a one-off academic dispute: it is a coalition of textbook, trade, and scholarly publishers going after one of the biggest model builders in the world. (hachettebookgroup.com)

### What do they say Meta did?

Basically, the complaint says Meta used pirated and unauthorized copies of copyrighted works to train Llama. The publishers say the copying happened at multiple stages: getting the files, loading them into computer systems, converting them into machine-readable training inputs, and then using them to build successive versions of the model. The suit frames that as straightforward infringement, not some incidental technical process. (hachettebookgroup.com)

### Why is Elsevier the part to watch?

Elsevier changes the feel of the case. Book publishers have already been active in AI litigation, but Elsevier brings journal articles and the economics of scientific publishing into the center of the fight. That widens the stakes from novels and nonfiction to research literature, the stuff AI companies have strong incentives to ingest because it is dense, current, and highly structured. (publishers.org) Nature called this the first time a science publisher has sued over scraped research papers for AI training.

### Why sue Meta now?

Because the evidence fight has moved. The complaint leans on allegations that Meta routed through piracy-linked sources like Anna’s Archive, which indexes repositories such as LibGen and Sci-Hub. That is a sharper accusation than “your model probably saw my work somewhere on the internet.” It tries to pin the case to deliberate acquisition of unauthorized copies, which is a much uglier fact pattern for Meta if the plaintiffs can prove it. (nature.com)

### What are they asking the court to do?

Money, obviously, but not just money. The plaintiffs also want injunctive relief, including an order to destroy infringing copies in Meta’s possession or control. That is the part that makes these cases scary for AI companies. Damages hurt, but an order that reaches the training corpus or derivative model-building pipeline could force expensive retraining, relicensing, or both. (finance.yahoo.com)

### Is this just one more lawsuit?

Yes and no. Yes, in the sense that Meta is already fighting AI copyright cases from authors and other rightsholders. But this one adds heavyweight publishers with deep catalogs and established licensing businesses. It also arrives after a similar publisher dispute involving Anthropic reportedly ended in a $1.5 billion settlement, which gives the whole sector a fresh benchmark for what these fights might be worth. (hachettebookgroup.com)

### So what is really at stake?

The real question is whether courts will treat training on copyrighted text as a protected fair use or as a licensable act of copying. If publishers win on the core theory, the foundation-model business gets more expensive and more closed. If Meta wins, publishers lose leverage over one of the most valuable downstream uses of their catalogs. Either way, this is now bigger than books: it is about who gets paid when AI learns from the written record. (molawyersmedia.com)

### Bottom line?

Elsevier did not just complain about AI in the abstract.
It joined a class action that tries to turn AI training from a scraping habit into a billable, legally risky input. If that theory sticks, model builders will need a lot more permission — and a lot more cash. (hachettebookgroup.com) (nature.com)