Publishers sue Meta over Llama
- Elsevier, Cengage, Hachette, Macmillan, McGraw Hill, and novelist Scott Turow sued Meta and Mark Zuckerberg on May 5 in Manhattan federal court.
- The complaint says Meta knowingly used pirated “shadow libraries” to train Llama, then stripped copyright-management data and reproduced passages and styles.
- The case raises the cost of sloppy training-data provenance just as Meta scales AI infrastructure with a roughly $13 billion El Paso buildout.

Books are the object here. But the real fight is over how foundation models get built, and whether “everyone does it” is going to hold up in court. On May 5, five major publishers plus novelist Scott Turow sued Meta and Mark Zuckerberg in Manhattan federal court, saying Llama was trained on millions of copyrighted works copied without permission. (publishers.org)

### Who is suing whom?

The plaintiffs are Elsevier, Cengage, Hachette Book Group, Macmillan, McGraw Hill, and Scott Turow. They filed a putative class action against Meta and Zuckerberg personally, which matters because the complaint does not just say Meta made a bad call; it says Zuckerberg was directly involved in approving it. (publishers.org)

### What are they actually accusing Meta of?

Basically, the suit says Meta built parts of Llama with unauthorized copies of books, textbooks, and journal articles gathered from across the internet, including pirate sources. The publishers also say Meta removed copyright-management information from those works, which is a separate legal problem because that metadata helps identify ownership and licensing terms. (publishers.org)

### Why does Zuckerberg show up as a defendant?

That is one of the sharper edges in the case. The complaint says Zuckerberg “personally authorized and actively encouraged” the infringement and approved using pirated collections instead of going through normal licensing channels. That does not prove liability by itself, but it raises the stakes because it turns this from an abstract company-process dispute into a leadership decision. (cbsnews.com)

### Why are pirate libraries such a big deal?

Because intent matters. A lot of AI copyright cases revolve around broad scraping and fair use.
This one tries to move the argument from “the model learned from public text” to “Meta knowingly took from illegal repositories.” That is a much uglier fact pattern for a defendant, especially if internal decisions show the co[…] (cbsnews.com) […]ference from the allegations, not a court finding. (publishers.org)

### Do the plaintiffs say Llama copies the books back out?

Yes, and that is another reason this case could matter. The complaint says Llama can generate summaries, mimic authors’ styles, and in some cases reproduce passages from novels, textbooks, and journal articles. In these lawsuits, output evidence is useful because courts tend to care more when training is tied to market harm or near-verbatim reproduction. (cbsnews.com)

### What is Meta’s defense likely to be?

Meta has already said it will fight the case aggressively and has pointed to the broader argument that AI training on copyrighted material can qualify as fair use. That is the core defense across the industry: models transform inputs into statistical systems rather than storing books like a normal library. But that defense g[…]p data. (cbsnews.com)

### Why does this land now?

Because Meta is not slowing down on AI spending. At the same time this lawsuit arrived, Meta was working with Morgan Stanley and JPMorgan on roughly $13 billion in financing for a data center in El Paso, with most of the package expected to be debt and the site aiming for 1 gigawatt ahead of a projected 2028 opening. The company is scaling compute fast, and the legal bill for data provenance may now scale with it. (money.usnews.com)

### Bottom line

This is not just another “authors versus AI” complaint. It is a test of whether courts treat training-data shortcuts as normal internet messiness or as old-fashioned piracy with better branding. If the publishers can prove knowing use of pirated corpora, the whole industry’s favorite defense starts to look a lot thinner. (publishers.org)