Publishers sue Meta and Zuckerberg

- A coalition of publishers filed a copyright lawsuit against Meta and Mark Zuckerberg alleging large‑scale infringement tied to training AI models. (dexerto.com)
- The complaint claims Zuckerberg personally “encouraged” infringement during the development of AI systems that used publisher content without authorization. (dexerto.com)
- The suit adds to mounting legal pressure from rights holders over how generative AI is trained on creative works. (dexerto.com)

Publishers just opened a new front in the AI copyright war, and this one is aimed straight at Meta and Mark Zuckerberg personally. Five big publishing houses and novelist Scott Turow sued in federal court in Manhattan on May 5, saying Meta built its Llama models with millions of pirated books and journal articles.

The core claim is simple: this was not sloppy scraping around the edges. The plaintiffs say Meta knowingly pulled from pirate libraries, copied the works at several stages of training, and then used the results to compete with the people and companies that made the originals.

### Who is suing, exactly?

The plaintiffs are Elsevier, Cengage, Hachette, Macmillan, McGraw Hill, Scott Turow, and Turow’s company Scribe. They filed a proposed class action in the Southern District of New York, which means they want to represent a broader group of copyright owners whose registered books, journals, and articles were allegedly used the same way.

That matters because this is not just a fight over a few headline authors. It is framed as a publishing-industry case about large-scale use of protected text.

### What are they accusing Meta of doing?

Basically, three things. First, acquiring works from pirate sources like LibGen, Anna’s Archive, and Sci-Hub. Second, making unauthorized copies while ingesting the works and training Llama. Third, stripping copyright management information and flooding the market with AI outputs that can substitute for the originals.

The complaint says Meta’s infringement happened at every step: getting the files, processing them, and turning them into a model that can summarize, imitate, and sometimes reproduce parts of protected works.

### Why is Zuckerberg named personally?

That is the sharpest part of the case. The complaint does not just say Meta did this. It says Zuckerberg “personally authorized and actively encouraged” the infringement, including decisions about whether to license content or simply take it.
Naming a CEO is unusual because plaintiffs usually sue the company and stop there. Here, the publishers are trying to show this was a deliberate business choice made at the top, not an engineering side effect.

### What makes this different from the earlier Meta cases?

Meta was already fighting authors over Llama training in the Kadrey case in California. But that suit began in 2023 and centered on individual authors, including Sarah Silverman, Richard Kadrey, and Christopher Golden. This new case comes from major publishers plus Turow, and it adds a more direct commercial angle: textbooks, journals, trade books, and reference publishing, not just novels. It also lands in New York federal court rather than the Northern District of California.

### Why do publishers think they have a stronger hand?

Because the allegations are not limited to “the model learned from my book.” They are also about source material that was allegedly pirated in bulk. Fair use is the defense AI companies usually lean on, but it looks weaker if a court thinks the inputs were knowingly stolen from illegal repositories and copied repeatedly along the way. The publishers are trying to turn the case from an abstract training-data debate into a much blunter piracy case. That is the real strategic move here.

### What does Meta say?

Meta had not publicly answered in court when the suit was filed. In related AI copyright fights, though, Meta has argued that training on text is lawful fair use, and that is the broader line the company is likely to keep pressing here. The catch is that this complaint is built to make that defense look less comfortable by focusing on torrenting, pirate libraries, and executive approval.

### Why does this matter beyond Meta?

Because the case goes to the question hanging over the whole generative AI business: can companies train on copyrighted text first and sort out permission later?
If publishers gain traction with the piracy theory, the pressure on AI developers rises fast, not just to pay for licenses but to prove exactly where their training data came from. That would affect costs, timelines, and possibly even which models can stay on the market without legal risk.

### Bottom line?

This is not just another “AI used my work” complaint. It is a coordinated attempt by major publishers to say Meta built Llama with stolen inventory, and that Zuckerberg signed off on the plan. If that framing sticks, the fight over AI training stops looking like a fuzzy copyright edge case and starts looking like old-fashioned piracy at industrial scale.
