Copyright suits hit 100+
U.S. copyright litigation over AI training data has surpassed 100 cases, as creators increasingly pair lawsuits with licensing talks to try to secure compensation or controls (noah-news.com). In a recent class action, three YouTube creators sued Apple, alleging their videos were scraped without consent for model training. It is a concrete signal that data provenance and auditable usage are becoming legal requirements, not just ethical ones.
The fight over artificial intelligence training data has moved from a handful of test cases to a full court docket: by 2026, U.S. lawsuits over how models were trained had climbed past 100, with separate litigation tracks now running against OpenAI, Google, Meta, Anthropic, Nvidia, ByteDance, and others. (courtlistener.com 1) (courtlistener.com 2) (news.bloomberglaw.com) The new wrinkle is not just “you copied my work.” It is “show me exactly where you got it, how you got it, and whether you bypassed any locks to get it,” which is why so many complaints now focus on scraping methods, pirate libraries, and platform protections instead of only the model’s output. (news.bloomberglaw.com 1) (news.bloomberglaw.com 2)
That is what makes the new Apple case useful as a map of where this is headed. Three YouTube creators sued in federal court in California after Apple researchers described using a dataset called Panda-70M, which the complaint says indexed millions of YouTube clips for video-model training. (9to5mac.com) The plaintiffs are Ted Entertainment, Matt Fisher, and Golfholics, and they say their videos appear more than 500 times in that dataset. Their complaint says a single YouTube upload can be chopped into many training clips, so one original video can turn into dozens of separate data points inside a model pipeline. (9to5mac.com)
They are leaning on the Digital Millennium Copyright Act, a 1998 law that bans bypassing technical protection measures, because that route does not depend on every creator having a registered copyright. Bloomberg Law noted the same strategy in near-identical YouTube scraping suits against Meta and ByteDance, where the creators argued that YouTube’s access restrictions were evaded to pull videos into training systems. (news.bloomberglaw.com)
That shift matters because a training-data case used to sound abstract, like arguing over what a model “learned” from billions of tokens. A Digital Millennium Copyright Act case is more like arguing over whether someone picked the lock on a warehouse door before hauling the boxes away. (news.bloomberglaw.com) (9to5mac.com)
Courts have already started drawing a line between what the model does and where the data came from. In the Anthropic book case, the dispute over pirated copies ended in a $1.5 billion settlement in 2025, and later reporting described that result as a push toward licensed inputs rather than “find it online and ask forgiveness later.” (cbsnews.com) (news.bloomberglaw.com)
That is why creators are now suing and negotiating at the same time. Bloomberg Law reported that new licensing startups are pitching themselves as toll booths between artificial intelligence companies and rights holders, promising cleaner records, payment flows, and opt-in controls before the next lawsuit lands. (news.bloomberglaw.com)
The other front is upstream. Publishers and record labels have started suing pirate repositories such as Anna’s Archive, arguing that if shadow libraries sell bulk access to 63 million books and 95 million papers, they are not just piracy sites anymore but wholesale suppliers for model training. (news.bloomberglaw.com)
So the question is getting narrower and tougher at the same time. Not “did the internet influence the model,” but “which file, from which source, under which permission, with which audit trail,” and companies that cannot answer those four questions are the ones most likely to keep meeting creators in court. (news.bloomberglaw.com) (9to5mac.com)
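For readers who build training pipelines, those four questions map fairly naturally onto a per-item record. The Python sketch below is only an illustration under stated assumptions: `ProvenanceRecord`, its field names, `fingerprint`, and `unanswered` are hypothetical and do not come from any complaint, court filing, or company system described above.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvenanceRecord:
    """Hypothetical per-item record; every field name here is an assumption."""
    content_sha256: str   # which file
    source_url: str       # from which source
    license_ref: str      # under which permission (e.g. a contract or license ID)
    audit_log: List[str] = field(default_factory=list)  # with which audit trail


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the exact bytes ingested."""
    return hashlib.sha256(data).hexdigest()


def unanswered(record: ProvenanceRecord) -> List[str]:
    """List which of the four questions this record leaves empty."""
    checks = {
        "which file": record.content_sha256,
        "which source": record.source_url,
        "which permission": record.license_ref,
        "which audit trail": record.audit_log,
    }
    return [question for question, value in checks.items() if not value]


# Example: a clip with a known hash and source but no license or log entries.
clip = ProvenanceRecord(
    content_sha256=fingerprint(b"example clip bytes"),
    source_url="https://example.com/video/123",  # placeholder URL
    license_ref="",
)
print(unanswered(clip))  # ['which permission', 'which audit trail']
```

A record like this does not settle any legal question on its own. It only makes the gaps visible before a dataset ships, which is the kind of paper trail the complaints described above are asking for.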