Rights clash: YouTubers sue Apple
A group of YouTubers has filed a proposed class action in the United States alleging Apple used their videos to train its AI systems without permission, highlighting a live dispute over content provenance and training rights. That kind of litigation, whether it succeeds or not, makes clean ownership of training data and likeness rights a growing diligence item for any studio packaging digital validation. (siecledigital.fr)
Three YouTube creators have filed a proposed class action in federal court in California accusing Apple of using their videos to train artificial intelligence systems without permission, and the complaint says one creator’s material appeared more than 500 times in the disputed dataset. (siecledigital.fr) (9to5mac.com) The channels named in the case are h3h3Productions, MrShortGame Golf, and Golfholics, and the suit says they want to represent other creators whose videos were allegedly pulled into Apple’s training pipeline the same way. (tech.yahoo.com) (creatorhandbook.net)
The argument is not just “Apple watched YouTube.” The complaint says Apple used a dataset called Panda-70M, which works like a giant spreadsheet of YouTube links, timestamps, and clip markers that point to specific slices of specific videos. (9to5mac.com) (arxiv.org) Panda-70M is huge: its public project page says it starts from 3.8 million long videos and turns them into about 70.8 million short clips paired with captions, which is exactly the kind of raw material video generators use to learn how motion and language line up. (snap-research.github.io) (arxiv.org) Apple’s name is in this because a late-2024 Apple research paper on video generation said its model was trained with Panda-70M, so the plaintiffs are using Apple’s own research trail as part of the map back to the alleged scraping. (9to5mac.com) (pcmag.com)
This fight has been building since July 2024, when Proof News reported that subtitles from 173,536 YouTube videos across more than 48,000 channels had been used by companies including Apple, Nvidia, Anthropic, and Salesforce. (proofnews.org 1) (proofnews.org 2) That earlier reporting mattered because it turned a vague creator fear into something searchable: Proof News published a lookup tool tied to the YouTube Subtitles dataset, and major channels like MrBeast, Marques Brownlee, and PewDiePie were identified in the scraped material. (proofnews.org 1) (proofnews.org 2)
The legal hook here is sharper than a normal copyright complaint. Several reports say the plaintiffs are leaning on the Digital Millennium Copyright Act, a United States law whose anti-circumvention provisions can punish bypassing technical barriers, because that can be easier to argue than the still-unsettled question of whether training on scraped media qualifies as fair use. (appleinsider.com) (winbuzzer.com)
Apple has also said, in its official material on Apple Intelligence, that it does not use users’ private personal data or user interactions to train its foundation models. That statement does not answer the lawsuit’s separate question, which is whether public creator content from YouTube was gathered and used upstream in research or model training. (machinelearning.apple.com)
That is why this case reaches past three channels and one company. If courts start treating training data like a supply chain, then every studio, label, and platform will need receipts showing who owned the footage, who licensed the voice, and who said yes to the machine learning run. (proofnews.org) (siecledigital.fr)