News/Media Alliance demands Common Crawl ban

- News/Media Alliance sent Common Crawl a formal April 29 letter demanding publisher removals, AI-use bans, and clearer warnings that scraped news content is unlicensed. (newsmediaalliance.org)
- The letter says some publishers sought removals more than 2.5 years ago, and that over 60% of 2024 donated funds came from AI-linked backers. (newsmediaalliance.org)
- The fight targets a core AI data pipeline: if Common Crawl tightens access, publisher licensing leverage rises fast. (newsmediaalliance.org)

The fight here is about web crawl data, the raw material underneath a lot of modern AI models. Publishers have spent two years suing model makers and negotiating licensing deals, [...] much of the public web and made those copies easy to download. Now the News/Media Alliance is trying to close that back door by pressuring Common Crawl itself. (newsmediaalliance.org)

### What happened?

On April 29, the News/Media Alliance sent a letter to Common Crawl executive director Rich Skrenta demanding a set of con[...]e publicly that it does not own the scraped material, explicitly ban unauthorized AI uses in its terms, and add publisher licensing contact info to its opt-out registry. The alliance published its account of the letter on April 30. (newsmediaalliance.org)

### Why go after Common Crawl?

Because Common Crawl is not just another scraper. It is a shared archive that downs[...] described it as one of the most important sources of training data in generative AI, especially because it gives smaller labs and startups access to web-scale text without building their own crawl from scratch. That makes it infrastructure, not just a website. (mozillafoundation.org)

### What are publishers saying is broken?

The core complaint is simple: publishers say opting [...] previously scraped material to be removed more than 2.5 years ago and still have not gotten resolution. It also argues that even if future crawling stops, old copies sitting in the archive can still be reused by AI developers, which defeats the point of a forward-looking opt-out. (newsmediaalliance.org)

### Why does the funding issue matter?

Because the alliance [...] pointed to Common Crawl’s 2024 finances and argued that more than 60% of donated funds, and at least 8 of 13 donors, were tied directly or indirectly to AI companies or data brokers. More than half of those donations allegedly came from Anthropic, OpenAI, and the Schmidt Foundation. The argument is not just legal. It is about incentives. (ppc.land)

### Is this really ab[...]

[...]ny over whether Common Crawl archives included articles from publishers that normally put their journalism behind paywalls or registration walls. Common Crawl has pushed back, saying it does not go behind paywalls, honors robots.txt, and handles removal requests, but publishers clearly do not think that answer is enough. (commoncrawl.org)

### Why not just sue AI companies instead?

[...] is a bit like challenging the wholesale supplier instead of every retailer one by one. If Common Crawl has to make removals easier, post stronger legal warnings, or restrict AI use more directly, that could raise costs for model builders and strengthen the market for paid licensing deals with publishers. (newsmediaalliance.org)

### What happens next?

That depends on whether Common Crawl changes poli[...]ing Common Crawl as a neutral research project sitting outside the AI copyright wars. They are treating it as part of the commercial pipeline now. (newsmediaalliance.org)

### Bottom line?

This is a quiet but important escalation. The AI copyright fight is moving down the stack, from chatbots and model outputs to the plumbing that fed them in the first place. If publishers can [...] pretending the data question is somebody else’s problem. (newsmediaalliance.org)
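
For readers who want to see what the robots.txt opt-out mentioned above looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes a publisher blocks `CCBot`, the user-agent token Common Crawl uses for its crawler. The robots.txt content, the `news.example.com` URL, the `ExampleBot` agent, and the `is_allowed` helper are all hypothetical and chosen only for illustration. It is not Common Crawl's own code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve (illustrative only).
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical article URL on a placeholder domain.
article = "https://news.example.com/2024/05/story.html"

ccbot_allowed = is_allowed(ROBOTS_TXT, "CCBot", article)
other_allowed = is_allowed(ROBOTS_TXT, "ExampleBot", article)

print("CCBot may fetch:", ccbot_allowed)
print("ExampleBot may fetch:", other_allowed)
```

A rule like this only tells a compliant crawler not to fetch pages from now on. It does nothing about copies already sitting in an archive, which is the gap the publishers' letter is complaining about.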


Shared from Scout - Be the smartest in the room.