Publishers cite LLM scraping

Publishers say large language models are scraping their sites repeatedly and draining ad and subscription revenue, which is driving interest in technical and licensing responses. Services such as Cloudflare and TollBit are among the options for publishers seeking protection against unlicensed AI use. (securityboulevard.com)

Publishers are moving to block or charge artificial intelligence crawlers as news sites say repeated scraping is cutting traffic, ad sales, and subscriptions. (cloudflare.com) Cloudflare said on July 1, 2025 that it would block artificial intelligence crawlers by default for new customers, unless a site owner gives permission. The company said site owners can also set terms based on whether a crawler is used for training, inference, or search. (cloudflare.com)

TollBit sells a different response: software that identifies artificial intelligence bot traffic, separates it from human traffic, and lets publishers set rules and prices for access. TollBit says publishers can offer licensed retrieval, paywalls for bots and agents, and agent-specific versions of their sites. (tollbit.com)

The fight centers on a web bargain that shaped publishing for decades: search engines copied pages, then sent readers back to the source. Cloudflare said that model is breaking because artificial intelligence systems can scrape text and generate answers without sending visitors to the original site. (cloudflare.com)

Cloudflare’s own traffic data showed the imbalance in September 2025. It said training accounted for nearly 80% of artificial intelligence crawling by mid-2025, while some bots still sent vanishingly small numbers of visitors back to publishers. (blog.cloudflare.com)

Media companies have already taken the dispute to court. Dow Jones and the New York Post sued Perplexity on October 21, 2024, alleging a “massive amount of illegal copying” of copyrighted work. (cnbc.com)

Perplexity said after that lawsuit that “AI-enhanced search engines are not going away” and cast the case as resistance from incumbent media companies. The company said it respects publishers and has pursued publisher programs, but rejected the broader attack on its model. (variety.com)

Not every publisher is choosing lawsuits or blanket blocking. OpenAI has signed licensing deals with publishers including Axel Springer, and industry tracking shows other large publishers have also pursued direct agreements instead of litigation. (openai.com, amediaoperator.com)

The new tools are spreading through publisher infrastructure. Arc XP, a publishing platform owned by The Washington Post, said on March 23, 2026 that it had integrated TollBit so customers could detect bots in real time, block them, or route them to paid access terms. (prnewswire.com)

That leaves publishers with three main options in 2026: block crawlers, sign licenses, or build systems that meter artificial intelligence access page by page. The common goal is the same one publishers say they are losing now: getting paid when machines read their work. (cloudflare.com, tollbit.com, prnewswire.com)

Shared from Scout - Be the smartest in the room.