Publishers alarmed by third-party scrapers
A creator-industry report warns that third-party web scrapers are harvesting publisher content to feed AI systems without direct publisher relationships, creating a new distribution and attribution risk. That trend raises questions about how publishers and studios protect context and value as AI ingests more of the open web. (spicycreatortips.com)
A publisher can block OpenAI’s bot, cut a licensing deal with Google, and still end up inside an artificial intelligence answer box because a separate scraper copied the page first. That is the hole publishers say is getting bigger in 2026. (pressgazette.co.uk)
The new worry is not just the big artificial intelligence companies people already know. It is a layer of specialist scraping firms that collect articles, reformat them for machine use, and sell that access onward to artificial intelligence developers and big corporate buyers. (tollbit.com)
TollBit said in its 2025 year-end report that it documented nearly 40 scraping vendors in this market. The company said many of them advertise tools for evading cybersecurity defenses and mimicking human visitors, which makes them harder to spot in normal site logs. (tollbit.com)
That changes the old fight between publishers and platforms. A newspaper used to know which company was crawling its pages, but now the buyer and the scraper can be two different businesses, so the publisher may see the break-in without seeing the customer. (pressgazette.co.uk)
The paywall is not a clean fence either. TollBit said some scrapers in its testing could retrieve full versions of paywalled articles, and Press Gazette reported that publishers fear copied versions on other sites or archives can also become a back door into subscription content. (tollbit.com) (pressgazette.co.uk)
The traffic numbers explain why publishers are panicking. TollBit said the average site on its network went from 1 artificial intelligence bot visit for every 200 human visits in early 2025 to 1 bot visit for every 31 human visits by the end of the year. (tollbit.com)
The referral trade is collapsing at the same time. Akamai said artificial intelligence chatbots drove about 96% less referral traffic than traditional Google Search in the fourth quarter of 2024, which means publishers are giving up content and server capacity while getting far fewer readers back. (intelligentciso.com)
Even direct licensing deals are not fixing that. TollBit said sites with one-to-one artificial intelligence licensing agreements saw click-through rates fall from 8.8% in the first quarter of 2025 to 1.33% by year end, while sites without such deals fell from 0.8% in the second quarter to 0.27% by year end. (tollbit.com)
That is why infrastructure companies are moving from simple blocking to toll booths. On July 1, 2025, Cloudflare introduced “pay per crawl,” which lets a site allow, block, or charge an artificial intelligence crawler using standard web responses including the old 402 Payment Required code. (blog.cloudflare.com)
Cloudflare’s pitch is that most publishers do not have the leverage to negotiate dozens of private deals with every model maker and every scraper in the chain. A network-level payment rule tries to turn that mess into one switch at the door, even when the crawler has no billing relationship yet. (blog.cloudflare.com)
The legal pressure is moving down the supply chain too. Google sued the scraping company SerpApi in December 2025, saying the company bypassed protections around search results, and Reddit sued Anthropic in June 2025 over alleged scraping of Reddit content for model training. (blog.google) (pbs.org)
The fight now is less about one chatbot quoting one article than about who controls the pipes between the open web and machine answers. If third-party scrapers become the default wholesalers for artificial intelligence systems, publishers lose not just payment but the ability to decide which version of their work gets copied, cited, stripped of context, or sold. (tollbit.com) (pressgazette.co.uk)