OpenAI expands web crawl 2.9x
- Botify and Nectiv said April 23 that OpenAI’s web crawling roughly tripled after GPT-5, based on 7 billion bot events from November 2024 to March 2026.
- The sharpest jump was in search-related crawling: OAI-SearchBot rose 3.5 times and GPTBot, OpenAI’s training crawler, increased about 2.9 times.
- OpenAI lets sites allow search crawling while blocking training use, a split that now carries more weight. (developers.openai.com)
Botify and Nectiv said on April 23 that OpenAI’s web crawl roughly tripled after GPT-5, based on analysis of 7 billion bot events. (botify.com) The dataset covered November 2024 through March 2026 and was drawn from Botify’s enterprise log files across publishing, retail, travel, software, healthcare, and marketplaces. (botify.com) (ppc.land)

The report separated OpenAI traffic into three bots: GPTBot for model training, OAI-SearchBot for ChatGPT search results, and ChatGPT-User for user-triggered page fetches. (developers.openai.com) (botify.com)

The biggest acceleration came after GPT-5 launched in August 2025, which the analysis called a turning point in how often OpenAI’s automated bots hit sites. (ppc.land) (botify.com) Within that shift, OAI-SearchBot increased 3.5 times and GPTBot increased about 2.9 times, according to the study. ChatGPT-User events, by contrast, fell 28% from December 2025. (ppc.land)

That split matters because OpenAI gives site owners separate controls for search and training. A publisher can allow OAI-SearchBot to appear in ChatGPT search answers while blocking GPTBot from training use. (developers.openai.com) OpenAI’s crawler documentation also says that if a site allows both bots, the company may reuse one crawl for both purposes to avoid duplicate fetching. The company says search-related robots.txt changes can take about 24 hours to apply. (developers.openai.com)

For publishers and search marketers, the report points to a larger share of OpenAI traffic coming from answer-generation and search retrieval, not just from model-building crawls. That changes which pages get fetched, when they get refreshed, and how often logs show OpenAI activity. (botify.com) (ppc.land)

The headline number does not come from an OpenAI earnings filing or company disclosure. It comes from third-party log analysis, but it is one of the clearest public looks yet at how OpenAI’s bots are dividing search, training, and user-driven requests. (botify.com) (developers.openai.com)
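For site owners who want to apply the search/training split described above, the following is a minimal sketch of what such a robots.txt might look like and how to sanity-check it locally with Python's standard-library parser. The rules shown, the `example.com` URL, and the `crawler_access` helper are illustrative assumptions, not taken from the report or from OpenAI's documentation. Only the user-agent tokens `OAI-SearchBot` and `GPTBot` come from the article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow the search crawler, block the training crawler.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""


def crawler_access(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return whether each user agent may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # Placeholder URL, used only to exercise the rules above.
    access = crawler_access(
        ROBOTS_TXT,
        "https://example.com/articles/launch",
        ["OAI-SearchBot", "GPTBot"],
    )
    for agent, allowed in access.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

This only checks how Python's parser reads the rules. How OpenAI's crawlers interpret a given file, and how quickly a change takes effect, is governed by OpenAI's own documentation.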