Cloudflare Outage Caused by Config Change

A significant Cloudflare outage on February 20th, which disrupted 20% of internet traffic for six hours, was caused by a configuration change. A podcast analysis revealed that a routine update to the "Bring Your Own IP" (BYOIP) pipeline triggered a logic error that deleted customer networks. The incident highlighted how multiple layers of automated and manual safeguards failed to stop the change before it reached production.

- The specific trigger was a bug in a script intended to automate the removal of customer IP prefixes: a faulty API query was interpreted as a command to delete all prefixes rather than a specific subset.
- The change, part of a resiliency initiative called "Code Orange: Fail Small," led to the withdrawal of Border Gateway Protocol (BGP) routes for approximately 1,100 of the 4,306 total BYOIP prefixes.
- Recovery was complicated because the failure affected prefixes differently. Some customers could restore service themselves via the dashboard, but around 300 prefixes had their service configurations removed entirely and required manual restoration by engineers, which was completed at 23:03 UTC.
- High-profile services affected by the outage included Uber, Workday, Wikipedia, Microsoft Outlook, and major betting platforms like Bet365.
- End-user connections to affected services experienced a phenomenon known as "BGP Path Hunting," in which requests searched across networks for a valid route to the destination IP until the connection timed out.
- This event follows a pattern of significant configuration-related incidents at Cloudflare, including a major November 2025 outage caused by a bloated bot management configuration file and a March 2025 failure triggered by a credential rotation error.
- In addition to customer sites, Cloudflare's own public DNS resolver website, 1.1.1.1, displayed 403 errors, though the underlying DNS resolution service was not affected.
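
The failure mode in the first bullet, a query meant to select a subset that ends up selecting everything, can be illustrated with a minimal Python sketch. This is not Cloudflare's code: the `Prefix` type, both function names, the `pending_delete` filter, and the 5% threshold are all hypothetical, chosen only to show how an empty filter value can silently widen a deletion and how a blast-radius check might stop it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Prefix:
    """Hypothetical record for a customer IP prefix."""
    cidr: str
    pending_delete: bool


def select_buggy(prefixes: List[Prefix], pending_delete: Optional[str]) -> List[Prefix]:
    """Illustrative bug: an empty or missing filter value is falsy,
    so the filter is skipped and every prefix is returned."""
    if not pending_delete:
        return list(prefixes)
    return [p for p in prefixes if p.pending_delete]


def select_guarded(
    prefixes: List[Prefix],
    pending_delete: Optional[str],
    max_fraction: float = 0.05,  # assumed threshold, for illustration only
) -> List[Prefix]:
    """Require an explicit filter value and refuse oversized selections."""
    if pending_delete != "true":
        raise ValueError("pending_delete filter must be explicitly 'true'")
    selected = [p for p in prefixes if p.pending_delete]
    if prefixes and len(selected) / len(prefixes) > max_fraction:
        raise RuntimeError(
            f"refusing to remove {len(selected)} of {len(prefixes)} prefixes"
        )
    return selected


if __name__ == "__main__":
    # 100 made-up prefixes, of which only two are marked for removal.
    inventory = [Prefix(f"192.0.2.{i}/32", pending_delete=(i < 2)) for i in range(100)]
    print(len(select_buggy(inventory, "")))        # empty filter selects everything
    print(len(select_guarded(inventory, "true")))  # explicit filter selects the subset
```

For scale, the figures above work out to 1,100 / 4,306, or about 25.5% of BYOIP prefixes, so a cap of this kind set anywhere near the assumed 5% would have rejected the selection.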
