Cloudflare Outage Caused by Config Change
A significant Cloudflare outage on February 20th, which disrupted 20% of internet traffic for six hours, was caused by a configuration change. A podcast analysis revealed that a routine update to the "Bring Your Own IP" (BYOIP) pipeline triggered a logic error that deleted customer networks. The incident highlighted how multiple layers of automated and manual safeguards failed to prevent the failure.
- The specific trigger was a bug in a script intended to automate the removal of customer IP prefixes: a flawed API query was interpreted as a request to delete all prefixes rather than a specific subset.
- The change, part of a resiliency initiative called "Code Orange: Fail Small," led to the withdrawal of Border Gateway Protocol (BGP) routes for approximately 1,100 of the 4,306 total BYOIP prefixes.
- Recovery was complicated because the failure affected prefixes differently. Some customers could restore service themselves via the dashboard, but around 300 prefixes had their service configurations entirely removed and required manual restoration by engineers, which was completed at 23:03 UTC.
- High-profile services affected by the outage included Uber, Workday, Wikipedia, Microsoft Outlook, and major betting platforms like Bet365.
- End-user connections to affected services experienced a phenomenon known as "BGP Path Hunting," in which their requests searched across networks for a valid route to the destination IP until the connection timed out.
- This event follows a pattern of significant configuration-related incidents at Cloudflare, including a major November 2025 outage caused by a bloated bot management configuration file and a March 2025 failure triggered by a credential rotation error.
- In addition to customer sites, the website for Cloudflare's own public DNS resolver, 1.1.1.1, displayed 403 errors, though the underlying DNS resolution service was not affected.
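The class of bug described in the first bullet, a filtered query that ends up matching everything, can be illustrated with a minimal Python sketch. This is not Cloudflare's code: the prefix table, the `pending_delete` parameter, both `list_prefixes_*` functions, and the `select_for_removal` guard are hypothetical names chosen for illustration. The sketch assumes a lenient API that treats an empty filter value as "no filter," and contrasts it with a stricter variant plus a simple blast-radius check.

```python
# Hypothetical in-memory prefix table (documentation address ranges only).
PREFIXES = [
    {"cidr": "192.0.2.0/24", "pending_delete": True},
    {"cidr": "198.51.100.0/24", "pending_delete": False},
    {"cidr": "203.0.113.0/24", "pending_delete": False},
]


def list_prefixes_lenient(params: dict) -> list:
    """Buggy pattern: an empty filter value silently means 'no filter'."""
    if params.get("pending_delete"):  # "" is falsy, so the filter is skipped
        return [p for p in PREFIXES if p["pending_delete"]]
    return list(PREFIXES)  # the caller receives every prefix


def list_prefixes_strict(params: dict) -> list:
    """Safer pattern: a filter that is present must carry a valid value."""
    if "pending_delete" not in params:
        return list(PREFIXES)
    value = params["pending_delete"]
    if value not in ("true", "false"):
        raise ValueError(f"invalid pending_delete value: {value!r}")
    want = value == "true"
    return [p for p in PREFIXES if p["pending_delete"] is want]


def select_for_removal(list_fn, params: dict, max_fraction: float = 0.5) -> list:
    """Blast-radius guard: refuse to act on an implausibly large selection."""
    selected = list_fn(params)
    if len(selected) > max_fraction * len(PREFIXES):
        raise RuntimeError(
            f"refusing to remove {len(selected)} of {len(PREFIXES)} prefixes"
        )
    return [p["cidr"] for p in selected]
```

With the lenient listing, a cleanup job that sends `{"pending_delete": ""}` gets all three prefixes back and would remove them all. The strict listing rejects the malformed query outright, and the guard in `select_for_removal` stops the job even if the listing is wrong. The 50% threshold is an arbitrary illustrative value.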