
Resilience Gets Its Budget Line

On October 20, 2025, AWS's US-EAST-1 region failed for 15 hours. Slack, Atlassian, HMRC, Barclays, Lloyds, and Bank of Scotland went down with it. Downdetector recorded 17 million user reports from 60 countries, a 970% spike from normal. Nine days later, a configuration change in Azure Front Door triggered a 9-hour outage with an estimated financial impact between $4.8 billion and $16 billion. Between August 2024 and August 2025, AWS, Azure, and Google Cloud together experienced more than 100 service outages. These are the very provider outages that AI wrapper abstractions must handle, as we examined yesterday.

The one-two punch changed how enterprises budget for infrastructure risk. InfoWorld captured the shift: CIOs and CFOs are abandoning the idea that resilience gets whatever's left after cost optimization. Resilience now has explicit line items for multiregion architectures, modernized backup systems, and cross-cloud continuity strategies. The October outages gave resilience advocates concrete ammunition: lost transactions, missed SLAs, remediation overtime, and reputational damage with dollar signs attached.

This comes amid ongoing tension between CFOs and CIOs over technology spending, with CFOs cautious about ROI and CIOs pushing for broader modernization. When banks can't process payments because a DNS configuration went sideways in Virginia, the conversation shifts from "can we afford redundancy?" to "can we afford not to have it?"

More disruption is coming. Forrester predicts AI data center upgrades will trigger two more major multiday cloud outages this year. SC Media warns that attackers and outages are converging in cloud ecosystems. Radware recently disclosed "ZombieAgent," a zero-click AI agent vulnerability enabling silent takeover and cloud-based data exfiltration. "An AWS disruption last week, Microsoft Azure this week, and I have no doubt another Fortune 100 will be hit next week," Catchpoint CEO Mehdi Daoudi observed in October. "Resilience gaps are still widespread across even the most advanced infrastructures."

The harder problem isn't budgeting; it's knowing what to buy. Traditional disaster recovery assumed you controlled your infrastructure. Modern enterprises don't have data centers; they have dependency graphs. With Kubernetes at 62% market share and Docker at 44% among DevOps teams, container orchestration has become the substrate on which everything runs. (As we noted yesterday, the internal platforms built around these tools gave early adopters years of operational advantage.) When Atlassian Cloud went down during the AWS outage, it affected companies that had never intentionally chosen AWS. Many businesses that say "we don't use AWS" still felt the impact.

Gartner VP Analyst Ron Blair argued that traditional disaster recovery doesn't fit the SaaS-first world: "When a SaaS outage occurs, resilience depends not on faster failover, but on clear processes, tested workarounds, and business-led decisions." Only 18% of organizations describe their disaster recovery posture as advanced. Most don't know their dependencies: not just which clouds they use directly, but which clouds and regions sit beneath their SaaS, security, and operations tools. What took down many organizations in October wasn't their primary infrastructure but the supporting mechanisms, such as monitoring systems behind Cloudflare and containers that couldn't be pulled from Docker Hub.

The organizations that responded well had done unglamorous work in advance. As one Mexico-based engineer described: "We rely on multiple clouds, including AWS, Google Cloud, and Heroku. During the recent AWS outage, we temporarily switched to Google Cloud. If all cloud providers are down, we fall back to using an on-premise server. It doesn't perform as efficiently, but it comes in handy."

That's the shape of resilience spending in 2026: not glamorous AI projects, like the platforms for conquering the wrapper tax that we covered yesterday, but multicloud failover, tested runbooks, and manual workarounds documented and rehearsed.

For a decade, hyperscalers sold enterprises on the idea that someone else's infrastructure would always be more reliable than their own. That remains largely true: no enterprise IT shop matches the engineering depth of AWS or Azure. But no provider guarantees zero downtime. The organizations that treat resilience as a first-class budget line will weather the next outage. The ones still treating it as discretionary will learn October's lesson at their own expense.
