AI compute is being rationed

Companies are already rationing AI compute and raising prices as demand outstrips capacity. A report says outages, service limits and higher costs are appearing as compute and energy resources tighten, which can disrupt burst-heavy workloads like newsroom video spikes. (enterpriseai.economictimes.indiatimes.com)

Artificial intelligence companies are already capping usage, posting more outages and charging more as demand for computing power outruns supply. (enterpriseai.economictimes.indiatimes.com) The squeeze shows up in the basic mechanics of cloud access: Microsoft says Azure OpenAI quotas are set by region, subscription and model, and it has added seven quota tiers that raise limits over time instead of offering unlimited capacity on demand. (learn.microsoft.com) Google says customers who need Tensor Processing Units, its in-house artificial intelligence chips, now have to choose between on-demand access, spot capacity that can be interrupted, or reservations that lock in hardware for as long as a year. (docs.cloud.google.com)

The bottleneck is not just chips. The International Energy Agency says data centres used about 415 terawatt-hours of electricity in 2024, roughly 1.5% of global power demand, and servers account for about 60% of electricity use inside modern facilities. (iea.org) That helps explain why providers are steering customers toward reserved capacity and quota systems. A bursty workload like generating newsroom video on a breaking-news day needs spare servers and spare power at the same moment, and both are getting harder to guarantee. (docs.cloud.google.com) (iea.org)

The strain is visible on public status pages. Anthropic reported incidents on April 10, April 13 and April 14, 2026, including elevated errors on Claude requests, login failures on Claude.ai and degraded analytics application programming interface endpoints. (status.claude.com)

Companies are also reshaping products around those limits. OpenAI says the Sora web app and mobile app will be discontinued on April 26, 2026, and the Sora application programming interface will shut down on September 24, 2026. (help.openai.com)

At the same time, providers are racing to secure more hardware years in advance. Anthropic said in October 2025 that it planned to expand its Google Cloud footprint to as many as one million Tensor Processing Units, a build-out it said would bring well over a gigawatt of capacity online in 2026. (anthropic.com)

The result is a cloud market that looks less like infinite software and more like airline seating: quotas, reservations, priority tiers and higher last-minute prices when demand spikes. (learn.microsoft.com) (docs.cloud.google.com)
