AI compute is constrained

Reports show AI compute capacity is tight: GPU prices are rising, outages and rationing are appearing, and firms are hunting for localized or sovereign options. The squeeze on raw compute is being framed across several industry reports as a constraint for AI projects that require large-scale GPU access (the-decoder.com).

Artificial intelligence projects are running into a simpler problem than model design: there are not enough chips, power hookups, and data center slots to go around. (delloro.com)

The bottleneck starts with the hardware. Training and serving large models requires thousands of specialized processors working in parallel, and Google says its Tensor Processing Units and Nvidia-style accelerators are built for exactly those large-scale training and inference jobs. (cloud.google.com)

Demand is still climbing faster than the infrastructure underneath it. Dell’Oro Group said on March 12 that hyperscalers scaled artificial intelligence infrastructure rapidly in 2025, lifting demand not just for accelerators but also for high-bandwidth memory, networking chips, storage, and other server parts. (delloro.com)

The shortage is no longer just about chips on a loading dock. Data Center Frontier, citing Dell’Oro research, reported data center capital spending rose 59 percent year over year in the third quarter of 2025, while liquid cooling grew 85 percent as operators rebuilt facilities for denser artificial intelligence clusters. (datacenterfrontier.com)

Cloud providers are acknowledging the squeeze in public. Microsoft said in its 2025 annual report that it is still expanding data center locations and server capacity to meet rising demand for artificial intelligence services, after Azure revenue topped $75 billion for the year. (microsoft.com)

Amazon is being more explicit about unmet demand. In his 2025 shareholder letter, quoted by Network World on April 10, Chief Executive Officer Andy Jassy said Amazon Web Services added 3.9 gigawatts of power capacity in 2025 and still had “capacity constraints” that left demand unserved. (networkworld.com) Jassy said two large customers asked to buy all available 2026 capacity for Graviton instances, Amazon Web Services’ custom central processing unit line, and Amazon declined because it had to spread supply across other customers. (networkworld.com)

The response is to diversify away from a single chip or a single cloud. Anthropic said last week that it trains and runs Claude on Amazon Trainium, Google Tensor Processing Units, and Nvidia graphics processing units so it can match workloads to different hardware and improve resilience. (anthropic.com)

That same diversification is pushing the market toward “sovereign” setups, where governments and regulated companies want artificial intelligence systems and data to stay under local control. Oracle says its sovereign artificial intelligence offerings let customers choose public cloud regions, dedicated regions in their own data centers, or isolated environments with tighter control over encryption, access, and operations. (oracle.com)

Google is making a similar case with custom chips built for bigger inference loads, the stage where a trained model answers users instead of learning from data. Google said its Ironwood Tensor Processing Unit, introduced on April 23, 2025, scales to 9,216 chips and was designed specifically for inference. (blog.google)

The immediate result is that artificial intelligence capacity is being treated less like ordinary cloud computing and more like scarce industrial equipment. Companies can still buy models and software, but the projects that need large clusters now depend on who can secure the chips, electricity, cooling, and local control first. (delloro.com)
