CPU capacity crunch

Industry reports say Amazon tripled its CPU server footprint and still ran short because agentic AI workloads are consuming available general‑purpose processors in the cloud. (wccftech.com) Observers argue that if agentic systems drive CPU demand as well as accelerator demand, teams will need more deliberate workload partitioning between edge, site, and central cloud resources. (x.com)

The cloud’s newest artificial intelligence (AI) bottleneck is not the graphics processing unit (GPU). It is the central processing unit, or CPU, the general-purpose chip in a server that handles the step-by-step work around each model run. (aws.amazon.com) Amazon Web Services (AWS) said on March 30 that many production AI workloads still run cost-effectively on CPUs, and that inference is expected to make up two-thirds of all AI compute by 2026. (aws.amazon.com) Agentic AI systems lean heavily on those chips because they do more than generate text. Quartz reported on March 11 that agents plan tasks, call application programming interfaces, query databases, run code, and check results before trying again. (qz.com)

That shift is showing up in cloud demand. In Amazon’s 2025 shareholder letter, published April 9, chief executive Andy Jassy said AWS added 3.9 gigawatts of power capacity in 2025 and still had “unserved demand.” (networkworld.com) Jassy said two large customers asked to buy all available 2026 capacity for Graviton, AWS’s custom CPU. Amazon’s investor relations site lists the 2025 shareholder letter alongside the 2025 annual report. (networkworld.com) (ir.aboutamazon.com)

Amazon has also been pushing deeper into agent software and its own server silicon. At re:Invent in December 2025, AWS spotlighted Bedrock AgentCore, its service for agents, and introduced Graviton5, which Amazon called its most powerful CPU. (aboutamazon.com)

Amazon is not the only company seeing the squeeze. In February, Reuters reported, as relayed by TrendForce and other outlets, that Intel had warned some Chinese customers of server CPU lead times of up to six months, while lead times on some Advanced Micro Devices orders stretched to eight to 10 weeks. (trendforce.com) Futurum said on February 24 that agentic and reinforcement-learning workloads are pushing CPU-to-GPU ratios in AI clusters back toward 1:1. In plain terms, companies now need more “traffic cops” in the rack, not just more engines. (futurumgroup.com)

Amazon’s own customer research points the same way. An International Data Corporation study published by AWS surveyed more than 900 organizations: 50 percent reported running 10 or more agents in production in 2025, and 37 percent were pursuing multi-agent systems. (aws.amazon.com)

That is why the CPU story has moved from background detail to a capacity problem. If agents keep spreading from pilots to production, cloud providers will have to add more general-purpose servers, not just more accelerator clusters. (aws.amazon.com) (futurumgroup.com)

Shared from Scout - Be the smartest in the room.