Agent workloads strain CPUs

Amazon reportedly tripled its CPU server count year over year and still faced shortages as agentic AI workloads consumed general-purpose processors across the cloud. The report suggests that agent systems with many tool calls, validators, and orchestration steps create heavy non-GPU infrastructure demand, which favors designs that use caching, bounded loops, and selective tool invocation. (wccftech.com)

An artificial intelligence agent is not one model call but a chain of steps, and those extra steps are pushing central processors into short supply. (arxiv.org, docs.aws.amazon.com) A recent report, citing SemiAnalysis chief analyst Dylan Patel, said Amazon tripled its central processor server count year over year and still ran short as agent systems spread across cloud workloads. (wccftech.com, newsletter.semianalysis.com)

In these systems, the large language model plans work, calls tools or application programming interfaces, fetches data, and then runs checks before returning an answer. Amazon Web Services shows that pattern with preprocessing, orchestration, inference, and post-processing layers built around Amazon Bedrock agents and AWS Lambda. (docs.aws.amazon.com)

That changes which chip does the heavy lifting. A November 2025 paper found that tool processing on central processors can account for as much as 90.6 percent of total latency in representative agent workloads. (arxiv.org) The same paper found that central processors can consume up to 44 percent of total dynamic energy at large batch sizes, even in systems built around graphics processors. The authors profiled five agent workloads: Haystack retrieval-augmented generation, Toolformer, ChemCrow, LangChain, and SWE-Agent. (arxiv.org)

Cloud vendors have been building for that shift. Amazon Web Services said in 2025 that it was investing another $100 million in its Generative AI Innovation Center and launching Amazon Bedrock AgentCore to help customers run agents at enterprise scale. (aboutamazon.com) Amazon’s own reference architecture for scalable agentic artificial intelligence on Amazon Elastic Kubernetes Service assigns core services and workflows to AWS Graviton central processors, while NVIDIA graphics processors and AWS Inferentia chips handle accelerated inference. (aws-solutions-library-samples.github.io)

Chip companies are now pitching that balance directly. Advanced Micro Devices said on March 13, 2026, that agentic artificial intelligence raises demand for central processors because they schedule work, move data, manage memory, and keep accelerators busy. (amd.com)

Researchers are already outlining ways to cut that load. The November 2025 paper proposed micro-batching that is aware of both central processors and graphics processors, along with mixed workload scheduling, while the underlying pattern also favors caching repeated results, limiting loops, and calling tools only when needed. (arxiv.org, docs.aws.amazon.com)

The first wave of generative artificial intelligence made graphics processors the scarce resource. The next wave is showing that every extra tool call, validator, and orchestration step also needs a central processor somewhere in the rack. (wccftech.com, arxiv.org)
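
To make the three design habits named above concrete (caching repeated results, limiting loops, and calling tools only when needed), here is a minimal Python sketch. It is an illustration under stated assumptions, not code from the paper, Amazon, or any agent framework: the `lookup` tool, the `needs_tool` gate, the `lookup:` step prefix, and the `MAX_STEPS` bound are all hypothetical names invented for this example.

```python
from functools import lru_cache

MAX_STEPS = 4                 # hypothetical bound on steps per request
stats = {"tool_calls": 0}     # counts how often the tool body really runs


@lru_cache(maxsize=256)
def lookup(query: str) -> str:
    """Hypothetical CPU-side tool. Cached, so a repeated query runs once."""
    stats["tool_calls"] += 1
    return f"result for {query}"


def needs_tool(step: str) -> bool:
    """Hypothetical gate: only steps prefixed with 'lookup:' use the tool."""
    return step.startswith("lookup:")


def run_agent(plan: list[str], max_steps: int = MAX_STEPS) -> list[str]:
    """Run at most max_steps of the plan, invoking the tool selectively."""
    outputs = []
    for step in plan[:max_steps]:              # bounded loop
        if needs_tool(step):                   # selective tool invocation
            query = step[len("lookup:"):]
            outputs.append(lookup(query))      # cached result when repeated
        else:
            outputs.append(f"answered directly: {step}")
    return outputs
```

Given a six-step plan and the default bound, only the first four steps run, and a lookup that repeats within those steps is served from the cache instead of running the tool again. A production agent would need a smarter gate and cache invalidation, but the same three controls apply.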

