Compute Is Becoming Heterogeneous
- Arm and Google Cloud announced Axion processors and updates to operationalize agentic AI infrastructure at scale.
- Industry voices note a new era of specialized chips tuned for agentic and orchestration-heavy workloads.
- Startups should expect infrastructure optimised for long-running agents and orchestration, not just one-shot inference.
An AI agent is less like a single chatbot reply and more like a software worker that keeps calling tools, checking results, and trying again. That is pushing cloud companies to build systems around mixed fleets of chips instead of treating the graphics processor as the whole machine. (newsroom.arm.com)

Arm said this week that Google Cloud’s latest agent-focused stack pairs Axion central processors with new eighth-generation Tensor Processing Unit systems and a Google Kubernetes Engine service called Agent Sandbox. Arm said the setup is aimed at “complex, multi-step” workloads that need secure code execution and fast orchestration. (newsroom.arm.com)

According to Arm, the new Google Kubernetes Engine sandbox can launch up to 300 sandboxes per second per cluster, with time-to-first-instruction under one second, on Axion-based infrastructure. Google’s AI Infrastructure page now markets the broader stack as “agent-native infrastructure” built to train, serve, and operate agents. (newsroom.arm.com) (cloud.google.com)

The technical shift is straightforward: one chip handles the heavy math of model training or inference, while another handles the plumbing around it. Google said agentic workflows need “tight coordination between general-purpose compute and ML acceleration,” and that this need is creating demand for custom silicon and co-designed systems. (cloud.google.com)

That makes the central processor newly important in AI systems that spend time scheduling jobs, moving data, running application servers, and isolating tool calls. Arm said those workloads put the CPU “firmly on the critical path,” even when the model itself still runs on a Tensor Processing Unit or graphics processor. (newsroom.arm.com)

Google has been putting the pieces in place for that split-compute model for more than a year.
It introduced Axion in 2024 as a custom Arm-based processor for general-purpose compute and AI inference, saying the chips could deliver up to 60% better energy efficiency and up to 50% more performance than comparable current-generation x86 instances. (newsroom.arm.com)

Since then, Google has expanded Axion into multiple product lines. Its Axion product page says C4A instances offer up to 10% better performance per virtual central processor than the latest Arm-based cloud alternatives, while N4A virtual machines, now generally available, offer up to 2x better price-performance and 80% better performance-per-watt than comparable current-generation x86 virtual machines. (cloud.google.com 1) (cloud.google.com 2)

Google used its April 22, 2026 Next conference keynote to tie those infrastructure products directly to “the Agentic Enterprise.” Google Cloud Chief Executive Officer Thomas Kurian said Ironwood Tensor Processing Units and Axion processors are now generally available, and said nearly 75% of Google Cloud customers are using the company’s artificial intelligence products. (cloud.google.com)

The wider point for startups is that buying “AI compute” no longer means buying only the biggest accelerator they can afford. The cloud vendors are selling a stack in which different chips do different jobs, and the jobs growing fastest are the long-running orchestration tasks around the model, not just the one-shot answer at the end. (cloud.google.com) (newsroom.arm.com)
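
For readers who want to see the division of labor concretely, the toy loop below is a minimal sketch of the pattern the article describes: a model step proposes an action, and ordinary code then runs the tool, checks the result, and retries. Everything in it is hypothetical and for illustration only. `run_agent`, `toy_model`, and `add_tool` are invented names, the "model" is a hard-coded stand-in, and nothing here uses or represents Google's Agent Sandbox or any Arm or Google API. The counters simply tally how many steps would land on an accelerator versus a general-purpose processor.

```python
from dataclasses import dataclass


@dataclass
class StepLog:
    accelerator_calls: int = 0  # model steps (the "heavy math" side)
    cpu_calls: int = 0          # tool runs, checks, retries (the "plumbing" side)


def run_agent(model_step, tools, goal, max_steps=10):
    """Alternate between a model step and a tool call until the model finishes."""
    log = StepLog()
    observation = goal
    for _ in range(max_steps):
        # In a real system this call would go to a model on a TPU or GPU.
        action, arg = model_step(observation)
        log.accelerator_calls += 1
        if action == "finish":
            return arg, log
        # Tool execution is ordinary CPU-side work; failures are fed back
        # to the model so it can try again.
        try:
            observation = tools[action](arg)
        except Exception as exc:
            observation = f"error: {exc}"
        log.cpu_calls += 1
    return None, log


def toy_model(observation):
    """Hypothetical stand-in for a model: a fixed policy, not real inference."""
    if observation.startswith("result:"):
        return "finish", observation[len("result:"):]
    if observation.startswith("error:"):
        return "add", "2 3"      # retry with well-formed input
    return "add", observation    # first attempt passes the raw goal through


def add_tool(arg):
    """Hypothetical tool: expects two integers separated by whitespace."""
    a, b = (int(x) for x in arg.split())
    return f"result:{a + b}"


answer, log = run_agent(toy_model, {"add": add_tool}, "2 plus 3")
print(answer, log.accelerator_calls, log.cpu_calls)
```

In this run the first tool call fails on malformed input, the stand-in model retries, and the loop finishes after three model steps and two tool calls. Even in a toy, a meaningful share of the steps is orchestration rather than inference, which is the workload the vendors quoted above say they are now building for.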