Google Cloud hardware push

- Google Cloud Next showcased TPU v8 hardware updates and Gemini Agent improvements aimed at enterprise use. (youtube.com)
- The TPU v8 architecture was split into chips optimized separately for training and for low-latency agent inference. (youtube.com)
- The changes indicate cloud providers are tuning infrastructure specifically for real-time, production agent workloads. (youtube.com)

Google used its Cloud Next conference on April 22 to show that its next artificial-intelligence hardware will be built around two jobs, not one: training models and serving them live to users. (cloud.google.com) A training chip runs the computation that teaches a model from huge datasets; an inference chip runs the deployed model and answers prompts. Google said its eighth-generation line splits those tasks into TPU 8t for pre-training and TPU 8i for large-scale inference and reinforcement learning. (cloud.google.com)

The chips were announced at Cloud Next ’26 alongside a Gemini Enterprise Agent Platform, tying the hardware launch to software for building and running business agents. Sundar Pichai said the event’s roadmap centered on “the transition” to what Google calls the “agentic enterprise.” (blog.google)

The split reflects a change in how companies use artificial intelligence in production. Google said pre-training, post-training and real-time serving now have different bottlenecks, so one accelerator design no longer fits the whole lifecycle. (cloud.google.com)

That is where agents come in. Google said agents must reason, plan and execute multi-step workflows quickly enough to feel responsive, and it designed TPU 8i around low-latency serving for that kind of workload. (blog.google)

TPU 8t is the training side of that bet. Google said it is optimized for massive-scale pre-training and embedding-heavy workloads, uses a 3D torus network, and scales to 9,600 chips in a single superpod. (cloud.google.com)

Google also said both eighth-generation TPU systems use Arm-based Axion central processing units as hosts, aiming to reduce delays from data preparation and orchestration before work reaches the accelerators. In plain terms, the company is trying to keep expensive chips busy instead of leaving them idle while surrounding systems catch up. (cloud.google.com)

The business pitch is that Google is no longer selling only a model or only a chip, but the full stack of chips, models and agent software. Pichai said Google’s first-party models now process more than 16 billion tokens per minute through direct customer application programming interface use, up from 10 billion in the prior quarter, and that more than half of Google’s machine-learning compute investment in 2026 is expected to go to Cloud. (blog.google)

Rivals are making similar moves, but Google’s announcement puts the design choice in unusually explicit terms. CNBC reported that Google is packing more static random access memory into the inference-focused chip as it pushes against Nvidia in the market for live artificial-intelligence workloads. (cnbc.com)

The immediate takeaway from Las Vegas is that cloud companies are tuning their infrastructure for agents that have to act in real time, not just chat. Google’s hardware roadmap now treats that as a separate computing problem. (blog.google)
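
The article says TPU 8t uses a 3D torus network but does not explain the term. As a rough illustration of the generic topology only (not Google's implementation), a 3D torus places nodes on a three-dimensional grid and links each one to its nearest neighbor in both directions along every axis, with links wrapping around at the edges. The Python sketch below uses hypothetical helper names (`torus_neighbors`, `hop_distance`, `worst_case_hops`) and a hypothetical 4 × 4 × 4 grid, and compares hop counts against a plain mesh without wraparound.

```python
from itertools import product

# Illustrative sketch of a generic 3D torus. The grid shape used below is
# hypothetical and is NOT the actual layout of a TPU 8t superpod.

Coord = tuple[int, int, int]


def torus_neighbors(coord: Coord, dims: Coord) -> list[Coord]:
    """Return the six direct neighbors of a node in a 3D torus.

    Each node links to the node one step away in both directions along each
    axis; indices wrap modulo the axis size, so one edge of the grid connects
    back to the opposite edge. Assumes every axis has at least 3 nodes,
    otherwise some neighbors would coincide.
    """
    if any(size < 3 for size in dims):
        raise ValueError("each torus axis needs at least 3 nodes")
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors


def hop_distance(a: Coord, b: Coord, dims: Coord, wrap: bool = True) -> int:
    """Minimum number of links between nodes a and b.

    With wrap=True (torus) a message may travel either way around each axis;
    with wrap=False (plain 3D mesh) it can only take the direct route.
    """
    total = 0
    for ai, bi, size in zip(a, b, dims):
        d = abs(ai - bi)
        total += min(d, size - d) if wrap else d
    return total


def worst_case_hops(dims: Coord, wrap: bool = True) -> int:
    """Largest hop distance between any pair of nodes in the grid."""
    nodes = list(product(*(range(size) for size in dims)))
    return max(hop_distance(a, b, dims, wrap) for a in nodes for b in nodes)


if __name__ == "__main__":
    dims = (4, 4, 4)  # hypothetical 64-node example
    print(sorted(torus_neighbors((0, 0, 0), dims)))
    print(hop_distance((0, 0, 0), (3, 3, 3), dims))              # torus: 3
    print(hop_distance((0, 0, 0), (3, 3, 3), dims, wrap=False))  # mesh: 9
    print(worst_case_hops(dims), worst_case_hops(dims, wrap=False))  # 6 9
```

In this toy grid the wraparound links cut the worst-case distance between any two nodes from 9 hops to 6, which is the general appeal of a torus for large accelerator clusters; the example says nothing about the actual figures for TPU 8t's 9,600-chip superpods.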
