Google's Gemini Agent Platform & TPUs

- Google launched a Gemini Agent Platform for enterprise agents alongside a split TPU strategy separating inference and training silicon.
- The company announced TPU 8i for inference and TPU 8t for training, claiming significant price-performance gains for inference workloads.
- The split targets 'agentic inference' economics, letting teams route live low-latency tasks to cheaper inference silicon (servethehome.com).

Google used Cloud Next on April 22 to pair a new Gemini Enterprise Agent Platform with two separate eighth-generation AI chips, one for training and one for live responses. (cloud.google.com)

The platform folds Vertex AI’s model and agent tools into one system for building, running, governing, and securing enterprise agents, Google said. It also gives customers access to more than 200 models through Model Garden, including Google models and third-party options such as Anthropic’s Claude family. (cloud.google.com)

Google said future Vertex AI services and roadmap updates will ship through this Agent Platform rather than as a separate standalone service. The company also added Agent Studio, an upgraded Agent Development Kit, a reworked runtime, and a Memory Bank for agents that keep state for days. (cloud.google.com)

A TPU, or Tensor Processing Unit, is Google’s custom AI chip; training is the expensive phase where a model learns, while inference is the cheaper, constant phase where the model answers prompts. Google said those workloads now have different bottlenecks, so TPU 8t is built for pre-training and TPU 8i is built for large-scale inference and reinforcement learning. (cloud.google.com)

Google framed that split around agents, which do not just answer once but often plan, call tools, check results, and answer again in loops. In its announcement, the company said low-latency TPU 8i is meant to keep those multi-step interactions fast enough for a “good user experience,” while TPU 8t is aimed at the heavy work of building the underlying models. (blog.google)

For customers, the pitch is economics as much as speed: use the training chip when building or tuning a model, then move production traffic to the inference chip when agents are serving users. Google said both systems are part of its AI Hypercomputer stack, which combines chips, networking, software, and data-center design. (cloud.google.com)

The training side is built for very large clusters. Google said TPU 8t scales to a 9,600-chip superpod and uses a 3D torus network topology, while the company’s keynote said the chip is intended to cut frontier-model development cycles from months to weeks. (cloud.google.com) (msn.com)

Google tied the hardware launch to rising enterprise demand for AI systems that can act across business software instead of just chat. In its keynote, Google Cloud CEO Thomas Kurian said nearly 75% of Google Cloud customers now use the company’s AI products, and more than 16 billion tokens per minute are processed through Google’s first-party models via direct customer API use. (cloud.google.com)

The immediate next step is availability rather than shipment at scale. Google said customers can request more information now and that TPU 8t and TPU 8i are slated for general availability later in 2026, with the agent platform launching as the control layer above them. (blog.google)

Shared from Scout - Be the smartest in the room.