Google cuts AI response costs 30%

- Alphabet said on April 29 that engineering and infrastructure work cut the cost of Google’s “core AI responses” by more than 30%.
- The bigger tell is where Google is spending next: Cloud revenue hit $20 billion, up 63%, while first-party model traffic reached 16 billion tokens per minute.
- This matters because AI demand is exploding, and Google is betting cheaper inference plus custom chips will widen margins before I/O launches more products.

AI costs are starting to matter as much as AI quality. That is the real story in Google’s latest quarter. On April 29, Alphabet said it has cut the cost of its “core AI responses” by more than 30% through engineering work and better infrastructure, while demand across Search, Cloud, and Gemini keeps climbing. That sounds like a boring plumbing update. It isn’t. In AI, cheaper answers are the difference between a flashy demo and a business that can scale. (blog.google)

### What does “AI response cost” actually mean?

Basically, every chatbot reply or generated summary burns compute. The expensive part is usually inference: running a trained model for real users, at speed, over and over. If Google can make the same answer materially cheaper, it can serve more people, keep prices competitive, and protect margins even as usage explodes. That is why a 30% cut matters more than a vague promise about “efficiency.” (fool.com)

### Why is Google talking about this now?

Because usage is ramping fast. Sundar Pichai said Google’s first-party models are now processing more than 16 billion tokens per minute through direct API use, up from 10 billion last quarter. Gemini Enterprise paid monthly active users grew 40% quarter over quarter. Paid subscriptions across YouTube, Goog[…] (fool.com) […]ot harder to monetize. (blog.google)

### Where are the savings coming from?

From the full stack, not one magic model tweak. Google keeps framing this as a combined hardware-and-software job: custom TPUs, Axion CPUs, Nvidia GPUs, model optimization, and product engineering all working together. Pichai leaned hard on that “full-stack approach” in both the earnings remarks and the Cloud Next keynote rec[…] (blog.google) […]so build the cheapest machine underneath them. (blog.google)

### Why do TPUs matter so much?

Because custom chips are where Google can do something rivals cannot easily copy.
Last year it introduced Ironwood, a TPU generation built for large-scale AI workloads, and by late 2025 TPU7x entered preview on Google Cloud. At Cloud Next 2026, Google moved on again and announced eighth-generation TPUs, splitting them into training-f[…] (blog.google) […]han the prior generation for low-latency inference. That is almost a direct explanation for how response costs keep falling. (blog.google)

### Is Google really selling this infrastructure now?

Yes, and that is a notable shift. Alphabet said TPU hardware agreements are now part of its cloud backlog and that it will deliver TPUs directly into select customers’ data centers, with most of that revenue landing in 2027. So the same infrastructure Google uses to make its own AI cheaper is also becoming a product. That turns internal efficiency work into an external business line. (fool.com)

### How big is the demand behind all this?

Big enough that Cloud is now the clearest proof point. Google Cloud revenue hit $20 billion in Q1 2026, up 63%, and backlog nearly doubled sequentially to $462 billion. GenAI model revenue grew nearly 800% year over year, and Google said it closed multiple deals worth more than $1 billion. Those are huge numbers, but they also explain why Google has to keep squeezing costs. AI demand is arriving at hyperscale. (fool.com)

### So what should readers watch next?

Watch I/O in May, but watch the margins underneath the demos even more. Google is teasing new product progress, yet the more durable advantage may be invisible: faster chips, cheaper inference, and tighter integration between models and infrastructure. The bottom line is that Google is trying to win AI with […] (fool.com) […] to look less like spending and more like leverage. (9to5google.com)
