Gemma 4 lands on Cloud
Google has pushed Gemma 4 into its Cloud stack as an enterprise-ready model aimed at advanced reasoning and agent workflows. Making the model available on Cloud signals easier integration for production systems and agent tooling inside Google’s ecosystem. (x.com; x.com)
Google has moved Gemma 4 from a developer release into the part of its business that matters to paying customers: Google Cloud. On April 2, the company said Gemma 4 is now available across Cloud services including Vertex AI Model Garden, Cloud Run, training clusters, and sovereign cloud setups. That sounds like a packaging change. It is really a distribution change. Google is taking an open-weight model family that could already run on local hardware and putting it inside the machinery companies use to ship production systems (cloud.google.com, ai.google.dev). That matters because Gemma 4 is not pitched as a toy. Google introduced it as its “most capable” open model family, built from the same research line as Gemini 3 and aimed at reasoning-heavy, agent-style work. The lineup spans small edge models and larger cloud models, including Effective 2B, Effective 4B, a 26B mixture-of-experts model, and a 31B dense model. Google says the family supports text, image, and audio input, handles context windows up to 256,000 tokens, and works in more than 140 languages. Those are the ingredients you need if you want a model to do more than answer prompts. You need it to keep state, call tools, read documents, and survive real workflows (blog.google, cloud.google.com, docs.cloud.google.com). Google is also being unusually explicit about the kind of workflow it has in mind. Its Cloud material keeps returning to the same phrase: agentic workflows. In plain English, that means software that does multi-step work with some autonomy. Gemma 4 is presented as a model that can reason through a task, call functions, generate code, and return structured output. Google’s own Cloud Run guide shows it being served through vLLM as an OpenAI-compatible endpoint and then connected to Google’s Agent Development Kit. This is not a vague promise about future tooling. It is a recipe for turning an open model into a managed service that can sit behind an enterprise agent (cloud.google.com, docs.cloud.google.com). The interesting part is where Google wants that agent to live. Vertex AI gives customers the standard enterprise path: deploy the model to their own endpoints, choose the hardware, keep data inside their Google Cloud environment, and fine-tune with Vertex AI Training Clusters. Cloud Run offers a different path. Google says Gemma 4 can run there on NVIDIA RTX PRO 6000 Blackwell GPUs, with scale-to-zero behavior when traffic disappears. That is a direct answer to one of the biggest headaches in open-model deployment. Running your own model is appealing until the idle bill arrives. Google is trying to make the open-model route look less like infrastructure work and more like a normal cloud service (cloud.google.com, docs.cloud.google.com). This is also a strategic move inside Google’s own model stack. Gemini remains the flagship proprietary family on Vertex AI. Gemma gives Google a way to sell the opposite thing at the same time: a commercially permissive Apache 2.0 model that customers can inspect, tune, and keep under tighter operational control. Google is not choosing between open and closed. It is trying to own both lanes. The Cloud blog says the 26B MoE version is also slated to become fully managed and serverless in Model Garden, which pushes Gemma even closer to the convenience usually reserved for proprietary APIs (blog.google, cloud.google.com). That is why “Gemma 4 lands on Cloud” is more than a launch note. Google already had an open model. Now it has wrapped that model in deployment paths, compliance language, sovereign cloud positioning, and agent tooling. The final detail says the most about where this is going: Google’s Cloud Run documentation for Gemma 4 includes a section called “Build AI agents with Agent Development Kit using Gemma 4,” right after the instructions for spinning up the model behind a serverless endpoint (docs.cloud.google.com).