Gemma 4 targets local‑first AI
Google released Gemma 4 with a clear focus on local‑first, on‑device inference and multimodal capabilities for Android developers. (cloudwars.com) Coverage notes the model is meant to support development pipelines that don’t rely solely on centralized cloud compute. (infoq.com)
Google is pushing Gemma 4 as an artificial intelligence model family meant to run on your hardware, not just in Google’s cloud. (blog.google)
Google said on April 2 that Gemma 4 comes in four sizes: Effective 2B, Effective 4B, 26B Mixture of Experts, and 31B Dense. The company said the release uses an Apache 2.0 license and builds on Gemma’s 400 million downloads and more than 100,000 variants. (blog.google)
A model is the software that predicts the next word, image detail, or action from a prompt, and running it locally means those calculations happen on a phone, laptop, or workstation instead of a remote server. Google’s developer docs say the smallest Gemma 4 models are built for mobile, edge, and browser deployment, while the larger versions target desktops, workstations, and servers. (ai.google.dev)
Google says all Gemma 4 models can handle text and images, while the Effective 2B and Effective 4B versions also take audio and video inputs. The same docs say the small models support a 128,000-token context window and the larger models support 256,000 tokens, which lets them keep more text in working memory during a session. (ai.google.dev)
The Android pitch is more specific: Google says developers can use Gemma 4 inside Android Studio for local coding help and inside apps through the ML Kit GenAI Prompt API for on-device features. Google also said Gemma 4 is the base model for Gemini Nano 4, the next on-device model it plans to ship on flagship Android devices later in 2026. (android-developers.googleblog.com)
Google tied that phone strategy to speed and battery life. In its Android announcement, the company said Gemini Nano 4 is up to four times faster than the previous version and uses up to 60 percent less battery on Android devices. (android-developers.googleblog.com)
InfoQ reported that Google is positioning the 26B Mixture of Experts model as a local coding agent for development machines, while the Effective 2B and Effective 4B models are aimed at direct on-device use. InfoQ also reported hardware targets of 8 gigabytes of random-access memory and 2 gigabytes of storage for Effective 2B, 12 gigabytes of memory and 4 gigabytes of storage for Effective 4B, and 24 gigabytes of memory and 17 gigabytes of storage for the 26B model. (infoq.com)
Google’s own release notes list Gemma 4 as shipping on March 31, 2026, and the public blog post followed on April 2. That timing puts the launch inside a broader Google effort to offer both closed models, through Gemini, and open-weight models, through Gemma, for developers who want more control over where inference runs. (ai.google.dev, blog.google)
Google is not abandoning cloud deployment with this release. Its cloud unit said Gemma 4 is also available on Google Cloud through Google Kubernetes Engine, Google Compute Engine, and Vertex AI, with managed support for the 26B model coming to Model Garden. (cloud.google.com)
The result is a split strategy: Gemma 4 for developers who want models on phones, laptops, and private workstations, and Gemini for developers who want Google-managed services. Google’s message in April 2026 is that the same family can now stretch from Android hardware to cloud infrastructure without forcing every prompt through a centralized server. (blog.google, android-developers.googleblog.com, cloud.google.com)
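For developers weighing which size to try first, the hardware targets InfoQ reported can be turned into a simple selection rule. The Python sketch below is illustrative only: the names `HardwareTarget`, `TARGETS`, and `largest_fitting_variant` are hypothetical, the figures are the ones InfoQ reported, and treating them as minimum requirements is an assumption. No target was reported for the 31B Dense model, so it is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HardwareTarget:
    """Reported memory and storage target for one model size (hypothetical type)."""
    name: str
    ram_gb: int
    storage_gb: int


# Figures as reported by InfoQ, ordered smallest to largest.
# Assumption: they are treated here as minimum requirements.
TARGETS = [
    HardwareTarget("Effective 2B", ram_gb=8, storage_gb=2),
    HardwareTarget("Effective 4B", ram_gb=12, storage_gb=4),
    HardwareTarget("26B Mixture of Experts", ram_gb=24, storage_gb=17),
]


def largest_fitting_variant(ram_gb: float, free_storage_gb: float) -> Optional[str]:
    """Return the largest listed size whose reported targets fit, or None."""
    fitting = [
        t for t in TARGETS
        if ram_gb >= t.ram_gb and free_storage_gb >= t.storage_gb
    ]
    # TARGETS is ordered by size, so the last match is the largest that fits.
    return fitting[-1].name if fitting else None
```

Under these assumptions, a laptop with 16 gigabytes of memory and 50 gigabytes of free storage would land on Effective 4B, while a device with 4 gigabytes of memory would match none of the listed sizes.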