Gemma 4 Goes Local
- Google released Gemma 4 models in multiple sizes, including edge-oriented variants.
- Reported sizes include a 26B mixture-of-experts and a 31B dense model aimed at local deployment.
- Early reports say Gemma 4 makes running capable local LLMs on everyday hardware feel more practical (xda-developers.com).
Google released Gemma 4 on March 31 with new models built for phones, laptops, workstations, and cloud servers. (ai.google.dev) A language model is software that predicts the next word over and over; running it “locally” means those predictions happen on your own device instead of in a company’s data center.

Google says Gemma 4 spans E2B and E4B edge models, a 26B A4B mixture-of-experts model, and a 31B dense model. (ai.google.dev) A dense model uses all of its parameters on every prompt, while a mixture-of-experts model routes each prompt through only part of the network, like sending work to a smaller team inside a larger company.

Google’s overview says the 31B model is aimed at local execution and the 26B MoE is tuned for high-throughput reasoning. (ai.google.dev) The smaller E2B and E4B variants target mobile, edge, and browser use, including Pixel and Chrome-class devices. Google says those models are meant for “ultra-mobile” deployment rather than server hardware. (ai.google.dev)

Google framed the release around “intelligence-per-parameter,” its shorthand for getting stronger results from fewer weights and less memory. In its launch post, the company said Gemma 4 31B ranked No. 3 and Gemma 4 26B ranked No. 6 on the Arena AI text leaderboard at release. (blog.google)

That pitch lands in a market where developers have spent the past year trading off speed, memory use, privacy, and answer quality when they run open-weight models at home. Google’s own getting-started guide now points new users to the 26B A4B model because it offers broad capability with lower resource requirements. (ai.google.dev)

Google also tied Gemma 4 to consumer hardware and app distribution instead of limiting it to research pages and cloud consoles. The company’s documentation says the family is suited to consumer graphics cards and workstations, while outside reviewers have focused on phone and laptop use. (ai.google.dev; xda-developers.com)

On the cloud side, Google Cloud said Gemma-4-31B dense and Gemma-4-26B-A4B MoE are available for serving, pretraining, and post-training on Tensor Processing Units through Google Kubernetes Engine, Compute Engine, and Vertex AI. It also said the 26B MoE would arrive as a managed serverless option in Model Garden. (cloud.google.com)

XDA’s April 18 hands-on said Gemma 4 was the first local setup that made running a capable model on everyday hardware feel practical enough to keep using. That is the test Gemma 4 now faces: not whether Google can ship another open model, but whether more people will actually leave the cloud turned off. (xda-developers.com)
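
For readers who want to see the dense versus mixture-of-experts distinction from earlier in the story in concrete terms, here is a toy Python sketch. It is an illustration only and is not drawn from Gemma 4 or any Google code: the four "experts," their weights, the router scores, and the function names `dense_layer` and `moe_layer` are all made up for this example. It also assumes a common top-k routing scheme; real models typically make this choice per token inside each layer, with a router that is learned rather than fixed.

```python
import math


def softmax(scores):
    """Turn raw router scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense_layer(x, experts):
    """Dense-style: every block of parameters runs on every input."""
    outputs = [expert(x) for expert in experts]
    return sum(outputs) / len(outputs), len(experts)


def moe_layer(x, experts, router_scores, top_k=2):
    """MoE-style: only the top_k highest-scoring experts run."""
    weights = softmax(router_scores)
    ranked = sorted(range(len(experts)), key=lambda i: weights[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalize so the weights of the chosen experts sum to 1.
    norm = sum(weights[i] for i in chosen)
    output = sum(weights[i] / norm * experts[i](x) for i in chosen)
    return output, len(chosen)


# Hypothetical experts: each one just scales its input by a fixed weight.
experts = [lambda x, w=w: w * x for w in (1.0, 2.0, 3.0, 4.0)]
# Hypothetical router scores; a real router computes these from the input.
router_scores = [0.1, 2.0, 0.3, 1.5]

dense_out, dense_used = dense_layer(1.0, experts)
moe_out, moe_used = moe_layer(1.0, experts, router_scores, top_k=2)
print(f"dense ran {dense_used} experts, MoE ran {moe_used}")
```

The point of the sketch is the second number each function returns: the dense path touches all four experts, while the routed path touches only two, which is the general reason a mixture-of-experts model can carry a large total parameter count while doing less work per step.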