Gemma 4 nudges local LLMs forward

Google's Gemma 4 family includes E2B and E4B variants targeted at phones and edge devices, plus larger 26B mixture‑of‑experts and 31B dense models, giving developers a credible tier of models to run locally. (xda-developers.com) Google has also been rolling Gemma variants into Vertex AI, a sign that it wants the models available both at the edge and via hosted infrastructure. (x.com)

A large language model is a text-prediction system, and “local” means it runs on your phone, laptop, or company hardware instead of a remote server. Google’s Gemma 4 release on March 31 adds new small and midsize options built for that setup. (ai.google.dev)

Google says Gemma 4 comes in four sizes: E2B, E4B, 31B, and 26B A4B, with the two “effective” models, E2B and E4B, aimed at phones and other edge devices. The company’s model card says the family supports up to 256,000 tokens of context and more than 140 languages. (ai.google.dev)

The two larger models use different designs: 31B is a dense model, while 26B A4B is a mixture-of-experts model, a setup that routes each token through part of the network instead of all of it. Google says developers can try the 31B and 26B models in Google AI Studio and the E2B and E4B models in Google AI Edge Gallery. (blog.google)

Google also tied the launch directly to Android and edge tooling. Its Google AI Edge team said Gemma 4 is available under the Apache 2.0 license, can be accessed through the new AI Core developer preview on Android, and can be used with LiteRT-LM for mobile, desktop, and internet-of-things deployments. (developers.googleblog.com)

The practical pitch is lower delay, lower recurring cost, and tighter data control. Google’s launch post says the models were sized to run and fine-tune across hardware ranging from Android devices and laptop graphics processors to workstations and accelerators. (blog.google)

Google is not limiting Gemma 4 to local use. Google Cloud says Gemma 4 is also available on Vertex AI and Model Garden, where companies can deploy it on their own endpoints, fine-tune it, and keep serving inside their Google Cloud environment. (cloud.google.com) That puts Gemma in two lanes at once: on-device for apps that need offline or private inference, and hosted for teams that want managed infrastructure. Vertex AI’s open-model documentation lists Gemma 4 as accepting multimodal input (text, image, and audio) and producing text output. (docs.cloud.google.com)

Google is also framing Gemma 4 as closer to frontier performance than earlier small open models. In its announcement, the company said the 31B model ranked No. 3 and the 26B model No. 6 on the Arena AI text leaderboard at launch, while using far less hardware than much larger systems. (blog.google)

Independent reactions have focused less on benchmark charts than on whether the models are finally practical on ordinary hardware. XDA Developers wrote this week that Gemma 4 was the first recent local model family that made running local large language models feel worth the effort again. (xda-developers.com)

The result is a clearer split in how Google wants developers to use its open models. Gemma 4 is being offered as software you can run in your pocket, on your laptop, or inside Vertex AI, with the same family spanning all three. (blog.google)
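
For readers curious about the mixture-of-experts design mentioned above for the 26B A4B model, the general idea can be sketched in a few lines of Python. This is a toy illustration of top-k expert routing, not Gemma 4's actual architecture: the number of experts, the router scores, the `top_k` value, and the function names `softmax` and `route_token` are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, experts, token_value, top_k=2):
    """Run one token through only the top_k highest-scoring experts."""
    probs = softmax(router_scores)
    # Indices of the top_k experts, ranked by router probability.
    ranked = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalise the chosen experts' weights so they sum to 1.
    weight_sum = sum(probs[i] for i in chosen)
    output = sum(probs[i] / weight_sum * experts[i](token_value) for i in chosen)
    return output, chosen


# Hypothetical toy experts: expert k simply multiplies its input by k.
experts = [lambda x, k=k: x * k for k in range(1, 9)]  # 8 experts (assumed)
# Hypothetical router scores for a single token (assumed values).
router_scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

output, chosen = route_token(router_scores, experts, token_value=1.0)
print(chosen)  # the two experts that actually ran for this token
```

Only two of the eight toy experts do any work for this token, which is why a mixture-of-experts model can hold many parameters while using a fraction of them per step. A dense model, by contrast, would run every parameter for every token.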

