Gemma 4 for on‑device agents
Google DeepMind unveiled Gemma 4, an Apache-2.0 open model family pitched for on-device agents so developers can run agentic features locally. The move prioritizes lower latency, stronger privacy and lower cost for edge use cases, but it also shifts product trade-offs toward model size, reliability and device constraints. (sci-tech-today.com)
Google DeepMind has released Gemma 4, a new family of open models aimed at a very specific problem: how to make AI agents useful without sending every request back to the cloud. The company says the models are built for “advanced reasoning and agentic workflows,” and this time it is shipping them under an Apache 2.0 license, not the more restrictive terms that often shadow “open” AI releases. That matters as much as the benchmarks do. Apache 2.0 is a standard software license that companies already know how to use, modify, and ship in products without legal guesswork (blog.google, opensource.googleblog.com).
The release is really two stories hiding inside one model family. At the top end, Google is offering 26B and 31B versions meant for consumer GPUs and workstations. At the small end, it has E2B and E4B models designed for phones, laptops, and edge devices. Google’s pitch is blunt: the small models trade raw size for latency, memory efficiency, and offline use, while the larger ones try to squeeze frontier-level performance into hardware that developers can actually own. The company says the 31B model ranks as the No. 3 open model on Arena AI’s text leaderboard, and the 26B version sits at No. 6, with both outperforming much larger systems on some tests (blog.google, deepmind.google).
That split explains why Google keeps talking about “on-device agents” instead of just “open models.” Gemma 4 includes native function calling, long context windows, and multimodal input. The model card says the family can handle text and images across the board, with audio support on the smaller models, and context windows ranging from 128K to 256K tokens. Google also says the models support more than 140 languages. In other words, these are not tiny chatbot weights for demos. They are meant to plan, call tools, parse media, and keep enough context to finish multi-step tasks without falling apart immediately (ai.google.dev, deepmind.google).
The interesting part is where Google wants those tasks to run. In its edge announcement, the company tied Gemma 4 directly to Android’s AICore Developer Preview and to LiteRT-LM, its inference stack for local deployment. It also showed an “Agent Skills” feature inside Google AI Edge Gallery that runs multi-step workflows on-device, including querying outside knowledge sources, generating summaries and flashcards, building graphs from spoken input, and chaining Gemma to other models for speech or music generation. That is a much more ambitious claim than “AI on your phone.” It is Google trying to make local agents feel like apps, not toys (developers.googleblog.com, android-developers.googleblog.com, ai.google.dev).
But the trade-off is the whole point of the story. Running locally can cut latency, protect private data, and reduce inference costs because the model sits on the user’s hardware instead of a rented server. It also forces developers to live inside the limits of that hardware. Memory is tight. Battery is finite. Thermal throttling is real. Reliability gets harder when an agent has to reason, use tools, and stay coherent inside a smaller model budget. Google’s own materials hint at that tension by separating the mobile-first E2B and E4B models from the workstation-class 26B and 31B versions, then emphasizing efficiency almost as much as intelligence (blog.google, ai.google.dev, deepmind.google).
That makes Gemma 4 less a pure model launch than a bet about where AI products are heading next. For the past two years, the default assumption was that the smartest systems had to live in giant data centers. Google is now arguing that a meaningful slice of agentic AI should move outward, onto phones, laptops, and embedded devices, even if that means accepting smaller models and sharper constraints. The strongest evidence for that shift is not a benchmark chart. It is the fact that Google shipped the smallest Gemma 4 variants into Android’s AICore preview and built demo agents around offline, on-device workflows from the start (android-developers.googleblog.com, developers.googleblog.com).