Google's Gemma 4 runs offline
Google announced that its Gemma 4 model can run fully offline on phones, enabling on‑device agentic tasks such as trend analysis, plus controlled API calls when connected. Google positions the release as a practical platform for experimenting with mobile-focused side projects through the Google AI Edge Gallery app (x.com).
Google just pushed a language model onto phones in a way that would have sounded backwards a year ago: the model can keep working with no internet connection at all, then call outside tools only when a developer explicitly lets it. Google is packaging that idea as Gemma 4 on mobile through its Google AI Edge stack and the Google AI Edge Gallery app. (developers.googleblog.com)

To understand why that is unusual, start with where most artificial intelligence assistants live today. When you type into a typical chatbot, your words go to a data center full of graphics processors, the model runs there, and the answer comes back over the network. (developers.googleblog.com)

That setup works because large models are heavy. They need memory, power, cooling, and sustained bandwidth that a phone usually cannot spare, which is why mobile assistants have mostly been thin remote controls for cloud systems rather than self-contained tools. (developers.googleblog.com)

Running a model on the device changes the tradeoff. It cuts the round trip to a server, keeps prompts local by default, and lets features continue in places where a subway tunnel, airplane cabin, or weak rural connection would normally break the experience. (developers.googleblog.com)

The catch is that mobile hardware forces brutal compromises. A model small enough to fit on a phone has to answer fast enough to feel responsive, use little enough memory to avoid crashes, and sip little enough battery that it does not turn a handset into a hand warmer. (ai.google.dev)

Google’s answer is to make some Gemma 4 variants specifically small and mobile-first. The company says the Gemma 4 family includes Effective 2B and Effective 4B models for ultra-mobile and edge deployment, alongside larger 26 billion and 31 billion parameter versions for stronger hardware. (blog.google, ai.google.dev)

Gemma 4 is not just a text box that predicts the next word.
Google says the family is built for reasoning, coding, function calling, and agentic workflows, with the smaller edge-oriented models also supporting text, image, audio, and video inputs in the mobile stack. (ai.google.dev, developers.googleblog.com)

“Agentic” is one of those words that sounds bigger than it is. In practice, it means the model can take a goal like “summarize this week’s sleep notes and plot my mood trend,” break it into steps, use allowed tools, and return something more structured than a paragraph of text. (developers.googleblog.com)

That is where offline execution becomes more interesting than ordinary chat. If the planning happens on the phone, a developer can let the model organize notes, generate code, transform data into flashcards or graphs, or prepare a tool call locally before anything touches the network. (developers.googleblog.com)

Google’s April 2, 2026 announcement is that this is now available as a practical developer path rather than a research demo. The company said developers can access Android’s built-in Gemma 4 model through the new AICore Developer Preview, or use Google AI Edge to build in-app experiences across mobile, desktop, and other edge devices. (developers.googleblog.com)

The showcase for that strategy is Google AI Edge Gallery, which Google describes as an app on iOS and Android for experimenting with on-device models. In the new Gemma 4 post, Google says the app now exposes “Agent Skills,” a way to test multi-step workflows that run entirely on-device. (developers.googleblog.com)

Google’s examples are deliberately small and concrete. The company describes skills that can query Wikipedia, turn paragraphs or videos into summaries and flashcards, create graphs from user data such as sleep and mood logs, and combine Gemma 4 with other models for things like pairing photos with music. (developers.googleblog.com)

The phrase “controlled API calls when connected” is important because it draws a line between local intelligence and outside actions. The model can do planning and formatting on the phone, while developers decide when it is allowed to reach a web service, an operating system tool, or another app. (developers.googleblog.com, developers.googleblog.com)

Google had already been laying the groundwork for this. In February 2026, the company added on-device function calling demos to AI Edge Gallery, including “Mobile Actions,” where a compact FunctionGemma model translated requests like creating a calendar event or turning on the flashlight into local app intents without server pings. (developers.googleblog.com)

Gemma 4 extends that idea from single commands to longer chains of work. Google says the models support multi-step planning, autonomous action, offline code generation, and more than 140 languages, which turns the phone from a remote terminal into a small general-purpose artificial intelligence workstation. (developers.googleblog.com)

There is also a business angle in the licensing. Google says Gemma 4 is available under the Apache 2.0 license and provides open weights, which makes it easier for developers to download, tune, and ship experiments without depending entirely on a paid hosted model for every prototype. (developers.googleblog.com, ai.google.dev)

The result is less a mass-market consumer launch than a sandbox for side projects. Google is effectively saying: here is a model small enough for phones, a gallery app to test it, and a path to let it act locally first and call outward second. (developers.googleblog.com)

If that approach sticks, the most useful mobile artificial intelligence features may stop looking like chatbots with better manners.
They may start looking like tiny assistants that can sort, summarize, draft, route, and trigger app actions on your phone even when the phone is offline, then use the internet only for the parts that truly need the internet. (developers.googleblog.com, developers.googleblog.com)
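
For readers who think in code, the "act locally first, call outward second" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's AI Edge API: the `Tool` and `ToolGate` classes, the tool names, and the hard-coded `plan` (standing in for steps a local model might propose) are all hypothetical. The sketch shows a developer-controlled allowlist, local tools that run immediately, and network tools that are held back until the device is online.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Tool:
    """A hypothetical tool the model may request (not a real Google API)."""
    name: str
    run: Callable[[str], str]
    needs_network: bool


@dataclass
class ToolGate:
    """Developer-controlled gate between the local model and its tools."""
    tools: Dict[str, Tool]
    allowed: Set[str]          # tool names the developer explicitly enabled
    online: bool = False       # assumed to be updated by the host app
    deferred: List[Tuple[str, str]] = field(default_factory=list)

    def call(self, name: str, arg: str) -> str:
        # Anything unknown or not on the allowlist is refused outright.
        if name not in self.tools or name not in self.allowed:
            return f"denied: {name}"
        tool = self.tools[name]
        # Network tools are queued while offline instead of failing.
        if tool.needs_network and not self.online:
            self.deferred.append((name, arg))
            return f"deferred: {name}"
        return tool.run(arg)

    def flush(self) -> List[str]:
        """Run queued network calls once the device is back online."""
        if not self.online:
            return []
        pending, self.deferred = self.deferred, []
        return [self.tools[name].run(arg) for name, arg in pending]


def summarize_locally(text: str) -> str:
    # Placeholder for on-device work; a real app would invoke the local model.
    return f"summary({len(text.split())} words)"


def fetch_wikipedia(topic: str) -> str:
    # Placeholder for a network call; no request is actually made here.
    return f"fetched:{topic}"


TOOLS = {
    "summarize": Tool("summarize", summarize_locally, needs_network=False),
    "wikipedia": Tool("wikipedia", fetch_wikipedia, needs_network=True),
    "flashlight": Tool("flashlight", lambda arg: f"flashlight:{arg}", needs_network=False),
}

# The developer enables two tools; "flashlight" exists but is not allowed.
gate = ToolGate(tools=TOOLS, allowed={"summarize", "wikipedia"})

# Stand-in for the steps a local model might plan while offline.
plan = [
    ("summarize", "slept well felt calm"),
    ("wikipedia", "Sleep"),
    ("flashlight", "on"),
]
results = [gate.call(name, arg) for name, arg in plan]

# Later, when connectivity returns, the queued call can go out.
gate.online = True
flushed = gate.flush()
```

In this sketch the summary is produced immediately, the Wikipedia lookup waits in a queue until `online` flips to true, and the flashlight request is refused because the developer never enabled it. A real app would replace the placeholder functions with calls into whatever on-device runtime and platform APIs it actually uses.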