On-device AI is taking on practical use cases

Local LLMs are moving from demos to real work: Google’s Gemma 4 is being praised for its speed on older phones, and 1B-parameter variants are already being used for tasks such as calculation and translation (x.com).

The old promise of “AI on your phone” usually meant a toy. It could label a photo, finish a sentence, maybe answer a canned question, but anything more ambitious still had to travel to a distant data center. This month, that line moved. Google released Gemma 4, a new family of open models built to run on local hardware, and developers immediately began showing it doing work that used to belong to the cloud: coding inside Android Studio, translating and calculating on-device, and serving as the reasoning engine inside phone-based agents and locked-down security tools (blog.google, android-developers.googleblog.com).

Google’s pitch for Gemma 4 is not that a phone can suddenly host a giant frontier model. It is that smaller models have become efficient enough to be useful, and useful enough to be worth keeping close. The company released Gemma 4 in four sizes, from an “effective 2B” model aimed at edge devices up to larger workstation-class versions, and says the smaller ones were designed specifically for laptops and mobile devices rather than squeezed onto them afterward (blog.google, ai.google.dev).

That “effective” label is part of the trick. Google’s E2B and E4B versions use a design that keeps some of the model’s memory in a form that is cheap to look up, instead of forcing the phone to recompute everything the hard way each time. The result is a model that behaves more like a larger one without demanding the same battery drain and memory budget. Google says Gemma 4 is also the base model for the next generation of Gemini Nano on Android, with performance up to four times faster than the previous version while using up to 60 percent less battery on supported devices (ai.google.dev, android-developers.googleblog.com).

The clearest sign that this is no longer a lab demo is the software growing around it. Google’s own AI Edge Gallery app now supports Gemma 4 and lets people run models directly on phones, offline, with prompts and settings handled on the device itself. Google’s mobile deployment docs describe the same path through its MediaPipe inference tools, which are meant for ordinary app features like drafting text, searching information, and summarizing documents without sending the contents to a server (github.com, ai.google.dev).

Developers quickly pushed that local setup into stranger territory. OpenClaw, an agent framework for running tool-using AI systems with local models, already documents local-first configurations for keeping code and data on a user’s own machine. On Android, a project called SeekerClaw embeds an AI agent inside an app that runs as a foreground service, exposing device controls, scheduling, search, and messaging from the phone itself. Its current public build still leans on remote model providers, but it shows the shape of the new idea: the handset is no longer just a client for AI somewhere else; it is becoming the place where the agent lives (docs.openclaw.ai, github.com).

The privacy argument is obvious, but the latency argument may matter more in practice. A local model answers as fast as the device can answer. It does not wait for a network round trip, and it does not fail because a train tunnel, hospital basement, or factory floor has poor reception. For coding and other tool-heavy work, Google is now explicitly positioning Gemma 4 as a local reasoning model inside Android Studio’s Agent Mode, where it can refactor code, build features, and apply fixes while keeping the model and inference on the developer’s own machine (android-developers.googleblog.com).

That same local-first logic is now reaching places where sending data away is not merely inconvenient but unacceptable. One new GitHub project, Local AI Pentest Suite, pitches Gemma 4 as the reasoning core for an air-gapped security-testing workflow that stays on a single GPU and never uploads an organization’s attack surface to the cloud. It is an early project, not a finished product, but it captures the shift in miniature: when models were too bulky and too slow, privacy was a slogan. Once they become small enough to run near the work, privacy turns into an engineering choice (github.com, blog.google).

For years, on-device AI was sold as a future in which your phone might someday do more. The interesting thing about Gemma 4 is how ordinary the first real uses already look. Translate this. Solve that. Fix this block of code. Search my notes. Draft the message. Run the tool. And do it without asking permission from a server farm hundreds of miles away (ai.google.dev, github.com).

Shared from Scout - Be the smartest in the room.