On‑device AI push (Gemma 4)

Google released Gemma 4, a model family designed to run on handsets, with the claim that “no data ever leaves the device.” The report links that capability to broader mobile agent work and a potential Siri overhaul built on Gemini technology. The write-up frames this as an architectural direction that emphasizes local inference, private state management, and graceful cloud fallbacks over pure cloud-first model scale. (the-decoder.com)

On-device artificial intelligence means the phone does the computing itself, without sending prompts to a remote server. Google is now pushing that approach with Gemma 4, a new open model family released on April 2 for phones, laptops, and other local hardware. (blog.google 1) (blog.google 2)

Google said Gemma 4 is available under an Apache 2.0 license in four sizes: Effective 2B, Effective 4B, 26B Mixture of Experts, and 31B Dense. The company said the smaller Effective 2B and Effective 4B versions are tuned for low-latency use on handsets and other edge devices. (blog.google)

Google’s developer team said Gemma 4 can handle multi-step planning, offline code generation, image and audio understanding, and support for more than 140 languages. It also said developers can try the models through Google AI Edge tools and an AI Edge Gallery app on both iOS and Android. (developers.googleblog.com)

On Android, Google tied the release directly to its system-level artificial intelligence stack. The Android Developers Blog said on April 2 that Gemma 4 in the AI Core Developer Preview is the foundation for the “next generation” of Gemini Nano, which Google said will ship on supported devices later in 2026. (developer.android.com)

Google said the Android version comes in two handset-focused sizes, Effective 4B for harder tasks and Effective 2B for speed. The company said Effective 2B runs three times faster than Effective 4B, while the new model family is up to four times faster than earlier versions and uses up to 60 percent less battery. (developer.android.com)

The pitch is privacy and reliability as much as speed. Google’s own explanation of on-device processing says local inference can keep data on the phone, work without an internet connection, and avoid the delay of sending each request to the cloud. (blog.google)

That architecture still leaves room for bigger remote models when the phone cannot handle a task. Google’s Android and AI Edge materials describe Gemma 4 as part of a broader stack that lets developers build local features first and then expand across apps, desktops, and other devices with separate tooling. (developers.googleblog.com) (developer.android.com)

The same local-versus-cloud split is showing up in the assistant market. Reuters reported on January 12, 2026, that Apple had struck a multi-year deal to use Google’s Gemini models for a revamped Siri, after earlier Bloomberg reporting in August 2025 said Apple was in talks with Google about that direction. (finance.yahoo.com) (money.usnews.com)

Google is framing Gemma 4 as the part of that future that can live in your pocket. The company’s claim is that more assistant work can happen on the handset first, with less waiting, less battery drain than older models, and less need to move personal data off the device. (blog.google) (developer.android.com)
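For readers who want to see the local-first, cloud-fallback idea in concrete terms, here is a minimal Python sketch. Everything in it is a hypothetical stand-in: the tier names, the `Request` fields, the `route` function, and the stub runners are assumptions made for illustration and are not part of Gemma 4, Google AI Edge, or any Android API. It shows only the routing idea: try the smaller on-device model first, step up to the larger one for harder tasks, and reach for a remote model only when the user has allowed it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical tier labels; they loosely mirror the article's split between a
# faster small handset model, a stronger handset model, and a remote model.
LOCAL_FAST = "local-small"
LOCAL_STRONG = "local-large"
CLOUD = "cloud"


@dataclass
class Request:
    prompt: str
    hard: bool = False         # caller's own guess that the task needs more capability
    allow_cloud: bool = False  # whether data may leave the device for this request


# A runner returns an answer, or None when that tier cannot handle the prompt.
Runner = Callable[[str], Optional[str]]


def route(request: Request, runners: Dict[str, Runner]) -> Tuple[str, str]:
    """Try on-device tiers first; use the cloud only if permitted."""
    order = [LOCAL_STRONG] if request.hard else [LOCAL_FAST, LOCAL_STRONG]
    if request.allow_cloud:
        order.append(CLOUD)
    for tier in order:
        answer = runners[tier](request.prompt)
        if answer is not None:
            return tier, answer
    raise RuntimeError("no permitted tier could handle the request")


# Stub runners that use prompt length as a crude proxy for difficulty.
# Real code would call an actual inference runtime here instead.
def small_runner(prompt: str) -> Optional[str]:
    return f"small:{prompt}" if len(prompt) < 20 else None


def large_runner(prompt: str) -> Optional[str]:
    return f"large:{prompt}" if len(prompt) < 60 else None


def cloud_runner(prompt: str) -> Optional[str]:
    return f"cloud:{prompt}"


RUNNERS: Dict[str, Runner] = {
    LOCAL_FAST: small_runner,
    LOCAL_STRONG: large_runner,
    CLOUD: cloud_runner,
}
```

Calling `route(Request("hi"), RUNNERS)` stays on the small local tier, while a long prompt with `allow_cloud=False` raises an error instead of silently sending data off the device. That refusal is a deliberate choice in this sketch, reflecting the privacy framing in the article, and is not a documented behavior of any Google product.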

