Gemma 4 can run on phones

Google’s small on-device Gemma 4 models, E2B and E4B, are compact enough to run on phones with about 12GB of RAM, with reported footprints of roughly 4.2GB and 5.9GB respectively. (androidauthority.com) That matters because models this small can run locally, which means faster responses and more privacy without calling a cloud API for every request. (androidauthority.com)

A phone usually runs an artificial intelligence model by sending your question to a data center, waiting for an answer, and pulling the result back over the internet. Google is pushing the other way: put a small enough model on the device itself so the answer never has to leave your hand. (ai.google.dev) That only works if the model is tiny enough to fit into phone memory without choking the rest of the system.

Google’s new Gemma 4 family includes two edge-sized versions called E2B and E4B that are aimed at phones and other local devices, while bigger versions are meant for laptops and servers. (ai.google.dev) Gemma is Google DeepMind’s open-weight model line, which means developers can download the model weights and run them themselves instead of calling a closed cloud service. In the Gemma 4 model card, Google says the small models are specifically designed for local execution on mobile devices. (ai.google.dev)

The naming tells you the tradeoff. In Google’s Android developer preview, E4B is the stronger model for harder reasoning, while E2B is the faster one, and Google says E2B runs about 3 times faster than E4B on Android. (android-developers.googleblog.com)

The trick is that these are not old-style text bots shrunk down until they become useless. Google says Gemma 4’s small models can handle text, images, and audio, support more than 140 languages, and keep up to 128,000 tokens of context, which is enough room for very long chats or large documents. (ai.google.dev)

This phone push did not start with Gemma 4. In June 2025, Google introduced Gemma 3n as a mobile-first design with the same E2B and E4B sizes, and said those models used architectural tricks to behave like larger systems while running with memory footprints closer to traditional 2 billion and 4 billion parameter models. (developers.googleblog.com) One of those tricks is memory efficiency, which is like keeping only the pages you need open on a tiny desk instead of spreading every book across the floor. Google said Gemma 3n’s E2B and E4B could operate with as little as 2 gigabytes and 3 gigabytes of memory, which showed the company was designing this line around phones long before this week’s benchmarks. (developers.googleblog.com)

The new benchmark leak is the part that makes this feel real instead of theoretical. Android Authority reports that the upcoming Gemini Nano 4 models based on Gemma 4 show footprints of about 4.2 gigabytes for E2B and 5.9 gigabytes for E4B, which puts both within reach of phones with around 12 gigabytes of random access memory. (androidauthority.com)

Google is already laying the software path for that hardware. On April 2, 2026, the Android Developers Blog said Gemma 4 is the foundation for the next generation of Gemini Nano, and that Gemini Nano 4 devices will arrive later in 2026. (android-developers.googleblog.com)

Developers do not have to wait for those phones to ship to try the idea. Google’s AI Edge Gallery app now supports Gemma 4 and lets people run models locally on mobile hardware, with features like chat, image understanding, and audio transcription that stay on the device. (github.com)

That changes what “phone artificial intelligence” can mean. Instead of a stripped-down assistant that only works with a signal, Google is aiming at a model that can read a screenshot, transcribe speech, reason through a task, and answer in your language from inside the phone itself, using less battery than earlier versions and no server round-trip at all. (android-developers.googleblog.com)
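As a rough check on the reported memory figures, the short sketch below works out how much of a 12 gigabyte phone each model would occupy. It uses only the footprint and RAM numbers quoted above; the function names are illustrative, and the calculation ignores what the operating system and other apps need, so treat it as back-of-the-envelope arithmetic rather than a measurement.

```python
# Reported footprints in gigabytes, as quoted from the Android Authority report.
FOOTPRINT_GB = {"E2B": 4.2, "E4B": 5.9}

# "Around 12 gigabytes" of RAM, per the same report.
PHONE_RAM_GB = 12.0


def ram_share(footprint_gb, total_ram_gb=PHONE_RAM_GB):
    """Fraction of total RAM the model would occupy (illustrative helper)."""
    return footprint_gb / total_ram_gb


def remaining_gb(footprint_gb, total_ram_gb=PHONE_RAM_GB):
    """RAM left over for everything else, ignoring OS overhead (an assumption)."""
    return total_ram_gb - footprint_gb


for name, gb in FOOTPRINT_GB.items():
    print(f"{name}: {ram_share(gb):.0%} of RAM, {remaining_gb(gb):.1f} GB left")
```

On those numbers, E2B would take roughly 35 percent of memory and E4B roughly 49 percent, which is why a 12 gigabyte phone is a plausible floor rather than a comfortable one.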
