Gemma 4 hits 120 million downloads

- DeepMind's Gemma 4 has seen more than 120 million community downloads, signalling strong uptake for on-device and open models. - Community releases also include a DeepSeek‑V4‑Flash variant optimised for Mac that runs at roughly 20 tokens per second with good reasoning and code formatting. - Momentum for on‑device and community‑patched models keeps local experimentation and lightweight inference relevant for product work. (x.com) (x.com)

Gemma 4’s download number matters because it gives a hard adoption signal for open, local-first AI at a moment when most attention is still on giant hosted models. Omar Sanseviero of Google DeepMind said on May 24 that Gemma 4, released “a couple of weeks ago,” had already passed 120 million community downloads. Google’s own release pages show Gemma 4 launched on March 31, with a follow-up release on April 16, and describe it as an Apache 2.0-licensed family built to run from Android devices and laptop GPUs up to workstations. (digg.com) That scale is notable partly because Google had already been using Gemma’s broader ecosystem as proof of open-model traction. In its April 2 launch post, Google DeepMind said developers had downloaded Gemma models more than 400 million times since the first generation and had created more than 100,000 variants. Gemma 4 was framed there as the company’s “most intelligent open models to date,” with four sizes spanning small edge models and larger dense and mixture-of-experts releases. (blog.google) The practical point is not just popularity. Google said Gemma 4 was sized to run and fine-tune efficiently on hardware ranging from billions of Android devices to developer workstations, and highlighted low-latency, multimodal, on-device use for its smaller E2B and E4B models. That puts the download figure in context: developers are not only collecting checkpoints, they are testing a class of models designed for local deployment, mobile use and constrained hardware. (blog.google) The second part of the story is the community layer building around that trend. DeepSeek-V4-Flash is available on Hugging Face with instructions for Transformers, vLLM and SGLang, which means the base model is already positioned for broad experimentation across common inference stacks. Separately, a GitHub project for running DeepSeek-V4-Flash on Mac Studio shows how quickly the community is adapting frontier-ish open weights to Apple Silicon workflows. (huggingface.co) That Mac project is especially revealing because it is not a polished official release. The repository says it uses an MLX-native setup, an OpenAI-compatible API and a Gradio UI, and reports measured throughput of about 22 tokens per second with roughly 164GB of memory in use at 256K context. It also says official `mlx-lm` support for the `deepseek_v4` model type was not yet available as of April 2026, so the build relied on a fork plus manual tokenizer handling. In other words, the local-model ecosystem is still messy, but the community is doing the integration work anyway. (github.com) That combination — high download counts for Gemma 4 and rapid community patching around DeepSeek-V4-Flash — says something specific about where local AI sits in 2026. Hosted APIs still dominate production at scale, but open-weight and on-device models remain relevant for teams that care about latency, privacy, offline use, hardware control or cost discipline. Google’s own positioning for Gemma 4 leans directly into that argument by stressing intelligence-per-parameter and consumer-hardware efficiency rather than raw model size alone. (blog.google) For product builders, the takeaway is straightforward. Local inference is no longer just a hobbyist corner of the ecosystem: Google is shipping mainstream open models for edge hardware, Hugging Face is distributing large open releases like DeepSeek-V4-Flash, and community developers are filling runtime gaps on Macs and other local setups fast enough to keep experimentation moving. (blog.google)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.