Google ships Gemma 4
Google released Gemma 4 as an Apache‑2.0 family of open‑weight models — edge variants (effective 2B and 4B), a 26B Mixture‑of‑Experts, and a 31B dense model — with day‑one weights on Hugging Face and Kaggle and reference runtimes via vLLM and llama.cpp. (infoq.com)
A large language model is software that predicts the next word, and Google has now shipped a new Gemma 4 family that developers can download and run themselves under an Apache 2.0 license. (blog.google) Google’s release page lists four Gemma 4 sizes released on March 31, 2026: E2B, E4B, 26B A4B, and 31B. Google says the small E2B and E4B models target mobile and edge devices, while 26B A4B and 31B target consumer graphics cards and workstations. (ai.google.dev) “Open weights” means Google published the trained model files, not just an application programming interface, so outside developers can download them from Hugging Face and Kaggle on day one. Google’s model overview page links directly to both repositories for Gemma 4. (ai.google.dev) The family mixes two model designs. Google’s 31B is a dense model, where all parameters are used on each step, while 26B A4B is a Mixture-of-Experts model, which routes each token through a smaller subset of specialist blocks to cut running cost. (ai.google.dev) Google says Gemma 4 is multimodal, meaning it can take in text and images across the family, with audio input supported on the smaller models. Kaggle’s Keras page says the models generate text output, support context windows up to 256,000 tokens, and handle more than 140 languages. (kaggle.com) Google is also pushing Gemma 4 as software people can run locally instead of only through a cloud service. The company’s announcement says developers have downloaded Gemma more than 400 million times since the first generation, with more than 100,000 community variants built on top. (blog.google) That local-first pitch depends on inference software, the layer that turns model files into a working chatbot or coding assistant. vLLM published a Gemma 4 guide with support for structured reasoning, tool use, vision handling, and deployment on Nvidia, Advanced Micro Devices, and Google Cloud Tensor Processing Units. (docs.vllm.ai) The other reference runtime in this launch is llama.cpp, a C and C++ project widely used to run models on laptops, desktops, and embedded hardware. Its GitHub repository now includes Gemma 4-specific tests, a sign that support landed quickly around release. (github.com) Google has been building out Gemma as a parallel track to its closed Gemini products. Google’s Gemma release log shows TranslateGemma, MedGemma 1.5, and FunctionGemma all arrived earlier in 2026 before Gemma 4 became the main general-purpose update at the end of March. (ai.google.dev) The immediate result is simple: developers who want Google-built models now have a new set they can download, fine-tune, and run across phones, single-graphics-card machines, and larger servers without waiting for third-party conversions. (huggingface.co)