DeepMind’s Gemma 4 lands
What happened
Google DeepMind released Gemma 4, a family of Apache 2.0 models pitched for stronger reasoning and agentic workflows that can run on personal hardware. (x.com) The announcement sparked conversation about running advanced models locally and using tools like LM Studio or OpenClaw to build apps, changing where and how engineers prototype AI systems. (x.com)
Why it matters
Google DeepMind published Gemma 4 as a set of four openly available model variants under the Apache 2.0 license, a permissive open-source license that allows reuse, redistribution, and commercial use without custom restrictions. ( opensource.googleblog.com ) ( deepmind.google/models/gemma/gemma-4/ ) The family processes text, images, and (on the smaller variants) audio. The larger variants are built for very long inputs, with context windows measured in the hundreds of thousands of tokens, so they can reason across long documents. Google is shipping the models both through cloud endpoints and as downloadable weights for local use. ( ai.google.dev/gemma/docs/core/model_card_4 ) ( cloud.google.com/blog/products/ai-machine-learning/gemma-4-available-on-google-cloud )
Gemma 4 comes in four sizes named E2B, E4B, 26B A4B, and 31B. The family includes "dense" models (where all parts of the model run on every input) and a "mixture-of-experts" model (where only a subset of specialized parameters activates for a given input, which reduces average compute cost while keeping capacity). ( blog.google/innovation-and-ai/technology/developers-tools/gemma-4/ ) ( ai.google.dev/gemma/docs/core/model_card_4 )
Architecturally, Gemma 4 uses a hybrid attention design that interleaves fast local sliding-window layers (which keep memory and latency low) with periodic global layers (which let the model maintain awareness across the whole input). The release also adds built-in support for structured function calling, configurable "thinking" modes, and a native system role for more controllable multi-step agent behavior. ( ai.google.dev/gemma/docs/core/model_card_4 ) ( developers.googleblog.com/bring-state-of-the-art-agentic-skills-to-the-edge-with-gemma-4/ )
Distribution and tooling arrived immediately. Gemma 4 weights and instruction-tuned variants are available on model hubs and supported in desktop tools and runtimes (LM Studio, Ollama, GGUF/MLX formats for local runtimes, and gateway tools like OpenClaw). The smallest models list memory footprints low enough for phones and tiny edge boards, while the largest 31B build targets consumer workstation GPUs. ( lmstudio.ai/models/gemma-4 ) ( apidog.com/blog/gemma-4-ollama-local/ ) ( github.com/bolyki01/localllm-gemma4-mlx )
Early benchmark reporting from DeepMind places the 31B Gemma 4 variant among the top open models on Arena AI (a score of 1452 is reported on the DeepMind model page). Vendors also note optimizations for consumer graphics processors and partnerships to speed up local agentic workloads, including NVIDIA guidance and Android tooling for on-device development. ( deepmind.google/models/gemma/gemma-4/ ) ( blogs.nvidia.com/blog/rtx-ai-garage-open-models-google-gemma-4/?preview_id=92019 ) ( android-developers.googleblog.com/2026/04/android-studio-supports-gemma-4-local.html )
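To make the hybrid attention design described above more concrete, here is a minimal Python sketch of the general idea: most layers use a causal sliding-window mask, and every few layers a full causal (global) mask is used instead. This is an illustration only, not Gemma 4's implementation. The function names, the window size, and the local-to-global ratio are hypothetical and are not taken from the model card.

```python
from typing import List, Optional


def layer_pattern(num_layers: int, global_every: int) -> List[str]:
    """Label each layer 'local' or 'global'.

    Every `global_every`-th layer is global; the rest are local.
    The ratio is a hypothetical parameter, not a published Gemma 4 value.
    """
    return [
        "global" if (i + 1) % global_every == 0 else "local"
        for i in range(num_layers)
    ]


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """Build a causal attention mask.

    mask[q][k] is True when query position q may attend to key position k.
    With window=None the mask is fully causal (a "global" layer).
    With window=w each query sees only itself and the w - 1 positions
    before it (a "local" sliding-window layer).
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # never attend to future positions
            if window is not None:
                visible = visible and (q - k) < window
            row.append(visible)
        mask.append(row)
    return mask


def attended_pairs(mask: List[List[bool]]) -> int:
    """Count allowed (query, key) pairs, a rough proxy for attention cost."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    # Hypothetical toy settings chosen only for readability.
    print(layer_pattern(num_layers=6, global_every=3))
    local = causal_mask(seq_len=8, window=3)
    full = causal_mask(seq_len=8)
    print("local pairs:", attended_pairs(local))
    print("global pairs:", attended_pairs(full))
```

The count of allowed pairs grows linearly with sequence length for the local mask and quadratically for the global mask, which is the intuition behind using local layers to keep memory and latency low while a smaller number of global layers keeps the whole input in view.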
Key numbers
- 4: model sizes in the family (E2B, E4B, 26B A4B, and 31B). ( blog.google/innovation-and-ai/technology/developers-tools/gemma-4/ )
- 1452: Arena AI score reported for the 31B variant on the DeepMind model page. ( deepmind.google/models/gemma/gemma-4/ )
- Hundreds of thousands of tokens: the context length the larger variants are built to handle. ( ai.google.dev/gemma/docs/core/model_card_4 )
- Apache 2.0: the license for the whole family, which permits reuse, redistribution, and commercial use. ( opensource.googleblog.com )
Sources
- x.com
- opensource.googleblog.com
- deepmind.google/models/gemma/gemma-4/
- ai.google.dev/gemma/docs/core/model_card_4
- cloud.google.com/blog/products/ai-machine-learning/gemma-4-available-on-google-cloud
- blog.google/innovation-and-ai/technology/developers-tools/gemma-4/
- developers.googleblog.com/bring-state-of-the-art-agentic-skills-to-the-edge-with-gemma-4/
- lmstudio.ai/models/gemma-4
- apidog.com/blog/gemma-4-ollama-local/
- github.com/bolyki01/localllm-gemma4-mlx
- blogs.nvidia.com/blog/rtx-ai-garage-open-models-google-gemma-4/?preview_id=92019
- android-developers.googleblog.com/2026/04/android-studio-supports-gemma-4-local.html