Gemma 4 at GTC
NVIDIA highlighted a new Gemma 4 family of models that scales up to 31 billion parameters and is tuned to run efficiently on RTX GPUs. (x.com) The announcement framed Gemma 4 as part of GTC’s shift toward models optimized for practical deployment rather than raw parameter counts. (x.com)
Gemma 4 is the latest sign that artificial intelligence model makers are chasing efficiency, not just size, with NVIDIA using its GPU Technology Conference (GTC) stage to spotlight models built to run locally on RTX hardware. (blogs.nvidia.com)
Google DeepMind released Gemma 4 on April 2, 2026, in four sizes: E2B, E4B, 26B A4B, and 31B. NVIDIA said the family was optimized with Google to run across RTX personal computers and workstations, DGX Spark systems, and Jetson Orin Nano edge modules. (ai.google.dev) (blogs.nvidia.com)
A parameter is one of a model’s internal settings, a learned number that helps it predict the next word or image token, and Gemma 4’s top dense model has about 30.7 billion of them. Google and NVIDIA are pitching that scale as small enough for consumer-grade graphics cards but still large enough for coding, reasoning, and image-text tasks. (build.nvidia.com) (deepmind.google)
The practical change is where the model runs. Google says the 26B A4B and 31B versions are optimized for consumer graphics processing units, while the smaller E2B and E4B versions are meant for phones, Raspberry Pi boards, and Jetson devices that can run offline with near-zero latency. (deepmind.google) (ai.google.dev)
That push reflects a broader split in artificial intelligence between giant cloud models and smaller open-weight models that developers can download, tune, and keep on their own machines. Google says Gemma 4 ships with Apache 2.0 licensing, support for more than 140 languages, and context windows of up to 256,000 tokens. A context window is the amount of text a model can keep in working memory at once. (ai.google.dev) (build.nvidia.com)
NVIDIA’s pitch at GTC was less about one benchmark win than about local “agentic” software, meaning assistants that can call tools, inspect files, and act inside apps on a user’s own device. The company said Gemma 4 works with OpenClaw, a local agent framework for RTX systems and DGX Spark. (blogs.nvidia.com)
Google’s own benchmark table makes the efficiency argument explicit. As of April 2, 2026, Gemma 4 31B IT posted 89.2% on AIME 2026 math, 80.0% on LiveCodeBench v6 coding, and 84.3% on GPQA Diamond science, while the smaller 26B A4B version trailed only slightly on those tests. (deepmind.google)
The architecture also changed to make long prompts cheaper to handle. Google says Gemma 4 mixes local sliding-window attention, in which each token attends only to a fixed span of nearby tokens, with full global attention, a design meant to keep memory use lower while still letting the model track long documents and multimodal inputs. (ai.google.dev) (build.nvidia.com)
The release does not put NVIDIA in control of the model itself. NVIDIA’s model card says Gemma 4 31B IT was built by Google DeepMind, while NVIDIA provides optimized deployment through its own runtime stack and hardware ecosystem. (build.nvidia.com)
What NVIDIA highlighted at GTC, then, was not a bigger model than rivals offer. It was a model family sized to fit more machines people already own, from mobile devices at the low end to RTX workstations at the high end. (blogs.nvidia.com) (deepmind.google)
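To make the sizing argument concrete, the back-of-the-envelope sketch below estimates how much memory the weights of the roughly 30.7-billion-parameter dense model would occupy at different numeric precisions. The precisions listed are illustrative assumptions, not formats that Google or NVIDIA are confirmed to ship, and the helper names are hypothetical. The estimate covers weights only and ignores activations and the attention cache, so real usage would be higher.

```python
# Rough weight-memory estimate for a dense model (weights only).
# PARAMS comes from the article; the precisions are illustrative assumptions.
PARAMS = 30.7e9  # approximate parameter count of the top dense model

# Bytes needed to store one parameter at each assumed precision.
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes, using 1 GB = 1e9 bytes."""
    return params * bytes_per_param / 1e9


def estimate_all(params: float = PARAMS) -> dict:
    """Map each assumed precision to its estimated weight memory in GB."""
    return {
        name: weight_memory_gb(params, nbytes)
        for name, nbytes in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    for name, gb in estimate_all().items():
        print(f"{name}: about {gb:.0f} GB for weights alone")
```

Under these assumptions the weights come to roughly 61 GB at 16 bits, about 31 GB at 8 bits, and about 15 GB at 4 bits, which suggests why lower-precision formats matter when fitting a model of this size onto a single consumer graphics card.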