Gemma 4: local LLMs work
- Reviewers report Google’s Gemma 4 finally made running local LLMs practical on ordinary consumer hardware. (makeuseof.com)
- XDA and MakeUseOf say Gemma 4 replaced older local stacks and made privacy-friendly local inference realistic. (xda-developers.com)
- Hobbyist coverage suggests local models are now usable for everyday tasks rather than experimental demos.
A local large language model is an artificial intelligence system that runs on your own device instead of a company’s servers, and Google’s Gemma 4 is pushing that setup onto phones and ordinary PCs. (blog.google)

Google announced Gemma 4 on April 2, 2026, as an open-weight family under an Apache 2.0 license, with four sizes: E2B, E4B, 26B A4B, and 31B. Google said the smaller models were built for mobile and edge devices, while the larger ones target consumer graphics cards and workstations. (blog.google, ai.google.dev)

Running a model locally means downloading the trained “weights,” the files that store what the system learned, and doing the computation on your own hardware. Google’s model card says Gemma 4 handles text and images across the family, supports more than 140 languages, and offers context windows of 128,000 to 256,000 tokens. (ai.google.dev)

The hardware math is what changed the conversation. Google’s published inference table says the 4-bit version of Gemma 4 E2B needs about 3.2 gigabytes of memory, E4B needs 5 gigabytes, the 26B A4B needs 15.6 gigabytes, and the 31B model needs 17.4 gigabytes. (ai.google.dev)

Google also built a phone path. The listings for its AI Edge Gallery app on Google Play and Apple’s App Store say the app now supports Gemma 4 and runs models “fully offline” on mobile devices. (play.google.com, apps.apple.com)

Desktop tools already used by hobbyists and developers moved quickly to support the release. Google’s own integration pages point users to LM Studio and Ollama for local Gemma deployments on personal computers. (ai.google.dev)

That is why the reaction this week came less from benchmark charts than from people swapping out older setups. MakeUseOf said Gemma 4 “replaced” its local stack after testing the E4B model on a PC with a 12GB Radeon RX 6700 XT and 64GB of RAM, while XDA said the new family made local models worth caring about for people without a home lab.
(makeuseof.com, xda-developers.com)

Google’s pitch is “intelligence-per-parameter,” meaning more useful output from smaller models. In its launch post, Google said the 31B model ranked No. 3 among open models on the Arena AI text leaderboard as of April 1, and the 26B model ranked No. 6. (blog.google)

Local use still comes with tradeoffs. Google says higher-precision and larger-parameter versions cost more in memory, processing cycles, and power, and reviewers at MakeUseOf and XDA both said cloud systems from OpenAI, Google, and Anthropic still hold an edge for the hardest work. (ai.google.dev, makeuseof.com, xda-developers.com)

The shift is that “local” no longer means a weekend experiment with a noisy desktop and a stripped-down model. With Gemma 4, Google is shipping phone-sized and laptop-sized models through mainstream apps and toolchains, and reviewers are describing them as useful for daily writing, coding, and private document work. (play.google.com, ai.google.dev, makeuseof.com, xda-developers.com)
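
For readers who want to apply the hardware math to their own machine, the 4-bit memory figures quoted above from Google's inference table can be turned into a rough fit check. This is a minimal sketch, not a Google tool: the function name `models_that_fit`, the 1.5 GB headroom for context and runtime overhead, and the example memory budgets are all assumptions made for illustration.

```python
# Approximate memory needs (in GB) for the 4-bit builds, as quoted in the
# article from Google's published inference table.
GEMMA4_4BIT_GB = {
    "E2B": 3.2,
    "E4B": 5.0,
    "26B A4B": 15.6,
    "31B": 17.4,
}


def models_that_fit(available_gb, headroom_gb=1.5):
    """Return the Gemma 4 sizes whose 4-bit weights fit in available_gb.

    headroom_gb is an assumed allowance for the context cache and runtime
    overhead. It is an illustrative guess, not a figure from Google.
    """
    budget = available_gb - headroom_gb
    return [name for name, need in GEMMA4_4BIT_GB.items() if need <= budget]


if __name__ == "__main__":
    # Hypothetical memory budgets, chosen only to show the comparison.
    for available in (8, 12, 24):
        print(f"{available} GB available:", models_that_fit(available))
```

Under these assumptions, a 12 GB budget leaves room for E2B and E4B but not the two larger models, which lines up with MakeUseOf testing E4B on a 12GB card. Actual usage depends on context length, the runtime, and whatever else is sharing the memory, so treat the result as a first-pass estimate.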