Gemma 4 quantized on Hugging Face

A quantized Gemma 4 31B (NVFP4) checkpoint was released on Hugging Face with claims of 4x smaller weights, 256K context support, and 99.7% accuracy retention; it is also described as vLLM-compatible on Blackwell GPUs. If those numbers hold up in independent benchmarks, quantization at this level materially shifts the cost/latency trade-offs for serving 31B-class models. (x.com)
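To put the "4x smaller weights" figure in context, here is a rough back-of-envelope sketch of weight storage for a 31B-parameter model. It is not taken from the model card. It assumes a 16-bit baseline, that every weight is quantized, and that each block of 16 four-bit weights carries one 8-bit scale (about 4.5 bits per weight on average). Real checkpoints usually keep some layers at higher precision, and the KV cache needed for long contexts is not counted here.

```python
# Back-of-envelope weight footprint for a 31B-parameter model.
# Assumptions (not from the model card): 16-bit baseline, every weight is
# quantized, and each 16-weight block stores one extra 8-bit scale.

PARAMS = 31e9  # nominal parameter count implied by "31B"


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Return weight storage in GiB for a given average bit width."""
    return params * bits_per_weight / 8 / 2**30


bf16_gib = weight_gib(PARAMS, 16)               # 16-bit baseline
fp4_nominal_gib = weight_gib(PARAMS, 4)         # 4-bit values only
fp4_scaled_gib = weight_gib(PARAMS, 4 + 8 / 16) # 4-bit values plus block scales

print(f"16-bit weights:        {bf16_gib:.1f} GiB")
print(f"4-bit, no scales:      {fp4_nominal_gib:.1f} GiB "
      f"({bf16_gib / fp4_nominal_gib:.2f}x smaller)")
print(f"4-bit, with scales:    {fp4_scaled_gib:.1f} GiB "
      f"({bf16_gib / fp4_scaled_gib:.2f}x smaller)")
```

Under these assumptions the headline 4x matches the nominal 16-bit to 4-bit ratio, while the effective reduction once block scales are included comes out somewhat lower.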

Hugging Face lists an NVIDIA-published Gemma 4 instruction-tuned checkpoint with a model card timestamped April 2, 2026 and a full file listing on the hub. (huggingface.co) The model card states that quantization was performed with NVIDIA’s TensorRT Model Optimizer and links to licensing terms that reference NVIDIA’s open model governance and Apache 2.0. (huggingface.co) NVIDIA’s developer blog frames these optimized checkpoints as part of a deployment story that spans Blackwell data-center hardware down to Jetson edge devices, and it points users to the Hugging Face artifacts and ready-to-run examples. (developer.nvidia.com)

The open-source stack already has NVFP4 toolchain support: LLM Compressor documents workflows for FP4-style quantization with export paths intended for high-throughput runtimes. (docs.vllm.ai) Community projects provide conversion and serving scripts that report successful runs on Blackwell-class RTX PRO hardware, and multiple prebuilt Docker images include vLLM builds compiled with Blackwell/NVFP4 kernels for testing. (github.com) (hub.docker.com)

Official vLLM release notes and community threads show that NVFP4 is enabled on Blackwell in recent builds, while early forum reports flag specific combinations of vLLM, Transformers and CUDA versions as necessary for Gemma 4 to load reliably. (docs.nvidia.com) (forums.developer.nvidia.com) Independent writeups and benchmark guides published earlier this year characterize NVFP4 workflows on Blackwell as delivering strong accuracy recovery on very large models and measurable throughput gains over legacy FP16/FP8 flows, which gives a baseline for what to validate when running these new Gemma 4 artifacts. (developers.redhat.com) (zeroshot.it.com)
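One way to frame that validation is to compare benchmark scores from the unquantized and quantized checkpoints and check the ratio against the claimed 99.7% retention. The sketch below is a minimal illustration, not a method from any of the cited sources: the function names, task names, tolerance, and scores are all hypothetical placeholders, and the scores are not measurements.

```python
# Minimal sketch: per-task accuracy retention of a quantized model relative
# to its unquantized baseline. All names and numbers here are hypothetical.


def accuracy_retention(baseline: dict[str, float],
                       quantized: dict[str, float]) -> dict[str, float]:
    """Return quantized/baseline score ratio for each task."""
    mismatched = set(baseline) ^ set(quantized)
    if mismatched:
        raise ValueError(f"tasks missing from one side: {sorted(mismatched)}")
    return {task: quantized[task] / baseline[task] for task in baseline}


def meets_claim(retention: dict[str, float],
                claimed: float = 0.997,
                tolerance: float = 0.005) -> bool:
    """Check whether mean retention is within `tolerance` of the claim."""
    mean = sum(retention.values()) / len(retention)
    return mean >= claimed - tolerance


# Placeholder scores for illustration only; replace with your own results.
baseline_scores = {"task_a": 80.0, "task_b": 60.0}
quantized_scores = {"task_a": 79.6, "task_b": 59.9}

retention = accuracy_retention(baseline_scores, quantized_scores)
for task, ratio in retention.items():
    print(f"{task}: {ratio:.2%} of baseline")
print("consistent with claim:", meets_claim(retention))
```

Averaging ratios across tasks hides per-task regressions, so it is worth reading the per-task lines as well as the final check.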
