Nemotron 3 Super claims
- Nvidia announced Nemotron 3 Super, a 120B open model reportedly trained end‑to‑end in NVFP4 four‑bit format for Blackwell GPUs. (news.bitcoin.com) - Nvidia claims inference on B200 hardware runs up to four times faster than FP8 on H100 with no reported accuracy loss. (news.bitcoin.com) - These are vendor performance claims positioning Blackwell‑era formats as a major efficiency play for agentic workloads. ( )
Nvidia says its new Nemotron 3 Super model was trained end to end in a 4-bit format built for Blackwell chips, cutting memory use and raising speed claims in one release. (developer.nvidia.com) A model is a stack of numeric weights, and smaller numbers take less space to store and move. Nvidia says Nemotron 3 Super uses NVFP4, its 4-bit floating-point format for Blackwell graphics processors, instead of larger formats that usually dominate training. (developer.nvidia.com) Nvidia released Nemotron 3 Super on March 11, 2026 as a 120 billion parameter model with 12 billion active parameters at inference, using a mixture-of-experts design that turns on only part of the model for each request. Nvidia’s research page says it is the first Nemotron 3 model to use latent mixture-of-experts, multi-token prediction, and NVFP4 pretraining. (research.nvidia.com) The company says inference on Blackwell B200 hardware runs up to four times faster than FP8 inference on Hopper H100, while maintaining accuracy. Nvidia also says the model supports a 1 million-token context window and targets “agentic” workloads such as long, multi-step software and enterprise tasks. (developer.nvidia.com, developer.nvidia.com) The hardware comparison is central to the pitch. Blackwell is Nvidia’s newer accelerator generation and Hopper is the prior one, so the company is pairing a new low-precision format with a new chip family rather than isolating one variable. (developer.nvidia.com, developer.nvidia.com) Nvidia’s broader argument is that lower-precision math can make large models cheaper to train and serve if accuracy holds up. In June 2025, the company introduced NVFP4 as a Blackwell-era format designed to keep accuracy close to higher-precision systems while shrinking compute and memory costs. (developer.nvidia.com) Nemotron 3 Super is also part of Nvidia’s push to publish more of the stack. The company’s model page says it is releasing open weights, datasets, recipes, and technical reports for the Nemotron 3 family under the Nvidia Open Model License. (developer.nvidia.com, github.com) The model card on Hugging Face says Nemotron 3 Super was trained in English plus 19 other languages and 43 programming languages. Nvidia’s launch post also says reinforcement-learning post-training used more than 1.2 million environment rollouts across 21 environment configurations. (huggingface.co, developer.nvidia.com) Independent verification of the headline speed and accuracy claims is still limited in public sources. For now, the four-times-faster figure is a vendor benchmark, and the near-term test is whether outside developers can reproduce those gains on Blackwell systems with the released model and recipes. (developer.nvidia.com, github.com)