NVIDIA releases Nemotron 3 Super
NVIDIA unveiled Nemotron 3 Super, a 120B-parameter open coding model that activates 12B parameters during inference and scored 60.47% on SWE-bench Verified, the top result among open-weight coding models. (x.com) The model integrates Mamba-2 layers and a LatentMoE design with 512 experts, was trained on 25 trillion tokens, and is available on Hugging Face. (x.com)
A coding model is software trained to read code, write fixes, and follow tests inside a repository. NVIDIA said on March 11 that its new Nemotron 3 Super is an open model built for those jobs at large scale. (blogs.nvidia.com)

NVIDIA describes Nemotron 3 Super as a 120-billion-parameter model with 12 billion active parameters at inference, which means only part of the full network is used on each step. The company released model weights on Hugging Face the same day in BF16, FP8, and NVFP4 variants, plus a base checkpoint. (huggingface.co, research.nvidia.com)

The model uses a mixture-of-experts design, a setup that routes each token to specialist submodels instead of running the whole model every time. NVIDIA said Super combines Mamba-2 layers, mixture-of-experts layers, and some attention layers, adds multi-token prediction for faster generation, and supports up to 1 million tokens of context. (developer.nvidia.com, huggingface.co)

NVIDIA is pitching that architecture at a specific problem in agent software: long jobs keep resending logs, tool outputs, and prior reasoning, which drives up cost and can push an agent off task. The company said multi-agent systems can generate up to 15 times as many tokens as standard chats, and said Super’s 1 million-token window is meant to keep more of that history in memory. (developer.nvidia.com, blogs.nvidia.com)

NVIDIA also tied the launch to coding benchmarks. Its technical report says Nemotron 3 Super scored 60.47% on SWE-bench Verified under the mini-SWE-agent setup, which NVIDIA presented as the top open-weight result at release. (research.nvidia.com, swebench.com)

That benchmark now comes with a warning label. OpenAI said on February 23, 2026 that SWE-bench Verified is “increasingly contaminated,” that it found flawed tests in many audited failures, and that it no longer recommends the benchmark for frontier launch claims. (openai.com, swebench.com)

NVIDIA’s pitch leans as much on speed as on score. The company said Nemotron 3 Super delivers up to 2.2 times the inference throughput of GPT-OSS-120B and up to 7.5 times that of Qwen3.5-122B in a setting with 8,000 input tokens and 64,000 output tokens. (research.nvidia.com)

Training scale is part of that story. NVIDIA’s report says it pre-trained the model on 25 trillion tokens in two phases, then post-trained it with supervised fine-tuning and reinforcement learning, including more than 1.2 million environment rollouts across 21 environment configurations. (research.nvidia.com, developer.nvidia.com)

NVIDIA said companies including Perplexity, CodeRabbit, Factory, Greptile, Palantir, Siemens, Cadence, and Dassault Systèmes are integrating or deploying the model in products and internal workflows. That puts Nemotron 3 Super in the middle of NVIDIA’s larger push to sell not just graphics processors, but the models, software stack, and deployment tools that run on them. (blogs.nvidia.com, research.nvidia.com)

The release leaves developers with a concrete tradeoff to test: a large open coding model that tries to act like a much smaller one at runtime. NVIDIA is betting that lower active compute, longer memory, and open checkpoints will be enough to win workloads where coding agents spend hours inside the same repository. (huggingface.co, blogs.nvidia.com)
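
For readers who want the routing idea in concrete terms, the mixture-of-experts step described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not NVIDIA's implementation: the expert count (8), the number of experts chosen per token (2), the scalar stand-in "experts," and the function names `top_k_route` and `moe_layer` are all hypothetical and chosen only for illustration.

```python
import math

NUM_EXPERTS = 8   # assumption: toy value, not the model's real expert count
TOP_K = 2         # assumption: toy value, not taken from NVIDIA's report


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and return (index, weight) pairs."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Softmax over the chosen experts only, so their weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_layer(x, experts, router_logits, k):
    """Run only the routed experts on x and mix their outputs by weight."""
    routed = top_k_route(router_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routed)
    return output, [i for i, _ in routed]


# Hypothetical experts: each one just scales its input by a different factor.
experts = [lambda x, scale=s: scale * x for s in range(NUM_EXPERTS)]

# Hypothetical router scores for one token; a real router computes these
# from the token's hidden state.
logits = [0.2, -1.0, 1.7, 0.0, 0.9, -0.3, 2.4, 0.1]

output, used = moe_layer(1.0, experts, logits, TOP_K)
print("experts used:", used, "of", NUM_EXPERTS)
print("mixed output:", output)

# Figures from NVIDIA's description: 12 billion of 120 billion parameters
# are active on each step.
print("active share of parameters:", 12e9 / 120e9)
```

Only the selected experts run for a given token, which is how a model with 120 billion total parameters can use 12 billion on each step.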