NVIDIA teases Nemotron 3 Ultra
- NVIDIA said on June 1 that Nemotron 3 Ultra, a 550-billion-parameter open-weight model, will be released later this week after Jensen Huang’s Taipei keynote.
- Artificial Analysis said Nemotron 3 Ultra scored 48 on its Intelligence Index, ahead of Nemotron 3 Super at 36 and gpt-oss-120b at 33.
- NVIDIA and Artificial Analysis said fuller benchmarks and the model release are due later this week.

NVIDIA said on June 1 that it will release Nemotron 3 Ultra later this week, expanding its open-model lineup with a 550-billion-parameter system aimed at high-end reasoning and agentic AI workloads. Artificial Analysis, which said it partnered with NVIDIA on pre-release testing, described the model as the most intelligent U.S. open-weights model it has measured so far.

The announcement came during CEO Jensen Huang’s keynote at GTC Taipei, held alongside Computex in Taiwan. NVIDIA has not yet published the full release package, but its developer materials already position Ultra as the top-end member of the Nemotron 3 family.

### What exactly did NVIDIA announce?

Artificial Analysis said NVIDIA “just announced the release of Nemotron 3 Ultra” in Huang’s June 1 keynote and identified it as a 550B-parameter model with 55B active parameters. The benchmarking firm said the model will also be made available in NVFP4 quantization, as with Nemotron 3 Super, for higher inference performance. (artificialanalysis.ai)

NVIDIA’s Nemotron developer page describes Ultra as the model in the family “designed for applications demanding the highest reasoning accuracy for complex agentic tasks.” The same page says Nemotron models are released with open weights, training data and recipes, and can be deployed through frameworks including vLLM, SGLang, Ollama and llama.cpp, as well as through NVIDIA NIM microservices. (artificialanalysis.ai)

### How does it fit into the Nemotron 3 lineup?

NVIDIA introduced the Nemotron 3 family in December 2025 with Nano, Super and Ultra tiers, saying at the time that Super and Ultra would follow Nano in later releases. NVIDIA Research said Ultra was planned as the largest model in the family and described it as the variant intended to provide state-of-the-art accuracy and reasoning performance.
(developer.nvidia.com)

The Nemotron 3 white paper and model page say the family uses a hybrid Mamba-Transformer mixture-of-experts architecture, supports a context window of up to 1 million tokens, and includes features such as LatentMoE, multi-token prediction and inference-time reasoning budget control for the larger variants. NVIDIA says the family is built for specialized agents and multi-agent systems that need both throughput and transparency. (research.nvidia.com)

### What benchmark claims is NVIDIA leaning on?

Artificial Analysis said Nemotron 3 Ultra scored 48 on its Intelligence Index, ahead of Gemma 4 31B at 39, Nemotron 3 Super at 36 and gpt-oss-120b at 33 among U.S. open-weights peers. The firm also said the model remained behind the “Chinese-led open weights frontier,” citing Kimi K2.6 at 54. (research.nvidia.com)

Artificial Analysis also said Nemotron 3 Ultra served more than 300 tokens per second on a pre-release DeepInfra endpoint. The firm said that speed put it ahead of market-served peer models from Chinese labs, which it said generally run at 50 to 100 tokens per second, while offering higher intelligence than gpt-oss-120b at similar speeds. (artificialanalysis.ai)

### Why does the “open-weight” label matter here?

NVIDIA’s developer page says Nemotron models are “transparent,” with weights, training data and technical reports available for evaluation before production use. That matters for developers who want to inspect, fine-tune or self-host a model rather than rely only on closed APIs, especially for enterprise or regulated deployments. (artificialanalysis.ai)

When NVIDIA launched the Nemotron 3 family in December, Jensen Huang said that “open innovation is the foundation of AI progress” and that the company was turning advanced AI into “an open platform” for developers building agentic systems at scale.
Early adopters NVIDIA named for the broader Nemotron family included Accenture, Oracle Cloud Infrastructure, Palantir, Perplexity, ServiceNow and Zoom. (developer.nvidia.com)

### What comes next, and where will developers get it?

June 1 was the announcement date, but the model itself is slated for release later this week, according to Artificial Analysis. The firm said it would publish additional analysis and full benchmarks at release. NVIDIA’s Nemotron hub says model weights are distributed through Hugging Face and that related deployment options are listed through NVIDIA NIM and partner endpoints. (nvidianews.nvidia.com)

When Ultra goes live, the release is expected to land through those existing Nemotron channels rather than as a separate product line. That last point is an inference based on NVIDIA’s current distribution pattern for Nemotron models. (developer.nvidia.com) (artificialanalysis.ai)
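
For developers weighing self-hosting, the parameter counts reported above (550B total, 55B active) allow a rough back-of-the-envelope estimate of how much storage the weights alone would need at different precisions. The Python sketch below is illustrative only: the bits-per-weight values are assumptions rather than figures from NVIDIA or Artificial Analysis, and the 4.5-bit entry assumes a 4-bit value plus one 8-bit scale shared by every 16 weights.

```python
# Rough weight-storage estimate for a mixture-of-experts model.
# Parameter counts come from the article; bits-per-weight values are assumptions.

TOTAL_PARAMS = 550e9   # total parameters reported for Nemotron 3 Ultra
ACTIVE_PARAMS = 55e9   # parameters reported as active per token

# Assumed storage cost per weight, in bits. The 4.5-bit figure assumes a 4-bit
# value plus one 8-bit scale shared by each block of 16 weights.
BITS_PER_WEIGHT = {"bf16": 16.0, "fp8": 8.0, "nvfp4_approx": 4.5}


def weight_gigabytes(num_params: float, bits_per_weight: float) -> float:
    """Return the storage needed for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


def footprint_table(num_params: float) -> dict[str, float]:
    """Map each assumed precision to the estimated weight storage in gigabytes."""
    return {
        name: weight_gigabytes(num_params, bits)
        for name, bits in BITS_PER_WEIGHT.items()
    }


if __name__ == "__main__":
    print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")
    for name, gigabytes in footprint_table(TOTAL_PARAMS).items():
        print(f"{name:>13}: {gigabytes:,.0f} GB of weights")
```

Under these assumptions the weights come to roughly 1,100 GB in BF16, 550 GB in FP8 and about 309 GB at 4.5 bits per weight. The estimate ignores KV cache, activations and runtime overhead, so real deployments would need more memory than this. The 10% active fraction affects compute per token, not storage, since all experts still have to be resident.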