NVIDIA releases 30B edge multimodal model

- NVIDIA released Nemotron 3 Nano Omni on April 28, 2026, an open-weight multimodal model designed to run vision, audio, video and text workloads together.
- The model uses a 30B-A3B hybrid mixture-of-experts design, activating 3 billion parameters per step, and NVIDIA said it delivers up to 9x throughput.
- Model weights, technical reports and deployment options are available through Hugging Face, OpenRouter, NVIDIA NIM and partner platforms.

NVIDIA released Nemotron 3 Nano Omni on April 28, 2026, adding an open-weight multimodal model to its Nemotron family that is aimed at edge and single-GPU agent workloads. The company said the model combines video, audio, image and text understanding in one system rather than using separate stacks for speech, vision and language. NVIDIA said the model is built for agentic use cases including computer-use agents, document intelligence and audio-video understanding. The release was detailed in posts on NVIDIA’s corporate blog, developer site and research report.

### What exactly did NVIDIA ship?

Nemotron 3 Nano Omni is a 30B-A3B hybrid mixture-of-experts model, according to NVIDIA’s product page and technical materials. That means the model has 30 billion total parameters, with 3 billion active for a given step, a design NVIDIA said is intended to keep latency and compute demands lower than a dense model of similar size.

The April 28 launch positioned Nano Omni as the multimodal member of the Nemotron 3 lineup. (blogs.nvidia.com) NVIDIA’s developer page lists it alongside Nemotron 3 Nano, Super and Ultra, and describes Nano Omni as a single model for video, audio, image and text understanding in a simplified agent workflow.

### What makes it different from a standard multimodal stack?

NVIDIA said many agent systems still chain together separate models for vision, speech and language, which adds inference hops and orchestration overhead. (developer.nvidia.com) Nano Omni is meant to replace that fragmented setup with one shared perception model that can take in multiple modalities and produce text output.

The research report says Nano Omni is the first model in the Nemotron multimodal series to natively support audio inputs alongside text, images and video. (developer.nvidia.com) It is built on the Nemotron 3 Nano 30B-A3B backbone and adds a vision encoder, an audio encoder, dynamic image resolution, Conv3D-based temporal compression for video and a 256K-token context window. (blogs.nvidia.com)

### How does NVIDIA say it performs?

NVIDIA said the model delivers “up to 9x” higher throughput than other open omni models with the same interactivity. The company also said the model led six leaderboards tied to document intelligence, video understanding and audio understanding, citing benchmarks including MMlongbench-Doc, OCRBenchV2, WorldSense, DailyOmni and VoiceBench. (research.nvidia.com)

MediaPerf, which NVIDIA described as an open industry benchmark for video understanding models, showed Nano Omni with the highest throughput across every task and the lowest inference cost for video-level tagging, according to NVIDIA’s developer blog. Those performance claims come from NVIDIA’s own launch materials and technical blog. (blogs.nvidia.com)

### Where can developers get it and run it?

NVIDIA said the model is available with open weights, datasets and recipes, and can be downloaded through Hugging Face. The company’s developer materials also list access through OpenRouter, NVIDIA NIM microservices, build.nvidia.com and more than 25 partner platforms.

The developer page says Nemotron models can be deployed with frameworks including vLLM, SGLang, Ollama and llama.cpp on NVIDIA GPUs from edge devices to cloud and data center systems. (developer.nvidia.com) NVIDIA’s technical blog said Nano Omni supports optimized inference across Ampere, Hopper and Blackwell GPU families, with FP8 and NVFP4 quantization support.

### Which companies has NVIDIA named around the launch?

NVIDIA said companies already adopting Nemotron 3 Nano Omni include Aible, Applied Scientific Intelligence, Eka Care, Foxconn, H Company, Palantir and Pyler. (blogs.nvidia.com) The company also said Dell Technologies, Docusign, Infosys, K-Dense, Lila, Oracle and Zefr were evaluating the model.

Gautier Cloix, chief executive of H Company, said in NVIDIA’s launch post that his company’s agents can use the model to interpret full-HD screen recordings faster than before. (developer.nvidia.com) That comment appeared in NVIDIA’s announcement alongside the list of adopters and evaluators.

### What should readers watch next?

April 28, 2026 is the key release date tied to the launch, and NVIDIA has already published the model card, technical report and developer documentation for follow-on evaluation. (blogs.nvidia.com) The next concrete step for developers is testing the weights and deployment recipes now posted through Hugging Face, OpenRouter, NVIDIA NIM and the NVIDIA-NeMo Nemotron GitHub repository.
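
For developers sizing hardware before that testing, the 30B-A3B figures and the FP8 and NVFP4 options cited above can be turned into rough numbers. The sketch below is a back-of-the-envelope estimate, not NVIDIA's published sizing: the parameter counts come from the article, while the bytes-per-parameter values (including the BF16 baseline, which the article does not mention) are nominal assumptions that ignore quantization scale factors, encoder weights, activations and the KV cache.

```python
# Rough sizing for a 30B-A3B mixture-of-experts model.
# Parameter counts are taken from the article; everything else is an assumption.

TOTAL_PARAMS = 30e9   # total parameters (from the article)
ACTIVE_PARAMS = 3e9   # parameters active per step (from the article)

# Assumed nominal storage cost per weight. Real NVFP4 checkpoints also carry
# per-block scale factors, so the true footprint is somewhat higher.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "NVFP4": 0.5}


def active_fraction(total: float, active: float) -> float:
    """Share of the weights used for a single step."""
    return active / total


def weight_memory_gb(total: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return total * bytes_per_param / 1e9


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction per step: {share:.0%}")
    for fmt, nbytes in BYTES_PER_PARAM.items():
        size = weight_memory_gb(TOTAL_PARAMS, nbytes)
        print(f"{fmt}: ~{size:.0f} GB of weights")
```

Under these assumptions, roughly one tenth of the weights are used per step, while the weight footprint falls from about 60 GB at BF16 to about 30 GB at FP8 and about 15 GB at NVFP4. In a typical deployment all 30 billion weights still have to be loaded, so the mixture-of-experts design mainly reduces per-step compute rather than weight memory.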
