NVIDIA open-sources Cosmos 3
- NVIDIA said on June 1 it launched Cosmos 3 as an open world foundation model for physical AI, releasing code, model checkpoints and tools.
- NVIDIA said Cosmos 3 combines vision reasoning, world generation and action prediction in one system, with Nano and Super checkpoints posted publicly.
- Developers can access Cosmos 3 code on GitHub and model checkpoints on Hugging Face, NVIDIA said this week.

NVIDIA said on June 1 that it had open-sourced Cosmos 3, a new world foundation model for physical AI that the company says is designed for robots, autonomous vehicles and other systems operating in the physical world. The release includes model checkpoints, code, post-training scripts, datasets and deployment tools, according to NVIDIA’s developer blog and GitHub repositories.

NVIDIA described Cosmos 3 as an “open” omnimodel that can work across text, images, video, audio and actions. The announcement was published alongside product pages, research materials and public repositories on GitHub and Hugging Face.

### What exactly did NVIDIA release?

NVIDIA’s June 1 announcement said Cosmos 3 includes Nano and Super model checkpoints, open datasets for robotics and autonomous driving, post-training scripts for adapting the model to domain-specific uses, and Cosmos NIM microservices for deployment on NVIDIA GPUs. The company’s GitHub page describes Cosmos as an open platform of world models, datasets and tools for robots, autonomous vehicles and smart infrastructure. (developer.nvidia.com)

Hugging Face said in a blog post published with NVIDIA that the release ships with Cosmos 3 Super and Cosmos 3 Nano, Diffusers integration for generation pipelines, post-training scripts on GitHub and synthetic data generation datasets for physical AI. NVIDIA’s product site says developers can use Cosmos 3 as the backbone for world action models and policy learning in robotics and autonomous systems. (developer.nvidia.com)

### Why is NVIDIA calling Cosmos 3 an “omnimodel”?

NVIDIA Newsroom said Cosmos 3 is built on a mixture-of-transformers architecture that combines vision reasoning, world generation and action prediction in a single system. NVIDIA said the model can “natively understand and generate” text, images, video, ambient sound and actions.
(huggingface.co)

NVIDIA Research said Cosmos 3 connects understanding, generation, simulation and action through a shared omnimodal world model. The research page says the system is aimed at physical-world tasks including robotics, smart spaces and driving benchmarks.

### Where do robotaxis and robotics fit into this release?

NVIDIA’s developer blog said the open datasets and tools are intended for physical AI applications including robotics and autonomous driving. (nvidianews.nvidia.com) The same post included examples for warehouse safety data and autonomous driving video generation. (research.nvidia.com)

NVIDIA’s product page says Cosmos 3 can be used to accelerate robot policy learning and to post-train generalized world models on specialized camera and embodiment data. The company’s broader Cosmos materials describe use cases across robotics, simulation, autonomous systems and physical scene understanding. (developer.nvidia.com)

### How does this fit with NVIDIA’s other recent open-source releases?

NVIDIA said on May 31 that it had released a broader collection of open-source physical AI agent tools and skills spanning Omniverse, Cosmos, Alpamayo and Metropolis. That release covered robotics, autonomous vehicles, vision AI and industrial digital twins, according to the company’s newsroom statement. (nvidia.com)

The Cosmos 3 launch also builds on earlier public Cosmos repositories. GitHub pages for NVIDIA’s Cosmos organization show cookbook materials and earlier model repositories that predate this week’s announcement, while the main Cosmos repository now includes Cosmos 3 documentation and releases.

### Where can developers find it now?

GitHub shows NVIDIA’s main Cosmos repository as public, with documentation, quickstart instructions and release materials for Cosmos 3. (nvidianews.nvidia.com) Hugging Face hosts NVIDIA’s accompanying blog and model materials, and NVIDIA’s developer and research sites link to the broader Cosmos ecosystem.

NVIDIA said developers can use the released checkpoints, datasets and scripts to adapt Cosmos 3 to their own robotics or autonomous-driving domains. (github.com) The next step is practical rather than scheduled: the code is already live on GitHub, and the model checkpoints are already posted on Hugging Face, according to NVIDIA’s June 1 materials. (developer.nvidia.com) (github.com)