NVIDIA Unveils 'Cosmos' Unified Robot Brain
NVIDIA has introduced Cosmos Policy, a framework that turns world foundation models into unified robot brains. The system lets robots "see, predict, and act" with a single model rather than a separate specialized model for each function. It also integrates with Vision-Language-Action (VLA) models to carry out complex manipulation tasks from natural language commands, and a detailed implementation has been posted on Hugging Face.
- The 'Cosmos' framework is part of a larger NVIDIA initiative, Project GR00T (Generalist Robot 00 Technology), which aims to create a general-purpose foundation model for humanoid robots.
- These foundation models are designed to run on a new specialized computer called Jetson Thor, a system-on-a-chip (SoC) based on the NVIDIA Blackwell architecture that delivers 800 teraflops of 8-bit floating-point AI performance.
- The research is led by Jim Fan, a Senior AI Research Scientist at NVIDIA who heads the company's Generalist Embodied Agent Research (GEAR) team.
- The core technical approach of Cosmos Policy reframes robot control as a video prediction problem; it imagines multiple future "movies" that include its own actions and then chooses the optimal one to execute.
- Cosmos Policy is a specific implementation that fine-tunes a larger world model, 'Cosmos Predict-2', which is trained on vast amounts of video data to learn the physics of how objects move and interact.
- The system treats robot actions, physical states, and value estimates as additional "frames" in a video sequence, allowing a single diffusion model to handle perception, planning, and control without separate modules.
- This work is situated within NVIDIA's broader Isaac robotics platform, which includes Isaac Lab, an open-source simulation environment for training and testing robot policies using reinforcement and imitation learning.
- The Cosmos family of models also includes 'Cosmos Reason', a 7-billion parameter vision-language model that provides common-sense understanding and multi-step reasoning to serve as the robot's planning "brain".
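
The "actions as frames" and "pick the best imagined future" ideas in the list above can be illustrated with a toy sketch. This is not NVIDIA's implementation: every name here (`as_frame`, `pack_sequence`, `imagine`, `best_of_n`, `FRAME_SIZE`) is hypothetical, the "world model" is a one-line stand-in for a diffusion model, and the scoring rule is an assumption chosen only to make the example runnable.

```python
import random
from dataclasses import dataclass
from typing import List

FRAME_SIZE = 4  # hypothetical length of one flattened "frame"


@dataclass
class Candidate:
    actions: List[float]  # proposed action chunk for this imagined future
    value: float          # predicted value of that future (higher is better)


def as_frame(values: List[float]) -> List[float]:
    """Pad or truncate a vector so it occupies exactly one frame slot."""
    padded = list(values[:FRAME_SIZE])
    return padded + [0.0] * (FRAME_SIZE - len(padded))


def pack_sequence(images, state, actions, value):
    """Append state, action, and value as extra frames after the image frames."""
    extra = [as_frame(state), as_frame(actions), as_frame([value])]
    return [as_frame(img) for img in images] + extra


def imagine(rng: random.Random, state: float, goal: float) -> Candidate:
    """Toy stand-in for the model: sample one action and score its outcome."""
    action = rng.uniform(-1.0, 1.0)
    next_state = state + action            # assumed trivial dynamics
    value = -abs(goal - next_state)        # closer to the goal scores higher
    return Candidate(actions=[action], value=value)


def best_of_n(rng: random.Random, state: float, goal: float, n: int = 8) -> Candidate:
    """Imagine n candidate futures and keep the one with the highest value."""
    candidates = [imagine(rng, state, goal) for _ in range(n)]
    return max(candidates, key=lambda c: c.value)


if __name__ == "__main__":
    seq = pack_sequence([[1, 2, 3, 4], [5, 6, 7, 8]], [0.5], [0.1, 0.2], 0.9)
    print(len(seq), "frames in the packed sequence")
    best = best_of_n(random.Random(0), state=0.0, goal=0.5, n=16)
    print("chosen action:", best.actions, "predicted value:", best.value)
```

In the real system a single diffusion model would generate the image, state, action, and value frames jointly; the sketch only separates them into functions to make the data layout and the selection step visible.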