NVIDIA open-sources sim-to-real tools
- NVIDIA open-sourced GR00T-VisualSim2Real, a humanoid robot training stack from NVLabs that packages the VIRAL and DoorMan sim-to-real pipelines on GitHub. (github.com)
- The release targets zero-shot transfer from simulation to real robots using RGB cameras and proprioception, with DoorMan claiming up to 31.7% faster door opening than teleoperators. (doorman-humanoid.github.io)
- It matters because NVIDIA is turning internal CVPR-era humanoid research into reusable code that plugs into its broader Isaac GR00T stack. (research.nvidia.com)

Humanoid robotics has a very specific bottleneck. The flashy parts (walking, balancing, even backflips) get attention, but useful work usually breaks on messy whole[…] (github.com)

That gap is what NVIDIA is trying to narrow with GR00T-VisualSim2Real, a newly open-sourced codebase from NVLabs that bundles two of its humanoid learning systems, VIRAL and DoorMan, into […] (doorman-humanoid.github.io) […] simulation at scale, then move the policy onto a real robot without a giant real-world data collection campaign. (github.com)

NVIDIA published GR00T-VisualSim2Real on GitHub under NVLabs as an Apache-2.0 project. The repository packages application code for VIRAL (“Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation”) and DoorMan, which focuses on door opening as a hard benchmark for humanoid control. The repo says the stack supports training in simulation, distilling a vision policy, evaluating checkpoints, and exporting models for deployment. It is built on Isaac Lab, Isaac Sim 5.1, TRL, and Hydra. (github.com)

### Why is “sim-to-real” the whole story?

Because re[…] (github.com) You can’t casually gather millions of safe trials of a biped pushing on doors, carrying objects, and recovering from mistakes in the real world. Simulation lets teams randomize lighting, textures, object properties, and physics at scale, then use that synthetic experience to train policies that hopefully survive contact with reality. That “hopefully” is the hard part: the reality gap is why a lot of robot demos look great in sim and brittle on hardware. VIRAL is NVIDIA’s answer to that gap. (arxiv.org)

[…] framework. It uses a teacher-student setup: first a privileged reinforcement-learning teacher trains with full simulator state, then a student policy learns to act from robot-friendly inputs like RGB vision and proprioception. The promise is zero-shot deployment on real humanoids, meaning the policy moves from sim to hardware without extra real-world fine-tuning for the basic transfer step.
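
The teacher-student idea can be shown with a toy sketch. This is not the VIRAL code: the one-dimensional "simulator", the proportional teacher, the rescaled noisy observation, and every function name below are hypothetical stand-ins, and the DAgger-style data aggregation is one common way to do distillation, chosen here as an assumption. The teacher sees the true state, the student only sees an observation, and the student is repeatedly refit on labels the teacher gives for states the student itself visits.

```python
import random


def teacher_action(state):
    """Privileged teacher: acts on the true simulator state (toy proportional controller)."""
    return -0.5 * state


def observe(state, rng):
    """Student's view: a rescaled, noisy reading standing in for camera and proprioception."""
    return 2.0 * state + rng.gauss(0.0, 0.01)


def rollout(weight, rng, steps=20):
    """Run the student in the toy simulator and log (observation, teacher label) pairs."""
    state = rng.uniform(-1.0, 1.0)
    pairs = []
    for _ in range(steps):
        obs = observe(state, rng)
        # The teacher labels the states the student actually visits...
        pairs.append((obs, teacher_action(state)))
        # ...but the student's own action is what moves the simulator forward.
        state = state + weight * obs
    return pairs


def fit(dataset):
    """One-dimensional least squares for action = weight * observation."""
    numerator = sum(obs * act for obs, act in dataset)
    denominator = sum(obs * obs for obs, _ in dataset)
    return numerator / denominator


def distill(iterations=5, rollouts_per_iteration=10, seed=0):
    """DAgger-style loop: collect with the current student, aggregate, refit."""
    rng = random.Random(seed)
    weight, dataset = 0.0, []
    for _ in range(iterations):
        for _ in range(rollouts_per_iteration):
            dataset.extend(rollout(weight, rng))  # data is aggregated, never discarded
        weight = fit(dataset)
    return weight


student_weight = distill()
print(f"student weight: {student_weight:.3f}")
```

Because the observation is roughly twice the state and the teacher applies a gain of -0.5, the student weight should settle close to -0.25: the student reproduces the teacher's behavior without ever reading the privileged state directly. A real pipeline replaces the scalar weight with a vision network and the toy dynamics with a physics simulator.
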
NVIDIA’s project page frames VIRAL as a way to learn long-horizon loco-manipulation entirely in simulation and deploy it on a Unitree G1-class robot. (arxiv.org)

### Why make door opening t[…]

[…] has to see the handle from an egocentric camera, approach at the right angle, grasp correctly, rotate or unlatch the mechanism, pull or push while tracking the moving door, and stay upright while the whole interaction changes the forces on its body. That is locomotion and manipulation fused together, basically the exact thing humanoids struggle with outside curated demos. DoorMan treats that as the benchmark instead of avoiding it. (arxiv.org)

### What are the headline results?

DoorMan’s project page says the policy was trained entirely in sim[…] (arxiv.org) […] performance across diverse real-world doors. NVIDIA also says the system completed the task up to 31.7% faster than human teleoperators under the same whole-body control stack, with an average advantage of up to 7.15 seconds. The training recipe is also pretty concrete: PPO for the teacher, DAgger for distillation, then GRPO fine-tuning for the student, using L40S GPUs across the stages. (doorman-humanoid.github.io)

### How does this fit into GR00T?

GR00T is NVIDIA’s larger […] (arxiv.org) […] pipelines, simulation tools, and deployment workflows. Earlier this year NVIDIA described Isaac GR00T N1.6 as a multimodal vision-language-action model that plugs into a sim-to-real stack built with Isaac Lab, synthetic navigation tools, and CUDA-accelerated localization. GR00T-VisualSim2Real is the lower-level skill-learning side of that picture: less “general robot brain,” more “here is the training machinery for hard embodied behaviors.” (developer.nvidia.com)

### What’s the real significance?

Basically, NVIDIA is doing more than publishing papers.
Both VIRAL and DoorMan are slated for CVPR 2026, but the code is already public, which means robotics teams can inspect the recipes, reuse the environment setup, and benchmark against the same tasks instead of reverse-engineering a slick video. That lowers the friction for researchers working on humanoid loco-manipulation, especially teams already inside the Isaac ecosystem. (research.nvidia.com)

[…] (developer.nvidia.com) […]only policies from photoreal simulation onto real whole-body robots, now has a public NVIDIA reference stack behind it. In robotics, that kind of release matters because code travels farther than demos. (github.com)