Nvidia's Lyra 2.0

Nvidia released Lyra 2.0, a framework that generates persistent 3D worlds and counters temporal drift with per-frame geometry and self-augmented training. (x.com)

Generative world models try to turn a flat image into a place you can move through, but they often lose track of where objects belong after a few seconds. Nvidia said on April 15 it built Lyra 2.0 to keep those 3D scenes stable over longer camera paths. (research.nvidia.com)

Nvidia’s research page and arXiv paper describe Lyra 2.0 as a system for generating “persistent, explorable 3D worlds at scale” from a single image. The paper was posted on arXiv on April 15, 2026, and the project page went live the same week. (arxiv.org) (research.nvidia.com)

The basic failure it targets is temporal drift, which is when a generated room, street, or object slowly changes shape as the camera keeps moving. Lyra 2.0 counters that by storing per-frame 3D geometry, then using that geometry to retrieve earlier views and match them to the next view it needs to render. (arxiv.org) (research.nvidia.com)

The system also tries to stop error buildup during long runs by training on its own flawed outputs. Nvidia said it uses “self-augmented histories,” which means the model is shown degraded sequences from its own generations so it learns to correct drift instead of copying it forward. (arxiv.org) (research.nvidia.com)

That work sits inside a broader push to build simulation systems for robotics, autonomous driving, and other “physical artificial intelligence” uses. Nvidia’s Cosmos Lab says it is working on world foundation models that simulate physical environments for training and evaluation. (research.nvidia.com 1) (research.nvidia.com 2)

Lyra 2.0 also extends Nvidia’s earlier Lyra work instead of replacing it from scratch. The first Lyra project, accepted to the International Conference on Learning Representations 2026, focused on reconstructing 3D and 4D scenes from a single image or video through self-distillation with video diffusion models. (research.nvidia.com 1) (research.nvidia.com 2)

In the new paper, Nvidia says the longer, more consistent video trajectories from Lyra 2.0 can be used to fine-tune feed-forward reconstruction models that recover higher-quality 3D scenes. That means the system is not only making prettier fly-throughs; it is also producing training data for models meant to infer 3D structure more directly. (arxiv.org) (research.nvidia.com)

Nvidia has also published code for Project Lyra on GitHub and a model card on Hugging Face, framing the work as an open research release rather than a consumer product launch. The immediate test is whether these methods hold up outside curated demos, where long camera moves usually expose the seams in generated worlds. (github.com) (huggingface.co)
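
To make the idea of geometry-based view retrieval more concrete, here is a minimal Python sketch. It is not Nvidia's implementation and does not use the Lyra 2.0 code or API. It assumes each earlier frame keeps a small set of world-space 3D points, and it ranks those frames by how many of their points fall inside the next camera's view cone. The names StoredFrame, Camera, visibility_score, and retrieve_views, along with the field-of-view and depth defaults, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class StoredFrame:
    """Hypothetical record: one generated frame plus the 3D points it observed."""
    frame_id: int
    points: List[Vec3]  # world-space points recovered for this frame


@dataclass
class Camera:
    """Hypothetical target camera, modeled as a simple view cone."""
    position: Vec3
    forward: Vec3               # assumed to be a unit vector
    half_fov_deg: float = 45.0  # illustrative default, not from the paper
    max_depth: float = 10.0     # illustrative default, not from the paper


def _visible(point: Vec3, cam: Camera) -> bool:
    """Return True if the point lies inside the camera's view cone."""
    dx = point[0] - cam.position[0]
    dy = point[1] - cam.position[1]
    dz = point[2] - cam.position[2]
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    if dist == 0.0 or dist > cam.max_depth:
        return False
    # Cosine of the angle between the viewing direction and the point.
    cos_angle = (dx * cam.forward[0] + dy * cam.forward[1] + dz * cam.forward[2]) / dist
    return cos_angle >= math.cos(math.radians(cam.half_fov_deg))


def visibility_score(frame: StoredFrame, cam: Camera) -> float:
    """Fraction of a stored frame's points that the target camera would see."""
    if not frame.points:
        return 0.0
    return sum(_visible(p, cam) for p in frame.points) / len(frame.points)


def retrieve_views(history: List[StoredFrame], cam: Camera, k: int = 2) -> List[int]:
    """Return ids of up to k earlier frames whose geometry best overlaps the new view."""
    scored = [(visibility_score(f, cam), f.frame_id) for f in history]
    scored = [s for s in scored if s[0] > 0.0]   # drop frames with no overlap
    scored.sort(key=lambda s: (-s[0], s[1]))     # best overlap first, then oldest
    return [frame_id for _, frame_id in scored[:k]]
```

In a sketch like this, a frame whose stored points all sit in front of the new camera ranks above one that only partly overlaps, and frames with no overlap are skipped. A real system would work with dense depth or point maps and handle occlusion, which this toy version ignores.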

