Model‑Based RL 'Dream to Fly' Paper
Researchers at the University of Zurich presented 'Dream to Fly,' a model-based reinforcement learning system that trains quadrotors to fly agile racing tracks directly from raw camera pixels, without explicit state estimation. The approach reportedly discovers perception-aware behaviors end-to-end and is more sample-efficient than model-free baselines such as PPO. (x.com)
Reinforcement learning is the trial-and-error training method behind many game-playing artificial intelligence systems, and a model-based variant has the machine learn to predict likely futures before it acts. University of Zurich researchers used that approach to train a racing drone from raw camera images instead of from an explicit estimate of its position. (arxiv.org)
The paper, “Dream to Fly,” is by Angel Romero, Ashwin Shenai, Ismail Geles, Elie Aljalbout, and Davide Scaramuzza of the university’s Robotics and Perception Group. The group lists it as a paper at the 2026 IEEE International Conference on Robotics and Automation (ICRA) in Vienna. (rpg.ifi.uzh.ch)
Most autonomous drones estimate their state first (where they are, how fast they are moving, and which way they are pointed) before deciding what to do next. This system skips that explicit state-estimation step and learns a direct mapping from onboard pixels to motor commands. (arxiv.org)
The authors say recent vision-based racing systems often depended on intermediate representations or on imitation learning, in which a machine copies demonstrations from an expert pilot. Their method instead trains from scratch with DreamerV3, a model-based reinforcement learning algorithm, using only pixel observations. (arxiv.org)
In the paper’s setup, the drone learns a world model: an internal predictor of what the next camera frame and outcome are likely to be after a given action. DreamerV3 trains its policy mostly on trajectories imagined inside that learned model rather than on fresh real interactions, which is the usual source of a model-based method’s sample efficiency. The authors report that this approach was more sample-efficient than model-free baselines such as Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) on the same vision-based task. (arxiv.org)
The researchers also report that a “perception-aware” behavior emerged on its own during training. Without a hand-written reward for where to look, the policy learned to point the drone’s onboard camera toward texture-rich parts of the racing gates, which carry more visual information. (arxiv.org)
That detail matters in drone racing because a small quadrotor has only fractions of a second to line up the next gate at speed. The paper says the learned policy was deployed in simulation and in real-world flight through a hardware-in-the-loop setup, in which the physical drone flies while its camera images are rendered, at speeds of up to 9 meters per second. (arxiv.org)
The work lands in a field where Scaramuzza’s lab has spent years pushing autonomous racing, including systems built on classical control, perception pipelines, and faster planning methods. The group’s drone-racing overview says the challenge combines onboard perception, localization and mapping, trajectory generation, and optimal control at high speed. (rpg.ifi.uzh.ch)
The claim in “Dream to Fly” is narrower and more specific than “drones can race from vision alone”: a learned world model can make pixel-based training practical enough to transfer toward real hardware. If that holds up beyond the paper’s tracks and setup, it gives roboticists another route to flying fast without building every perception module by hand. (arxiv.org)
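To make the “imagine before acting” idea concrete, here is a deliberately tiny sketch in Python. It is not DreamerV3 and not the authors’ code: the scalar toy system, the linear world model fitted by least squares, the single feedback gain, and every function name below are hypothetical illustration choices. What it keeps is the model-based pattern described above: collect a small batch of real transitions, fit a predictor of the next state, then improve the policy using only rollouts inside that predictor.

```python
import random

# Hypothetical toy system standing in for the real drone (NOT the paper's setup):
# a scalar state x (think "offset from the racing line") with x' = A*x + B*u + noise.
TRUE_A, TRUE_B = 1.1, 0.5


def real_step(x, u, rng):
    """One step in the 'real' environment. Every call costs one real sample."""
    return TRUE_A * x + TRUE_B * u + rng.gauss(0.0, 0.01)


def collect_real_transitions(n, rng):
    """Gather n real (x, u, x_next) transitions from random states and actions."""
    data = []
    for _ in range(n):
        x, u = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        data.append((x, u, real_step(x, u, rng)))
    return data


def fit_world_model(data):
    """Least-squares fit of x_next ~ a*x + b*u: a hypothetical linear 'world model'."""
    sxx = sum(x * x for x, _, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - sxu * suy) / det  # solve the 2x2 normal equations
    b = (sxx * suy - sxu * sxy) / det
    return a, b


def imagined_cost(model, k, x0=1.0, horizon=30):
    """Roll out the policy u = -k*x inside the learned model only ('dreaming')."""
    a, b = model
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += x * x + 0.1 * u * u  # penalize drift and control effort
        x = a * x + b * u            # predicted next state; no real environment call
    return cost


def improve_policy_in_imagination(model, gains):
    """Pick the feedback gain whose imagined rollout has the lowest cost."""
    return min(gains, key=lambda k: imagined_cost(model, k))


rng = random.Random(0)
data = collect_real_transitions(20, rng)       # the only real samples used
model = fit_world_model(data)
gains = [i * 0.05 for i in range(81)]          # candidate gains 0.0 .. 4.0
best_k = improve_policy_in_imagination(model, gains)

print(f"learned model a={model[0]:.3f}, b={model[1]:.3f}")
print(f"chosen gain k={best_k:.2f} from {len(gains) * 30} imagined steps")
```

Only 20 calls to `real_step` are made; every step used to compare the 81 candidate gains happens inside the fitted model. A model-free method would, roughly speaking, have to spend real interactions to score new behavior, which is the trade-off behind the sample-efficiency comparison with PPO and SAC. DreamerV3 applies the same pattern at a much larger scale, with a learned latent model of camera images and a neural actor-critic trained on imagined trajectories.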