ActiveGlasses demo capture

ActiveGlasses demonstrated a workflow for capturing natural human demonstration data with smart glasses in order to teach robots manipulation behaviors. The team described the approach as preserving real human trajectories and interactions, enabling scalable data collection for imitation learning and policy training. (x.com)

A robot usually learns a hand task from data collected on a robot; ActiveGlasses proposes collecting that data from a person’s point of view instead. (arxiv.org) The system was posted to arXiv on April 9, 2026, by researchers from Shanghai Jiao Tong University, Shanghai Innovation Institute, and Noematrix Ltd. The paper says a stereo camera on smart glasses is the only perception device used in both data collection and robot inference. (arxiv.org)

In plain terms, the glasses record what the person sees while the person does a task with bare hands. The same camera setup is then mounted on a six-degree-of-freedom perception arm on the robot, so the robot can reproduce the human’s viewing behavior while it acts. (arxiv.org)

That “active vision” component is the core of the project. The authors say older collection pipelines often depend on handheld tools that add operator burden and miss the way people naturally coordinate head and hand movement during everyday manipulation. (arxiv.org)

The paper says ActiveGlasses extracts object trajectories from each demonstration and feeds them into an object-centered point-cloud policy: a model that works from three-dimensional object shape and position rather than from flat video alone. That policy predicts both the robot’s manipulation actions and its camera movement. (arxiv.org)

The authors report zero-shot transfer, meaning the robot runs the learned behavior without any robot-specific demonstration data for that task. They also say the method outperformed baseline systems on the same hardware setup and generalized across two robot platforms on tasks involving occlusion and precise contact. (arxiv.org)

The data problem behind this is not new. The ActiveGlasses paper cites an estimate that collecting robotic manipulation data at the scale of modern foundation-model datasets would take about 100,000 years with current physical collection methods. (arxiv.org)

Other labs have been pushing in the same direction with different hardware. Researchers at New York University and the University of California, Berkeley described EgoZero in June 2025 as a smart-glasses system that uses Meta’s Project Aria device to collect first-person demonstrations, and IEEE Spectrum reported in August 2025 that the team trained a robot on seven manipulation tasks with about 20 minutes of human data per task. (techxplore.com) (spectrum.ieee.org)

What sets ActiveGlasses apart is its emphasis on preserving human head movement as part of the demonstration, not just hand actions. If that holds up beyond the paper’s tests, the pitch is a data-collection workflow that looks more like a person doing chores and less like a person operating a robot rig. (arxiv.org)
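The briefing describes a pipeline that starts from a single stereo camera and ends in an object-centered point-cloud representation with per-demonstration object trajectories. The sketch below illustrates only the generic geometry behind that idea. It is not ActiveGlasses’ code, and the paper’s actual method is not described here at this level of detail. It assumes a rectified stereo pair with known focal length, principal point, and baseline, and it assumes pixels have already been segmented as belonging to one object. Depth comes from disparity via the standard relation Z = f·B/d. Re-expressing points relative to the object’s centroid gives a crude object-centered view, and the per-frame centroid stands in for an object trajectory. The names `StereoIntrinsics`, `backproject`, `object_cloud`, `object_centered`, and `object_trajectory` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float, float]  # (u, v, disparity), all in pixels


@dataclass
class StereoIntrinsics:
    """Hypothetical calibration for a rectified stereo pair."""
    focal_px: float    # focal length in pixels (same for both cameras)
    cx: float          # principal point, x (pixels)
    cy: float          # principal point, y (pixels)
    baseline_m: float  # distance between the two camera centers (meters)


def backproject(u: float, v: float, disparity: float, k: StereoIntrinsics) -> Optional[Point3D]:
    """Turn one pixel plus its stereo disparity into a 3D point in the left-camera frame."""
    if disparity <= 0:
        return None  # no valid stereo match for this pixel
    z = k.focal_px * k.baseline_m / disparity  # triangulated depth: Z = f * B / d
    x = (u - k.cx) * z / k.focal_px            # pinhole model, horizontal offset
    y = (v - k.cy) * z / k.focal_px            # pinhole model, vertical offset
    return (x, y, z)


def object_cloud(pixels: List[Pixel], k: StereoIntrinsics) -> List[Point3D]:
    """Back-project pixels assumed to be pre-segmented as one object; drop invalid ones."""
    points = [backproject(u, v, d, k) for (u, v, d) in pixels]
    return [p for p in points if p is not None]


def centroid(points: List[Point3D]) -> Point3D:
    """Mean position of a point cloud."""
    if not points:
        raise ValueError("empty point cloud")
    n = len(points)
    return (
        math.fsum(p[0] for p in points) / n,
        math.fsum(p[1] for p in points) / n,
        math.fsum(p[2] for p in points) / n,
    )


def object_centered(points: List[Point3D], center: Point3D) -> List[Point3D]:
    """Re-express points relative to the object's centroid (translation only, no rotation)."""
    cx, cy, cz = center
    return [(x - cx, y - cy, z - cz) for (x, y, z) in points]


def object_trajectory(frames: List[List[Pixel]], k: StereoIntrinsics) -> List[Point3D]:
    """One centroid per frame: a crude stand-in for a per-demonstration object trajectory."""
    return [centroid(object_cloud(frame, k)) for frame in frames]


if __name__ == "__main__":
    # Made-up calibration and two frames of an object moving away from the camera.
    k = StereoIntrinsics(focal_px=500.0, cx=320.0, cy=240.0, baseline_m=0.06)
    frames = [
        [(310.0, 240.0, 30.0), (330.0, 240.0, 30.0)],
        [(300.0, 240.0, 15.0), (340.0, 240.0, 15.0)],
    ]
    print(object_trajectory(frames, k))
```

A real system would estimate a full 6-DoF object pose rather than a centroid. It would also need to account for the glasses moving with the wearer’s head by transforming each frame into a common world frame, and this sketch omits that step.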
