Foundation Models Show Skill Transfer in Robotics
New research in embodied AI demonstrates that foundation models can effectively transfer manipulation skills across different robotic hardware. A model pre-trained on a complex 22-degree-of-freedom (DoF) robotic hand was successfully applied to a simpler 7-DoF gripper, resulting in a 30% performance gain. The result illustrates cross-embodiment generalization: a large model can adapt behaviors learned on complex hardware to simpler robots used in the real world.
- The underlying technology for many of these robotics models, like Google's RT-2 or OpenVLA, is the transformer architecture, which also powers large language models like GPT. These models are adapted to be multimodal, processing inputs from vision, language, and sensors to output robot actions.
- A primary challenge holding back wider adoption is the scarcity of high-quality, diverse robotics data; unlike LLMs trained on internet-scale text, robotic data is expensive and time-consuming to collect.
- The 22-DoF hand mentioned is an example of a high-dexterity anthropomorphic hand, designed to mimic the complex motions of a human hand, which makes the successful skill transfer to a less complex 7-DoF gripper a significant achievement in generalization.
- This type of generalization is often achieved by pre-training on large, aggregated datasets, such as the Open-X-Embodiment dataset, which combines data from dozens of different robot morphologies and research institutions.
- A key goal of this research is to achieve "zero-shot" or "few-shot" learning, where a model can perform novel tasks or operate new hardware with no or minimal fine-tuning, drastically reducing deployment time.
- The field is attracting significant investment, with startups like Figure AI, Physical Intelligence, and Skild raising hundreds of millions of dollars to build general-purpose "brains" for robots.
- Foundation models are being applied across the full robotics stack, from high-level perception and task planning to low-level motion control and dynamics prediction.
- Researchers are actively exploring the use of generative AI to create synthetic data and augment real-world datasets, helping to overcome the data scarcity bottleneck and improve model robustness.
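
The summary above does not say how a single model copes with robots that have different numbers of actuators. One simple approach, shown here purely as an illustration and not as the method used in the research, is to pad every robot's action vector to a shared width and carry a mask marking which dimensions are real. The names `MAX_ACTION_DIM`, `Embodiment`, `pad_action`, `unpad_action`, and `masked_mse` are hypothetical and chosen for this sketch only.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: the shared action width is sized to the largest robot (the 22-DoF hand).
MAX_ACTION_DIM = 22


@dataclass
class Embodiment:
    """Hypothetical description of a robot: a label and its action dimensionality."""
    name: str
    dof: int


def pad_action(action: List[float], max_dim: int = MAX_ACTION_DIM) -> Tuple[List[float], List[int]]:
    """Zero-pad a per-robot action to the shared width; return (padded, mask)."""
    if len(action) > max_dim:
        raise ValueError(f"action has {len(action)} dims, shared width is {max_dim}")
    pad = max_dim - len(action)
    padded = list(action) + [0.0] * pad
    mask = [1] * len(action) + [0] * pad  # 1 = real actuator, 0 = padding
    return padded, mask


def unpad_action(padded: List[float], embodiment: Embodiment) -> List[float]:
    """Keep only the leading dimensions the target robot actually has."""
    return padded[: embodiment.dof]


def masked_mse(pred: List[float], target: List[float], mask: List[int]) -> float:
    """Mean squared error over the unmasked dimensions only."""
    valid = sum(mask)
    if valid == 0:
        return 0.0
    return sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask)) / valid


if __name__ == "__main__":
    gripper = Embodiment(name="gripper_7dof", dof=7)
    padded, mask = pad_action([0.1] * gripper.dof)
    print(len(padded), sum(mask))  # shared width, number of real dimensions
    print(unpad_action(padded, gripper))
```

With this layout, a model trained on the 22-DoF hand and a 7-DoF gripper can share one output head, and the mask keeps padded dimensions out of the training loss. Real systems may instead use per-robot output heads or tokenized actions; the padding scheme is just the easiest variant to show in a few lines.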