Vision‑language models now planning robot trajectories, replacing hand‑coded motion
- Accenture, SAP, and Vodafone Procure & Connect said on April 22 they are piloting humanoid robots in a Duisburg warehouse, where inspection jobs are dispatched through SAP’s warehouse software.
- In the pilot, the robot checked pallet stacking, weight distribution, misplaced or damaged goods, unused storage space, and aisle obstacles, then wrote findings back into SAP for real-time decisions.
- The shift pairs language-and-vision planning with live motor control, cutting hand-coded recovery logic in messy sites and moving robots closer to general-purpose work. (openreview.net)
A robot that can read a scene and a written task is replacing more of the hand-coded motion logic that used to run warehouse machines. (newsroom.accenture.com) (openreview.net)

The basic idea is a split brain. A vision-language model handles the “what should happen next” part from images and text, while a lower-level controller handles the millisecond-by-millisecond balance, stepping, and collision avoidance. (openreview.net) (blog.ai.princeton.edu)

That matters because older robot deployments often depended on brittle scripts: if a pallet sat two inches off, or an aisle was partly blocked, engineers had to add another rule. The newer pattern lets the high-level model describe a spatial plan in language and leave the final corrections to the execution layer. (blog.ai.princeton.edu) (arxiv.org)

Accenture, SAP, and Vodafone Procure & Connect put that pattern into a warehouse pilot in Duisburg, Germany, announced April 22 at Hannover Messe 2026. The humanoid robot received inspection tasks through SAP Extended Warehouse Management and moved through the facility autonomously. (newsroom.accenture.com) (www.businesswire.com)

In that pilot, the robot looked for misplaced or damaged products, checked pallet stacking and weight distribution, flagged unused storage capacity, and detected hazards such as obstacles in aisles or misaligned pallets. It sent findings and recommendations back into SAP so warehouse managers could see them in real time. (newsroom.accenture.com) (therobotreport.com)

SAP said in November 2025 that related embodied-AI proof-of-concept projects showed up to 50% reductions in unplanned downtime and up to 25% productivity improvement across manufacturing, warehouse automation, and quality inspection. In one separate pilot with NEURA Robotics at BITZER, SAP said a humanoid robot performed pick tasks in real time after virtual training in NVIDIA Isaac Sim. (news.sap.com)

Researchers are also changing how the robot learns actions.
Princeton’s VLM2VLA work, presented at the International Conference on Learning Representations 2026, represents low-level actions in natural language first, then converts them into motor commands. (openreview.net) (blog.ai.princeton.edu) The paper reports more than 800 real-world robotics experiments and says the method preserved more of the model’s visual reasoning and multilingual instruction-following than action-token baselines.

That is the technical change behind the business pitch: fewer custom rules, less manual exception handling, and faster adaptation when the floor no longer matches the script. (arxiv.org) (openreview.net)

Accenture described the Duisburg test as a step from experimentation toward deployment at scale, not a full warehouse takeover. For now, the robot is doing inspection work that turns messy visual conditions into software tickets, which is exactly where hand-coded motion has tended to break first. (newsroom.accenture.com)
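
For readers who want to see the split in concrete terms, here is a minimal Python sketch of the general pattern: a planner states the next move as a sentence, a parser turns that sentence into a numeric offset, and an execution layer limits how far the robot moves per control tick. It is a toy illustration only, not the VLM2VLA code and not anything used in the Duisburg pilot. The sentence format, the names `parse_action` and `clamp_step`, the axis convention, and the 2 cm per-tick limit are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical axis convention: x is forward, y is left, z is up.
DIRECTIONS = {
    "forward": (1.0, 0.0, 0.0),
    "backward": (-1.0, 0.0, 0.0),
    "left": (0.0, 1.0, 0.0),
    "right": (0.0, -1.0, 0.0),
    "up": (0.0, 0.0, 1.0),
    "down": (0.0, 0.0, -1.0),
}


@dataclass(frozen=True)
class Delta:
    """A Cartesian offset in metres."""
    x: float
    y: float
    z: float


def parse_action(text: str) -> Delta:
    """Convert a sentence such as 'move 5 cm forward and 1 cm left' to an offset."""
    matches = re.findall(r"(\d+(?:\.\d+)?)\s*cm\s+(\w+)", text.lower())
    if not matches:
        raise ValueError(f"no movement found in: {text!r}")
    x = y = z = 0.0
    for amount, direction in matches:
        if direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {direction}")
        ux, uy, uz = DIRECTIONS[direction]
        metres = float(amount) / 100.0  # centimetres to metres
        x += ux * metres
        y += uy * metres
        z += uz * metres
    return Delta(x, y, z)


def clamp_step(delta: Delta, max_step: float = 0.02) -> Delta:
    """Stand-in for the execution layer: cap each axis at max_step metres per tick."""
    def clamp(value: float) -> float:
        return max(-max_step, min(max_step, value))
    return Delta(clamp(delta.x), clamp(delta.y), clamp(delta.z))


# The planner's output is plain language; the controller only sees numbers.
target = parse_action("Move 5 cm forward and 1 cm left")
step = clamp_step(target)
```

Under these assumptions, the sentence parses to an offset of 0.05 m forward and 0.01 m to the left, and a single tick at the default limit moves 0.02 m forward and 0.01 m left. A real system would replace the regular expression with a learned model and the clamp with a full controller handling balance and collisions; the sketch only shows where the boundary between language and motor commands sits.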