Cluttered picking challenge
Solomon AI flagged cluttered picking as a core perception problem for physical-AI systems and pointed to VR teleoperation plus 3D reconstruction as techniques teams are using to close gaps in automated pick reliability. The posts discuss teleoperation as a fast way to collect data and improve 3D models where off-the-shelf vision pipelines fail in dense, messy bins.
A robot can usually pick one clear object from a bin; it still struggles when parts overlap, hide each other, or jam together in a pile. (mit.edu) Solomon AI said in recent posts that "cluttered picking" remains a core perception problem for physical artificial intelligence systems, especially in dense bins where standard vision pipelines lose track of object shape and pose. Solomon sells AccuPick 3D, a bin-picking system built around 3D machine vision and point-cloud generation for random or unknown objects. (solomon-3d.com; solomon.co.th)
In plain terms, bin picking means a camera and robot arm must work out what is in a messy container, where each item sits in three dimensions, and which grasp will avoid hitting neighboring parts. Massachusetts Institute of Technology's manipulation notes describe the task as moving randomly arranged objects of widely varying shapes from one bin to another. (mit.edu)
The perception failure starts with occlusion, the robotics term for objects blocking one another from view. A 2025 review of robotic bin picking says industrial systems still have to cope with varying clutter, object texture and shape, and incomplete visual data in real production settings. (sciencedirect.com)
To close that gap, teams are leaning on teleoperation, in which a person remotely drives the robot and the successful motions are recorded as training data. Boston Dynamics said this month that it has "invested heavily" in teleoperation for Atlas because fluid, dexterous control is crucial for collecting high-quality data to train behavior models. (bostondynamics.com) Virtual-reality teleoperation adds a headset and hand controls so the operator can move inside a live or simulated robot scene, much like a first-person game. Nvidia said its Isaac GR00T workflow can stream hand-tracking data from a spatial-computing device into a robot simulation and send the robot's view back to the operator to record demonstrations. (developer.nvidia.com; developer.nvidia.com)
Those demonstrations can then feed 3D reconstruction, the process of turning camera images and depth readings into a usable digital model of the scene. A March 2026 Frontiers paper on cluttered shelf picking describes a pipeline that reconstructs an approximate 3D environment from a single image and depth map before running grasp simulations. (frontiersin.org) That matters because a robot in a crowded bin is not just recognizing an object; it also has to predict what the rest of the pile will do when one piece moves. The same Frontiers study says traditional vision methods can localize objects but often miss the physical consequences of extraction, including collisions and collapses. (frontiersin.org)
Researchers are also trying to make teleoperation cheaper and faster. A University of Southern California-led team reported that its Policy Assisted TeleOperation system reduced operator mental load and improved data-collection efficiency by letting assistive automation handle repetitive behaviors and asking for human input only when it was uncertain. (arxiv.org; clvrai.github.io)
Solomon's point is that the hard part is no longer showing a robot a clean demo on a clean table. The hard part is getting reliable picks from the messy, partially hidden piles that warehouses and factories produce every day. (solomon-3d.com; mit.edu)
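The 3D reconstruction step, turning a depth reading into scene geometry, rests on a standard idea: back-projecting each depth pixel through a pinhole camera model. The sketch below is a minimal illustration under stated assumptions, not the Frontiers pipeline or Solomon's implementation, neither of which the sources detail. It assumes an ideal pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` (in pixels) and a depth map in meters stored as nested lists; `depth_to_points` is an illustrative name.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_points(
    depth: List[List[float]], fx: float, fy: float, cx: float, cy: float
) -> List[Point]:
    """Back-project a depth map (meters) into camera-frame 3D points.

    Assumes an ideal pinhole camera with focal lengths fx, fy and
    principal point (cx, cy), all in pixels. Pixel (u, v) with depth z
    maps to x = (u - cx) * z / fx and y = (v - cy) * z / fy.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):      # v: row index (image y)
        for u, z in enumerate(row):      # u: column index (image x)
            if z <= 0.0:                 # treat 0 as a missing reading
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x3 depth map; the 0.0 stands in for a pixel with no depth
# return, e.g. a reflective surface the sensor could not read.
depth = [
    [0.50, 0.50, 0.0],
    [0.52, 0.51, 0.49],
]
cloud = depth_to_points(depth, fx=500.0, fy=500.0, cx=1.0, cy=0.5)
print(len(cloud))  # 5
```

The skipped pixel is a small-scale version of the "incomplete visual data" problem: every missing reading leaves a hole in the point cloud that later grasp planning has to work around. A production system would also transform these camera-frame points into the robot's base frame.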
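Choosing a grasp that will not hit neighboring parts can be reduced, in its simplest form, to a clearance test between the gripper and nearby points in the scene. This is a hypothetical, heavily simplified sketch: each finger tip is modeled as a sphere of radius `clearance`, neighbors are bare 3D points, and the names `grasp_is_clear` and `first_clear_grasp` are illustrative. Real planners check the full gripper geometry and approach path.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def grasp_is_clear(
    finger_tips: Sequence[Point],
    neighbor_points: Sequence[Point],
    clearance: float,
) -> bool:
    """True if every neighboring point is at least `clearance` meters
    from every finger tip (each finger modeled as a sphere)."""
    return all(
        math.dist(tip, p) >= clearance
        for tip in finger_tips
        for p in neighbor_points
    )


def first_clear_grasp(
    candidates: Sequence[Sequence[Point]],
    neighbor_points: Sequence[Point],
    clearance: float,
) -> Optional[int]:
    """Index of the first candidate grasp that clears all neighbors, or None."""
    for i, tips in enumerate(candidates):
        if grasp_is_clear(tips, neighbor_points, clearance):
            return i
    return None


# Two neighboring parts 10 cm apart, and two candidate two-finger grasps.
neighbors = [(0.0, 0.0, 0.0), (0.10, 0.0, 0.0)]
squeezed = [(0.02, 0.0, 0.0), (0.08, 0.0, 0.0)]    # fingers jammed between parts
from_side = [(0.05, 0.05, 0.0), (0.05, -0.05, 0.0)]
print(first_clear_grasp([squeezed, from_side], neighbors, clearance=0.03))  # 1
```

The squeezed grasp fails because a finger tip sits 2 cm from a neighbor, inside the 3 cm clearance, while the side approach keeps both fingers about 7 cm away. In a dense bin, occlusion means some neighbors may be missing from the point cloud entirely, which is one reason such checks can pass and the pick can still collide.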
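The assisted-teleoperation idea, automating routine steps and asking a person only when the system is unsure, can be illustrated with a simple gating rule. This sketch rests on its own assumptions and is not presented as the USC system's method: uncertainty is estimated as disagreement (average per-dimension variance) across a hypothetical ensemble of policy outputs, and `disagreement`, `assisted_step`, and `threshold` are illustrative names.

```python
from statistics import mean, pvariance
from typing import Callable, List, Sequence, Tuple

Action = List[float]


def disagreement(predictions: Sequence[Action]) -> float:
    """Average per-dimension variance of the ensemble's proposed actions."""
    dims = range(len(predictions[0]))
    return mean(pvariance([a[d] for a in predictions]) for d in dims)


def assisted_step(
    predictions: Sequence[Action],
    ask_human: Callable[[], Action],
    threshold: float,
) -> Tuple[Action, bool]:
    """Return (action, asked_human).

    If the ensemble disagrees more than `threshold`, defer to the operator;
    otherwise execute the ensemble's mean action autonomously.
    """
    if disagreement(predictions) > threshold:
        return ask_human(), True
    dims = range(len(predictions[0]))
    return [mean(a[d] for a in predictions) for d in dims], False


def operator_command() -> Action:
    # Stands in for a command coming from a remote (e.g. VR) operator.
    return [0.0, 0.0]


print(assisted_step([[1.0, 0.5], [1.0, 0.5], [1.0, 0.5]], operator_command, 0.1))
# ([1.0, 0.5], False)
print(assisted_step([[1.0, 0.0], [-1.0, 0.0]], operator_command, 0.1))
# ([0.0, 0.0], True)
```

When the ensemble members agree, the robot acts on its own and the operator is left alone; when they diverge, as they might over a tangled part of the pile, control passes to the human and that moment becomes a new demonstration. The threshold trades operator workload against the risk of autonomous mistakes.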