LLM-Powered Robots Being Trialed in Warehouses
Large language model-based agents are being tested in warehouse robotics, according to Stanford’s Dr. Lillian Zhao. Speaking on the Embodied Intelligence Podcast, she said the trials represent a leap in adaptive planning, with robots that can reason about high-level intent in unstructured environments.
The application of large language models to robotics goes beyond language understanding; models like Google's RT-2 are true vision-language-action (VLA) systems. Trained on web-scale text and images, RT-2 translates this knowledge into direct robotic control, nearly doubling performance on previously unseen tasks to 62% compared with its predecessor, RT-1. This approach allows the model to perform rudimentary reasoning about object categories and high-level descriptions without explicit training for every scenario.
NVIDIA is tackling the challenge of creating generalist robots with Project GR00T, a foundation model designed to power humanoid robots. GR00T (Generalist Robot 00 Technology) enables robots to learn skills by observing human actions through imitation learning. The architecture uses a dual-system approach inspired by human cognition: a "fast-thinking" system for reflexes and a "slow-thinking" vision-language model for deliberate planning and reasoning.
In the commercial space, humanoid robot startup Figure AI is deploying its Figure 02 robot for logistics and warehouse tasks. Powered by an AI system called Helix, the robot can sort a wide variety of packages, dynamically adjusting its grasp for different items such as soft bags or flat envelopes. The company is building a high-volume manufacturing facility, BotQ, with the capacity to produce up to 12,000 humanoid robots annually.
The broader warehouse automation market is shifting from isolated machines to integrated, AI-driven systems. The market is projected to grow to $24.09 billion by 2026, with over 4.2 million commercial robots expected to be installed worldwide by that year. This new phase of automation focuses on combining mobile robots with fixed systems, using AI for tasks like real-time inventory tracking and robotic de-palletizing.
However, significant technical hurdles remain in applying LLMs to robotics.
A key issue is the latency mismatch between slow LLM inference times (250-700 ms) and the fast control cycles needed for stable robot operation (1-10 ms). Furthermore, the tendency of LLMs to "hallucinate" or misinterpret context creates major safety and reliability concerns when the model's output results in physical action.
Despite these challenges, investment in the space is accelerating, with the robotics industry securing over $10.3 billion in funding in 2025. Startups are raising substantial early-stage rounds, such as Dyna Robotics, founded by veterans from DeepMind and NVIDIA, which raised a $23.5 million seed round to build robotic arms powered by foundation models. This influx of capital signals strong investor confidence in the commercial potential of physical, embodied AI.
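To put the latency mismatch described above in concrete terms, the short Python sketch below works out how many control cycles pass while a single LLM inference is still running, using only the 250-700 ms and 1-10 ms figures quoted in the article. It also simulates, in very simplified form, the dual-rate idea behind "fast" and "slow" systems: a fast loop keeps ticking and reuses the most recent plan rather than waiting for the planner. The function names and the simulation itself are hypothetical illustrations, not the design of any product mentioned here.

```python
# Illustrative sketch only. The latency figures come from the article;
# the function names and the toy simulation are assumptions for illustration.


def control_cycles_per_inference(inference_ms: int, control_period_ms: int) -> int:
    """Whole control cycles that elapse during one LLM inference."""
    return inference_ms // control_period_ms


def simulate(duration_ms: int, control_period_ms: int, inference_ms: int) -> dict:
    """Toy dual-rate loop: the fast controller never blocks on the slow planner.

    The planner is assumed to start a new inference as soon as the previous
    one finishes; the controller simply reuses the latest finished plan.
    """
    plans_delivered = 0
    next_plan_ready_at = inference_ms
    control_ticks = 0
    for t in range(0, duration_ms, control_period_ms):
        # Pick up any plan that finished since the last tick.
        while t >= next_plan_ready_at:
            plans_delivered += 1
            next_plan_ready_at += inference_ms
        control_ticks += 1
    return {"control_ticks": control_ticks, "plans_delivered": plans_delivered}


if __name__ == "__main__":
    # Smallest gap quoted: fastest inference (250 ms), slowest control loop (10 ms).
    print(control_cycles_per_inference(250, 10))   # 25
    # Largest gap quoted: slowest inference (700 ms), fastest control loop (1 ms).
    print(control_cycles_per_inference(700, 1))    # 700
    # One simulated second with a 10 ms control loop and 250 ms inference.
    print(simulate(1000, 10, 250))
```

Even at the narrow end of the quoted ranges, a controller completes about 25 cycles per inference, and at the wide end about 700, which is why a control loop cannot simply wait on the language model for each command.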