New AI Memory Gives Robots 15-Min Context

The Physical Intelligence team has unveiled MEM, a multi-scale memory system for robots that gives foundation models a 15-minute context window. The advance lets robots remember and reason across complex, long-horizon tasks, closing a key gap between machine and human learning in physical environments.

MEM tackles a core bottleneck in robotics: the "goldfish effect," in which robots, particularly Vision-Language-Action (VLA) models, operate with only a few seconds of history. This limitation has historically confined even advanced systems to short, simple tasks, because they lack the context needed for multi-stage operations such as cleaning a kitchen or following a recipe.

The system's innovation is a dual-scale memory architecture that, loosely like human memory, separates short-term visual detail from long-term semantic understanding. For immediate actions that require fine-grained spatial awareness, such as adjusting a grip, MEM uses an efficient video encoder to process recent visual frames. This avoids the high computational cost of feeding minutes of raw video into the model's context window. For long-horizon context, MEM summarizes events as a natural-language "narrative." Instead of storing every frame of a refrigerator door opening, it writes a text note such as "I opened the fridge door." This chain-of-thought process lets the robot track its progress on tasks lasting up to 15 minutes, a significant leap for VLA models.

Developed by researchers from Physical Intelligence, Stanford, UC Berkeley, and MIT, MEM is integrated into the π0.6 VLA, which is built on a pre-trained Gemma 3 4B model. This foundation model was pre-trained on a diverse mix of robot demonstrations, vision-language tasks, and internet video data, giving the memory system a rich base to build on.

The architecture also addresses "causal confusion," a common failure mode in which a robot erroneously repeats past actions simply because they appear in its recent history. By distinguishing immediate visual cues from a longer-term task summary, the system can adapt its strategy after a failed attempt. In evaluations, this produced a 62% increase in success rate when opening refrigerators with unknown hinge directions.
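The two memory tiers can be illustrated with a small data structure. This is a hypothetical sketch, not MEM's actual implementation: the class name `DualScaleMemory`, the frame-buffer size, and the method names are assumptions, and frames are stand-in strings rather than real images or encoder outputs. It shows the basic trade-off the article describes: the short-term buffer stays a fixed size however long the task runs, while the long-term tier grows by only one short text note per completed event.

```python
from collections import deque


class DualScaleMemory:
    """Hypothetical two-tier robot memory (illustrative only, not MEM's code).

    Short-term tier: a fixed-size buffer of the most recent camera frames.
    Long-term tier: a growing list of natural-language event notes.
    """

    def __init__(self, max_frames=8):
        # Assumed window size; deque(maxlen=...) silently drops the oldest frame.
        self.recent_frames = deque(maxlen=max_frames)
        self.narrative = []

    def observe(self, frame):
        """Store a new camera frame in the short-term buffer."""
        self.recent_frames.append(frame)

    def summarize(self, event):
        """Record a completed sub-task as a compact text note."""
        self.narrative.append(event)

    def build_context(self):
        """Assemble what a policy might condition on at the current step."""
        return {
            "recent_frames": list(self.recent_frames),
            "narrative": " ".join(self.narrative),
        }


if __name__ == "__main__":
    # Made-up data: ten frames arrive, but only the last three are kept.
    memory = DualScaleMemory(max_frames=3)
    for t in range(10):
        memory.observe(f"frame_{t}")
    memory.summarize("I opened the fridge door.")
    memory.summarize("I took out the milk.")
    ctx = memory.build_context()
    print(ctx["recent_frames"])  # ['frame_7', 'frame_8', 'frame_9']
    print(ctx["narrative"])      # I opened the fridge door. I took out the milk.
```

Bounding the visual tier with `deque(maxlen=...)` keeps per-step cost constant, while the text notes preserve task progress long after the corresponding frames have been discarded.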
