Standard Intelligence Unveils General Action AI Model
Standard Intelligence has unveiled FDM-1, which it describes as the first fully general computer action model. Trained on 11 million hours of video, the model can reportedly operate across diverse digital and physical domains, from web navigation to real-world driving, at 30 frames per second. The model's development suggests a future where a single, scalable foundation model could power both web agents and embodied robots.
- The model's ability to train on vast amounts of unlabeled internet video is a significant departure from previous methods that required expensive, human-annotated data. This is achieved by using a separate "inverse dynamics model" that automatically labels video frames with the actions (like key presses and mouse movements) that likely caused the on-screen changes.
- FDM-1's architecture includes a highly efficient video encoder capable of compressing nearly two hours of 30 FPS video into just one million tokens. This is a critical innovation for processing the long-horizon tasks common in both complex software interaction and real-world robotic control. Handling such long contexts was a major limitation of previous vision-language models.
- The model's generalization from digital to physical tasks, such as driving a car in the real world after fine-tuning on less than an hour of data, is a key proof of concept. This suggests its potential as a foundation for a wide range of embodied agents, from autonomous vehicles to industrial robots, by translating high-level goals into low-level actions.
- For defense applications, a general action model like FDM-1 aligns with the push towards "agentic warfare," where autonomous systems handle complex tasks in logistics, intelligence, and operations. Such a model could power unmanned convoys, optimize supply chains, or enable a single operator to manage a swarm of drones by specifying high-level objectives.
- The company behind the model, Standard AI (also known as Standard Intelligence), was founded by former SEC engineers and has a leadership team with experience scaling technology at major organizations. For instance, CTO David Woollard has a background in high-performance computing and applied AI systems at institutions including NASA and Samsung.
- This development comes after Standard AI pivoted from its initial focus on AI-powered autonomous checkout for retail to a broader vision of general-purpose AI agents. This strategic shift was led by CEO Angie Westbrock, who took the helm in early 2024.
- While promising, scaling such foundation models for robotics faces challenges, including the scarcity of high-quality, real-world robotics data compared to internet-scale text and images. Ensuring the safety and reliability of these models in unpredictable physical environments is also a major area of ongoing research.
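
As a rough sanity check on the encoder figure quoted above, the sketch below works out how many tokens per frame that compression would imply. It assumes "nearly two hours" means exactly two hours, and the function names are illustrative only; they are not part of any published FDM-1 interface.

```python
# Back-of-the-envelope token budget implied by the figures quoted in the article.
FPS = 30                    # frame rate quoted in the article
TOKEN_BUDGET = 1_000_000    # token count quoted in the article
ASSUMED_HOURS = 2.0         # assumption: "nearly two hours" rounded up to 2


def frames_in(hours: float, fps: int = FPS) -> int:
    """Number of video frames in `hours` of footage at `fps`."""
    return round(hours * 3600 * fps)


def tokens_per_frame(hours: float, budget: int = TOKEN_BUDGET, fps: int = FPS) -> float:
    """Average tokens available per frame if `hours` of video fill `budget` tokens."""
    return budget / frames_in(hours, fps)


if __name__ == "__main__":
    print(frames_in(ASSUMED_HOURS))                    # 216000
    print(round(tokens_per_frame(ASSUMED_HOURS), 2))   # 4.63
```

Under that assumption, the budget comes to roughly 4.6 tokens per frame. A slightly shorter clip filling the same one million tokens would leave marginally more tokens per frame.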