OpenAI Unveils Desktop-Controlling AI

OpenAI just launched GPT-5.4, a model that marks a major leap for agentic AI by autonomously controlling a desktop environment. It executes multi-step workflows by simulating mouse clicks and keyboard inputs, handling tasks such as coding, file management, and interacting with SaaS tools. This shifts the paradigm from AI as a 'copilot' to an autonomous worker that could automate repetitive engineering and operational tasks.

On the OSWorld-Verified benchmark, which tests an AI's ability to navigate a desktop environment, GPT-5.4 achieved a 75% success rate. This score surpasses the average human performance of 72.4% and is a significant leap from the 47.3% achieved by its predecessor, GPT-5.2.

The model operates by interpreting screenshots of a user's screen and then programmatically executing mouse clicks, keyboard inputs, and code to control software. Because it works through the graphical user interface directly, it can interact with any application, including legacy systems that lack APIs.

For developers, GPT-5.4 supports a 1 million token context window, enabling it to analyze entire code repositories or large sets of documentation in a single request. This context is designed for complex tasks like understanding large codebases, fixing bugs across multiple files, and designing UI systems. While its score on the SWE-Bench Pro coding benchmark saw only a modest increase, GPT-5.4 integrates the capabilities of GPT-5.3-Codex and introduces a "/fast" mode in the Codex environment that boosts token generation speed by up to 1.5x.

A new API feature called "Tool Search" dynamically loads only the necessary tool definitions for a given task instead of requiring them all in the prompt. This approach has been shown to reduce token consumption by as much as 47%, directly lowering costs and latency for developers building complex AI agents.

In ChatGPT, a new "Thinking" mode allows the model to display its work plan upfront. Users can then intervene and adjust the AI's reasoning process mid-task, making problem-solving more steerable and collaborative.

Beyond the public API, GPT-5.4 is also being integrated into enterprise platforms, with same-day availability announced for Microsoft Foundry and a private preview on Snowflake Cortex AI. This signals a rapid path toward adoption in production environments where reliability and integration with existing data stacks are critical.

OpenAI states this is its most factual model yet, with individual claims being 33% less likely to be false and full responses 18% less likely to contain any errors compared to GPT-5.2.
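
To make the "Tool Search" idea concrete, here is a minimal sketch of loading only the tool definitions a task needs rather than sending the whole catalog. It is an illustration under stated assumptions, not OpenAI's implementation or API: the tool catalog, the `search_tools` and `prompt_size` helpers, the keyword-overlap matching, and the character-count size proxy are all hypothetical.

```python
import json
import re

# Hypothetical tool catalog; names and descriptions are illustrative only.
TOOLS = [
    {"name": "read_file", "description": "Read the contents of a file from disk"},
    {"name": "send_email", "description": "Send an email message to a recipient"},
    {"name": "query_database", "description": "Run a SQL query against a database"},
    {"name": "create_ticket", "description": "Create a ticket in the issue tracker"},
]

# Common words ignored during matching so they do not cause false hits.
STOPWORDS = {"a", "an", "the", "to", "of", "in", "from"}


def _words(text):
    """Lowercase a string and return its alphabetic words, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def search_tools(task, tools, limit=2):
    """Return up to `limit` tools whose descriptions share words with the task."""
    task_words = _words(task)
    scored = []
    for tool in tools:
        overlap = len(task_words & _words(tool["description"]))
        if overlap > 0:
            scored.append((overlap, tool))
    # Highest overlap first; the sort is stable, so ties keep catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]


def prompt_size(tools):
    """Crude size proxy: character count of the serialized definitions."""
    return len(json.dumps(tools))


task = "Run a SQL query to count open orders"
selected = search_tools(task, TOOLS)
print([tool["name"] for tool in selected])
print(prompt_size(selected), "of", prompt_size(TOOLS), "characters sent")
```

A real system would presumably use a stronger retrieval method than keyword overlap and count actual tokens, but the cost mechanism is the same: fewer definitions in the prompt means fewer tokens per request.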
