OpenAI Releases GPT-5.4

Just two days after launching GPT-5.3 Instant, OpenAI has released GPT-5.4, a flagship model aimed at enterprise work. The new version features native computer-use capabilities, allowing it to operate software and manage files autonomously. It also boasts a massive one-million-token context window, setting a new bar in the race for 'agentic' AI.

The release of GPT-5.4 is part of a strategic acceleration by OpenAI, which appears to be moving towards monthly model updates. This rapid iteration, including the recent launches of GPT-5.3 Codex for programming and GPT-5.3 Instant for conversations, aims to prevent the inflated expectations that surrounded the initial GPT-5 launch.

A key technical innovation accompanying GPT-5.4 is "Tool Search" in the API. Instead of loading all tool definitions into a prompt, the model can now search for them on an as-needed basis, a change that cut token consumption by 47% in one benchmark test. This efficiency is designed to offset higher token prices for the more capable model.

On the OSWorld-Verified benchmark, which measures an AI's ability to navigate a real desktop environment, GPT-5.4 scored 75.0%, surpassing the human baseline of 72.4% for the first time. This leap in performance, up from GPT-5.2's 47.3% score, is a significant step towards agents that can autonomously operate software for complex tasks.

The push into enterprise-level "agentic AI" places OpenAI in more direct competition with Anthropic and Google. Anthropic's Claude for Financial Services and Google's Gemini Enterprise platform have been specifically building towards integrated, agent-based workflows for some time. This move signals a market shift from standalone chatbots to AI systems deeply embedded in business processes.

To bolster its enterprise push, OpenAI has bundled the GPT-5.4 release with a beta version of ChatGPT for Excel and new financial data integrations. The model has shown significant improvement on tasks like financial modeling, where it scored 87.5% on a benchmark for junior investment banking analysts, compared to 68.4% for GPT-5.2.

The one-million-token context window is a significant jump from the 400,000-token limit of its predecessor and catches OpenAI up with competitors like Google's Gemini 1.5 Pro. This massive context allows the model to process and "remember" extensive documents, such as entire codebases or research papers, in a single interaction.

OpenAI claims GPT-5.4 is 18% less likely to produce a response containing any error and 33% less likely to make false individual claims compared to GPT-5.2. For the first time, the model's safety documentation also includes mitigations for "High capability in Cybersecurity," a new classification for a general-purpose model from the company.
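
To make the "Tool Search" idea mentioned above concrete, here is a minimal Python sketch of the general technique: look up only the relevant tool definitions for a request instead of placing every definition in the prompt. It is an illustration under stated assumptions, not OpenAI's API or implementation. The `Tool` class, the `REGISTRY` contents, the keyword-overlap ranking in `search_tools`, and the word-count proxy in `rough_tokens` are all hypothetical, and the sketch makes no attempt to reproduce the 47% figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    schema: str  # stand-in for a full JSON parameter schema


# Hypothetical registry; a real deployment might hold hundreds of tools.
REGISTRY = [
    Tool("read_file", "read the contents of a file on disk", '{"path": "string"}'),
    Tool("write_file", "write text to a file on disk", '{"path": "string", "text": "string"}'),
    Tool("get_stock_price", "look up the latest stock price for a ticker", '{"ticker": "string"}'),
    Tool("send_email", "send an email message to a recipient", '{"to": "string", "body": "string"}'),
]


def rough_tokens(text: str) -> int:
    """Crude proxy for a token count: whitespace-separated words (an assumption)."""
    return len(text.split())


def prompt_cost(tools: list[Tool]) -> int:
    """Approximate prompt size needed to describe the given tools."""
    return sum(rough_tokens(f"{t.name} {t.description} {t.schema}") for t in tools)


def search_tools(query: str, registry: list[Tool], limit: int = 2) -> list[Tool]:
    """Return up to `limit` tools ranked by keyword overlap with the query."""
    words = set(query.lower().split())
    scored = []
    for tool in registry:
        haystack = set(tool.description.lower().split()) | set(tool.name.lower().split("_"))
        score = len(words & haystack)
        if score > 0:
            scored.append((score, tool))
    # Sort on the score only; ties keep their registry order (stable sort).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]


if __name__ == "__main__":
    selected = search_tools("stock price for ticker", REGISTRY)
    print("all tools:", prompt_cost(REGISTRY), "rough tokens")
    print("searched :", prompt_cost(selected), "rough tokens")
```

The saving in this toy grows with the size of the registry, since the per-request cost depends on the handful of tools retrieved and not on the total number defined. A production system would likely use a stronger retrieval method than keyword overlap.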
