OpenAI Ships GPT-5.4 with 1M Token Window

OpenAI just released GPT-5.4, a massive upgrade aimed at enterprise and agentic AI. The model features a 1 million token context window and, for the first time, can natively use a computer to control apps and manage files. It's a significant step toward autonomous agents, with reworked tool-calling and new benchmarks in professional reasoning.

The release of GPT-5.4 is a direct challenge to competitors like Anthropic's Claude and Google's Gemini, particularly in the enterprise space. The model's focus on professional workflows, coding, and tasks involving documents and presentations suggests OpenAI is targeting high-value business applications. Pricing for the new model is higher per token than its predecessor, GPT-5.2, but OpenAI claims the increased token efficiency, especially in tool-heavy scenarios, may lead to lower overall costs for some workloads.

A significant technical leap is the introduction of "tool search" in the API, which allows the model to dynamically find and use tools instead of requiring all definitions upfront. This feature dramatically reduces the number of tokens needed for complex agentic tasks, with OpenAI reporting a 47% token reduction in tests using the MCP Atlas benchmark. The 1 million token context window is an experimental, opt-in feature that doubles the usage cost for requests exceeding 272,000 tokens.

On the OSWorld-Verified benchmark, which measures an AI's ability to navigate a desktop environment, GPT-5.4 achieved a 75% success rate, surpassing the human baseline of 72.4%. This capability allows the model to interact with websites and software by writing code or executing mouse and keyboard commands based on screenshots, a key component for building more autonomous agents.

For developers, GPT-5.4 integrates the advanced coding capabilities of GPT-5.3-Codex into the main model. It shows improved performance on benchmarks like SWE-Bench Pro and introduces a "/fast" mode in Codex for increased speed. The model is also designed to be more efficient in multi-step agentic workflows, often completing tasks with fewer tokens and tool calls. The model introduces an "extreme" reasoning mode, allowing it to dedicate more computational resources to difficult problems, which is particularly useful for complex, long-running tasks.

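To make the cost figures above concrete, here is a rough back-of-the-envelope sketch in Python. It is not OpenAI's billing logic. It assumes the doubling applies to the whole request once it exceeds 272,000 tokens (the article does not say whether only the excess is doubled), it measures cost in relative "base-rate token units" rather than dollars because no prices are given, and it treats the 47% reduction reported on MCP Atlas as if it applied uniformly to a workload, which real workloads may not match. The function names are illustrative only.

```python
LONG_CONTEXT_THRESHOLD = 272_000  # tokens; figure quoted in the article
LONG_CONTEXT_MULTIPLIER = 2.0     # "doubles the usage cost" (assumed to cover the whole request)
TOOL_SEARCH_REDUCTION = 0.47      # reported on MCP Atlas; assumed uniform here


def relative_request_cost(tokens: int) -> float:
    """Return cost in base-rate token units under the whole-request assumption."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    over_threshold = tokens > LONG_CONTEXT_THRESHOLD
    multiplier = LONG_CONTEXT_MULTIPLIER if over_threshold else 1.0
    return tokens * multiplier


def tokens_with_tool_search(baseline_tokens: int,
                            reduction: float = TOOL_SEARCH_REDUCTION) -> int:
    """Estimate token usage after applying a fractional reduction."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return round(baseline_tokens * (1.0 - reduction))


if __name__ == "__main__":
    baseline = 500_000  # hypothetical tool-heavy request
    reduced = tokens_with_tool_search(baseline)
    print(relative_request_cost(baseline))  # over the threshold, billed at 2x
    print(relative_request_cost(reduced))   # under the threshold, billed at 1x
```

Under these assumptions, a hypothetical 500,000-token request would fall to about 265,000 tokens with tool search, which also drops it below the long-context threshold. That is one way a higher per-token price could still yield a lower total for some workloads, as OpenAI claims.
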
In ChatGPT, a new feature allows users to see and adjust the model's work plan mid-response, offering more control over the final output. This is part of a broader strategy from OpenAI to release more frequent, iterative model updates.

While direct comparisons on all industry benchmarks are still emerging, GPT-5.4 shows significant gains on specific evaluations. On the GDPval benchmark, which assesses performance on professional work tasks, it scored 83%, indicating it meets or exceeds expert performance in a majority of cases. However, in some areas, like certain cybersecurity scenarios and health evaluations, it showed mixed results compared to previous versions.

OpenAI is also pushing GPT-5.4 into specific enterprise verticals, notably finance. The release was paired with a beta of ChatGPT for Excel and new integrations with financial data providers like Moody's and Dow Jones Factiva. On an internal benchmark for tasks a junior investment banking analyst might perform, GPT-5.4 scored 87.3%, a substantial improvement over GPT-5.2's 68.4%.

The model is rolling out to paid users of ChatGPT (Plus, Team, and Pro) and is available through the API and Codex. GPT-5.2 will be phased out for paid users over the next few months. This rapid release cycle, with GPT-5.3 Instant having been released just days prior, signals an acceleration in OpenAI's development and deployment strategy.

Shared from Scout - Be the smartest in the room.