OpenAI Releases GPT-5.4 with 1M Token Context

OpenAI has formally released GPT-5.4, a major update that consolidates features from prior versions and introduces a one-million-token context window. The larger window lets the model process entire codebases or large document sets in a single pass, which could significantly change code analysis and other complex workflows. Alongside Gemini 3.1 and Claude 4.6, it now sits at the frontier of AI models.

The rapid succession of releases (GPT-5.1 in November, 5.2 in December, 5.3-Codex in February, and now 5.4 in March) signals a strategic shift for OpenAI toward more frequent, iterative updates. This faster cadence reflects a broader industry trend: the pace of large performance jumps between foundational models, such as the one from GPT-3 to GPT-4, is slowing.

GPT-5.4's one-million-token context window is a direct response to similar offerings from competitors, including Google's Gemini 3.1 Pro and Anthropic's Claude 4.6. The expansion is a large step up from the 400,000 tokens in GPT-5.3, but it also introduces challenges such as the "lost in the middle" problem, in which models struggle to recall information buried deep within a long input. For developers, the larger context can simplify workflows by reducing the need for complex retrieval-augmented generation (RAG) systems. However, the computational cost is significant: processing requirements can scale quadratically with input length, which makes token efficiency crucial.

GPT-5.4's API pricing is set at $2.50 per million input tokens and $15 per million output tokens, a notable increase from previous versions, though OpenAI claims the model is more cost-effective per task.

This release also unifies OpenAI's general-purpose and coding-specific models, integrating the "Codex" line directly into the main GPT-5.4 offering. It introduces an "Upfront Planning" feature in which the model outlines its reasoning before generating a complex response, allowing users to intervene and steer it mid-task. This is part of a larger focus on "agentic" workflows, in which the model handles multi-step tasks autonomously.

While GPT-5.4 shows strong performance, it does not dominate across all benchmarks. Anthropic's Claude 4.6 still holds a lead in some complex coding tasks on benchmarks like SWE-Bench Verified.
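
To make the cost discussion concrete, the quoted rates can be turned into a rough per-request estimate. The sketch below is plain arithmetic on the figures reported above; it assumes the $15 output rate is per million output tokens, treats quadratic scaling as an idealized upper-level approximation, and ignores any caching, batching, or tiered pricing that may apply. The function names and example token counts are illustrative only.

```python
# Rates quoted in the article, in US dollars per one million tokens.
# Reading the $15 figure as "per million output tokens" is an assumption.
INPUT_USD_PER_MILLION = 2.50
OUTPUT_USD_PER_MILLION = 15.00


def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough cost in USD for a single request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


def relative_quadratic_cost(input_tokens: int, baseline_tokens: int) -> float:
    """Compute ratio versus a baseline length, assuming purely quadratic scaling."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (input_tokens / baseline_tokens) ** 2


if __name__ == "__main__":
    # A prompt that fills the whole window, with a short 4,000-token answer.
    cost = estimate_request_cost(1_000_000, 4_000)
    print(f"Full-window request: ${cost:.2f}")

    # Going from a 400,000-token input to a 1,000,000-token input.
    ratio = relative_quadratic_cost(1_000_000, 400_000)
    print(f"Idealized compute ratio vs. 400k tokens: {ratio:.2f}x")
```

Under these assumptions a single full-window request costs a few dollars in input tokens alone, which is why trimming the prompt to what a task actually needs can matter more than the headline context size.
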
With different models leading on different tasks, engineering teams may not choose a single "best" model in the future, but instead build systems that route specific tasks to the most suitable AI.

The model is available through the API in Standard, Thinking, and Pro variants; the Pro variant spends more computation time to give more accurate responses to difficult queries. A new feature also allows the model to control a virtual mouse and keyboard, so it can interact with software applications directly to complete tasks.

For engineers at startups, the key tradeoff remains between cutting-edge capabilities and cost. While a million-token context can unlock powerful new applications, like analyzing an entire codebase for security vulnerabilities, the associated costs and potential latency require careful consideration. The decision is less about adopting the newest model and more about architecting systems that can leverage the best tool for the job.
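
One way to picture the routing idea is a small dispatch table that maps task categories to models. The sketch below is a hypothetical illustration: the task categories, the `ROUTES` table, the model label strings, and the `choose_model` helper are all assumptions made for this example, not real API identifiers or a recommendation derived from benchmark data.

```python
# Hypothetical model labels; placeholders, not confirmed API identifiers.
GPT_MODEL = "gpt-5.4"
CLAUDE_MODEL = "claude-4.6"

# Fallback used when a task category has no explicit route.
DEFAULT_MODEL = GPT_MODEL

# Assumed mapping from task category to model. A real team would fill this
# in from its own evaluations rather than from headline benchmarks.
ROUTES = {
    "complex_coding": CLAUDE_MODEL,
    "whole_codebase_analysis": GPT_MODEL,
}


def choose_model(task_kind: str) -> str:
    """Return the model label for a task category, or the default if unknown."""
    return ROUTES.get(task_kind, DEFAULT_MODEL)


if __name__ == "__main__":
    for kind in ("complex_coding", "whole_codebase_analysis", "summarization"):
        print(f"{kind} -> {choose_model(kind)}")
```

Keeping the mapping in data rather than in branching code means a route can be changed after a new release or a fresh evaluation without touching the calling logic.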
