GPT-5.4 'Thinking' Mode Released
OpenAI introduced "Thinking" mode in the latest GPT-5.4 update for ChatGPT, allowing users to interrupt the model mid-reasoning, tweak instructions, or inject new context on the fly. Expanded agent features allow GPT-5.4 to orchestrate tool use, chain tasks, and maintain state over extended sessions, all while offering more deterministic, controllable outputs. The context window, up to 1 million tokens in the API and Codex, makes it possible to process entire codebases or document sets interactively.
GPT-5.4, released on March 5, 2026, unifies the capabilities of previous GPT-5 models and incorporates the coding strengths of GPT-5.3 Codex, which removes the need for separate specialized models. It also introduces native computer use: the model can control interfaces, websites, and applications by issuing mouse and keyboard commands based on screenshots.
In ChatGPT, GPT-5.4's "Thinking" mode lets users monitor and adjust the model's reasoning in real time. This interactive capability, called "Mid-Response Steering," is meant to keep outputs aligned with complex, multi-step intents. The model provides an upfront plan for intricate queries and accepts mid-response adjustments without losing context.
The context window reaches up to 1 million tokens in the API and Codex, enough to analyze entire code repositories, research papers, and legal contracts. The standard context window is 272K tokens, and requests exceeding this limit incur twice the normal usage rate. GPT-5.4 also introduces a tool search mechanism that cuts token costs by 47% in tool-heavy workflows.
GPT-5.4 comes in two tiers: the standard GPT-5.4, named "GPT-5.4 Thinking" in ChatGPT, and GPT-5.4 Pro for users requiring maximum performance. Standard API pricing is $2.50 per million input tokens and $15.00 per million output tokens. GPT-5.4 Pro is priced at $30 per million input tokens and $180 per million output tokens.
Benchmark results show significant improvements over previous models. On OSWorld-Verified, which tests desktop navigation, GPT-5.4 achieves a 75.0% success rate, surpassing the human performance figure of 72.4%. It also scores 83% on the GDPval benchmark, which measures performance across different occupations. Compared to GPT-5.2, GPT-5.4's responses are 33% less likely to contain false claims and 18% less likely to contain factual errors.
In coding, GPT-5.4 matches or surpasses GPT-5.3-Codex on most benchmarks. On SWE-Bench Pro, GPT-5.4 scores 57.7% compared to GPT-5.3-Codex's 56.8%. The model is designed for real-world task execution and handles multi-step workflows with fewer rounds of human intervention.
GPT-5.4 faces competition from models like Anthropic's Claude and Google's Gemini. Some tests reveal limitations in chain-of-thought monitorability, particularly on health queries that lack supporting evidence and on impossible tasks. Despite these limitations, OpenAI positions GPT-5.4 as a challenger in document-heavy and analytical domains.
GPT-5.2 Thinking will be retired on June 5, 2026, making GPT-5.4 Thinking the intended upgrade path. OpenAI emphasizes that GPT-5.4 is designed for sustained workflows, operating software, and discovering capabilities at runtime. The release reflects a move towards AI agents that perform real tasks and handle complex enterprise workflows.
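For readers budgeting API usage, the per-token prices and the 272K-token threshold quoted above can be turned into a rough cost estimate. The Python sketch below is illustrative only and is not an OpenAI tool or API. The names `Tier` and `estimate_cost_usd` are hypothetical, and the code assumes that the "twice the normal usage rate" rule doubles the price of the whole request whenever the input exceeds 272K tokens. The article does not spell out how the surcharge is applied, so treat that part as a guess.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000
STANDARD_CONTEXT_TOKENS = 272_000   # standard context window quoted in the article
LONG_CONTEXT_MULTIPLIER = 2.0       # ASSUMPTION: applied to the whole request


@dataclass(frozen=True)
class Tier:
    """Hypothetical container for the per-million-token prices in the article."""
    name: str
    input_usd_per_million: float
    output_usd_per_million: float


GPT_5_4 = Tier("GPT-5.4", 2.50, 15.00)
GPT_5_4_PRO = Tier("GPT-5.4 Pro", 30.00, 180.00)


def estimate_cost_usd(tier: Tier, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars.

    ASSUMPTION: a request whose input exceeds the standard 272K-token
    window is billed at twice the normal rate for all of its tokens.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")

    base = (
        input_tokens * tier.input_usd_per_million
        + output_tokens * tier.output_usd_per_million
    ) / TOKENS_PER_MILLION

    over_limit = input_tokens > STANDARD_CONTEXT_TOKENS
    return base * (LONG_CONTEXT_MULTIPLIER if over_limit else 1.0)


if __name__ == "__main__":
    # Compare a short request with a long-context one on both tiers.
    for tier in (GPT_5_4, GPT_5_4_PRO):
        for input_tokens in (100_000, 500_000):
            cost = estimate_cost_usd(tier, input_tokens, output_tokens=10_000)
            print(f"{tier.name}: {input_tokens:,} input tokens -> ${cost:.2f}")
```

Under these assumptions, the long-context surcharge and the choice of tier dominate the cost of a request, so it may be worth checking whether a workload really needs more than 272K tokens of input before sending it.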