GPT‑5.5 debuts, pitched as much stronger at multi‑step coding and agent workflows

- OpenAI said on April 23 it released GPT‑5.5, a new flagship model for ChatGPT and Codex that it says is better at coding, computer use, online research, data analysis, and longer multi-step tasks.
- OpenAI said GPT‑5.5 matches GPT‑5.4’s per-token latency while scoring 82.7% on Terminal-Bench 2.0 versus 75.1% for GPT‑5.4, with GPT‑5.5 Pro added to the API on April 24.
- The launch came less than two months after GPT‑5.4 and amid a faster race with Anthropic and Google over models that can plan, use tools, and finish work with less supervision. (cnbc.com)

Large language models started as autocomplete systems for text. OpenAI says GPT‑5.5 is built to do more of the job itself: plan steps, use tools, check results, and keep going until a task is finished. (openai.com)

OpenAI announced GPT‑5.5 on April 23, 2026, and said the model is rolling out in ChatGPT and Codex for Plus, Pro, Business, and Enterprise users. The company updated the launch on April 24 to say GPT‑5.5 and GPT‑5.5 Pro were also available in the application programming interface, or API. (openai.com)

The company described GPT‑5.5 as its “smartest and most intuitive to use model yet,” with gains in writing and debugging code, researching online, analyzing data, creating documents and spreadsheets, and operating software across multiple tools. OpenAI said users can hand it a “messy, multi-part task” with less step-by-step prompting than before. (openai.com)

In plain terms, agent workflows are jobs that unfold over time instead of in one answer. A coding agent might inspect a repository, run commands, fix a bug, test the patch, and revise the result after checking its own work. (openai.com 1) (openai.com 2)

OpenAI’s benchmark table is meant to show that shift. It reported GPT‑5.5 at 82.7% on Terminal-Bench 2.0, up from 75.1% for GPT‑5.4, 78.7% on OSWorld-Verified versus 75.0%, and 84.4% on BrowseComp versus 82.7%. (openai.com)

OpenAI also said GPT‑5.5 matched GPT‑5.4 on per-token latency in real-world serving while using “significantly fewer tokens” to finish the same Codex tasks. That matters because agent systems can become expensive or sluggish if each step takes too long or burns too much context. (openai.com)

The release lands just weeks after OpenAI’s March 5 launch of GPT‑5.4, which the company had already pitched as a model for reasoning, coding, and agentic workflows. CNBC reported the GPT‑5.5 launch came less than two months later, underscoring how quickly major labs are iterating. (openai.com) (cnbc.com)

OpenAI President Greg Brockman told reporters the defining change was how much the model could do “with less guidance.” CNBC reported he said GPT‑5.5 can take an unclear problem and decide what needs to happen next. (cnbc.com)

That pitch puts GPT‑5.5 in the middle of a competition over AI systems that act more like junior operators than chatbots. CNBC said the release followed Anthropic’s Claude Mythos Preview, while OpenAI framed GPT‑5.5 as part of “a new way of getting work done on a computer.” (cnbc.com) (openai.com)

OpenAI paired the launch with a new system card and said GPT‑5.5 went through its predeployment safety evaluations, targeted red-teaming for advanced cybersecurity and biology risks, and testing with nearly 200 early-access partners. The company said it was releasing the model with its “strongest set of safeguards to date.” (openai.com 1) (openai.com 2)

CNBC reported OpenAI said GPT‑5.5 did not cross the company’s “Critical” cybersecurity threshold but did meet its “High” risk classification, which OpenAI defines as the potential to amplify existing pathways to severe harm. OpenAI updated its system card on April 24 with added information about API safeguards for GPT‑5.5 and GPT‑5.5 Pro. (cnbc.com) (openai.com)

The practical shift is less about one-shot answers than about supervision. As these models get better at decomposing tasks, using software, and checking outputs, more of the human work moves to setting constraints, reviewing results, and deciding when the machine should stop. (openai.com) (cnbc.com)
