OpenAI GPT-5.5 posts 73.1% on internal Expert-SWE benchmark

- OpenAI said on April 23 it began rolling out GPT-5.5 in ChatGPT and Codex, positioning the model as a system for coding, research, spreadsheets and software tasks that carry themselves forward.
- OpenAI’s launch post says GPT-5.5 scored 73.1% on its internal Expert-SWE benchmark and 82.7% on Terminal-Bench 2.0, while matching GPT-5.4’s per-token latency in real-world serving.
- The release extends OpenAI’s shift from chat replies to tool-using agents that plan, verify and keep working across long tasks. (openai.com)

Software agents are built to keep working after you stop typing, and OpenAI says GPT-5.5 is its clearest step in that direction. (openai.com)

OpenAI released GPT-5.5 on April 23, 2026, and said it is rolling out to Plus, Pro, Business and Enterprise users in ChatGPT and Codex. An April 24 update added that GPT-5.5 and GPT-5.5 Pro are also available in the application programming interface, or API. (openai.com)

The company said GPT-5.5 is built for “complex, real-world work,” including writing and debugging code, researching online, analyzing data, creating documents and spreadsheets, and operating software across tools until a task is finished. (openai.com 1) (openai.com 2)

On OpenAI’s published benchmarks, GPT-5.5 posted 73.1% on Expert-SWE, 82.7% on Terminal-Bench 2.0 and 78.7% on OSWorld-Verified. OpenAI also said it matched GPT-5.4’s per-token latency while using fewer tokens on the same Codex tasks. (openai.com)

That framing is different from a standard chatbot launch. OpenAI described GPT-5.5 as a model that can “plan, use tools, check its work, navigate through ambiguity, and keep going,” with gains concentrated in agentic coding, computer use, knowledge work and early scientific research. (openai.com)

The basic idea behind long-horizon agents is simple: instead of answering once, the model runs a loop of planning, editing, testing, seeing what broke and trying again. OpenAI’s Codex team said that loop matters more than “one giant prompt” because it gives the model feedback from files, logs, tests and diffs while it works. (developers.openai.com)

OpenAI had already been pushing Codex in that direction before this release. In September 2025, it said GPT-5-Codex was trained for software engineering jobs such as building projects from scratch, adding tests, debugging, refactoring large codebases and reviewing code. (openai.com)

In a February 23, 2026 post, OpenAI developer Derrick Choi described a stress test in which GPT-5.3-Codex ran for about 25 hours, used about 13 million tokens and generated about 30,000 lines of code in a blank repository. He called it an experiment, not a production rollout, but used it to argue that the practical shift is “time horizon.” (developers.openai.com)

OpenAI’s GPT-5.5 system card says the company gathered feedback from nearly 200 early-access partners and added targeted red-teaming for advanced cybersecurity and biology capabilities before release. The card was updated on April 24 with additional information about safeguards for API deployment. (openai.com)

The release leaves one clear message for developers: OpenAI wants Codex and ChatGPT to act less like answer engines and more like workers that stay on task. GPT-5.5 is the first launch where the company’s headline numbers and product language are both centered on that claim. (openai.com)
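
For readers who want to see the shape of that plan, edit, test and retry loop, here is a minimal Python sketch. It is an illustration only, not OpenAI's implementation or any Codex API. The names `run_agent_loop`, `propose_edit`, `run_tests`, `toy_editor` and `toy_tests` are hypothetical. In a real agent a model call would stand in for the editor, and a test runner or log reader would stand in for the tests.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    succeeded: bool
    iterations: int
    history: List[str] = field(default_factory=list)


def run_agent_loop(
    propose_edit: Callable[[str, List[str]], str],
    run_tests: Callable[[str], List[str]],
    initial_state: str,
    max_iterations: int = 5,
) -> LoopResult:
    """Repeat edit -> test -> observe until tests pass or the budget runs out."""
    state = initial_state
    failures = run_tests(state)
    history: List[str] = []
    for iteration in range(1, max_iterations + 1):
        if not failures:
            # Nothing left to fix; report how many edits were needed.
            return LoopResult(True, iteration - 1, history)
        # Plan and edit: the (hypothetical) proposer sees the latest failures.
        state = propose_edit(state, failures)
        # Test and observe: feedback comes from the environment, not the prompt.
        failures = run_tests(state)
        history.append(f"iteration {iteration}: {len(failures)} failing")
    return LoopResult(not failures, max_iterations, history)


# Toy stand-ins (assumptions for illustration): the "codebase" is a
# comma-separated list of finished steps, and each edit fixes one failure.
REQUIRED = ["parse", "validate", "save"]


def toy_tests(state: str) -> List[str]:
    done = set(state.split(",")) if state else set()
    return [name for name in REQUIRED if name not in done]


def toy_editor(state: str, failures: List[str]) -> str:
    return ",".join(filter(None, [state, failures[0]]))


result = run_agent_loop(toy_editor, toy_tests, initial_state="")
print(result.succeeded, result.iterations)
```

The point of the sketch is the feedback path: each pass hands the editor the current list of failures, which is the role the Codex team assigns to files, logs, tests and diffs. The `max_iterations` cap is a design choice in this sketch so that a loop that never converges still stops.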
