OpenAI pushes GPT-5.5 autonomy

- OpenAI said on April 23 it is rolling out GPT-5.5 in ChatGPT and Codex, pitching the model as software that can finish multi-step work.
- OpenAI says GPT-5.5 scored 82.7% on Terminal-Bench 2.0 versus GPT-5.4 at 75.1%, while matching GPT-5.4 per-token latency in real-world serving.
- The release came six weeks after GPT-5.4, sharpening the race with Anthropic and Google around agentic AI. (openai.com)

OpenAI on April 23 released GPT-5.5, a new model built to carry out multi-step work on a computer with less user supervision. (openai.com) (cnbc.com)

The company said GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex. OpenAI added on April 24 that GPT-5.5 and GPT-5.5 Pro were also available in the application programming interface, or API. (openai.com)

OpenAI described the model as software that can take a rough request, plan steps, use tools, check results, and keep going until a task is finished. It said the model is aimed at coding, web research, data analysis, documents, spreadsheets, and software operation. (openai.com 1) (openai.com 2)

This is the core idea behind “agentic” artificial intelligence: not just answering a prompt, but acting across tools the way a worker moves between a browser, a terminal, and office software. OpenAI said GPT-5.5 is its first fully retrained base model since GPT-4.5 and the next step after GPT-5.4’s earlier computer-use features. (openai.com) (azure.microsoft.com)

OpenAI’s own benchmark table showed GPT-5.5 at 82.7% on Terminal-Bench 2.0, up from 75.1% for GPT-5.4, and 78.7% on OSWorld-Verified, up from 75.0%. It also posted 84.9% on GDPval wins or ties, compared with 83.0% for GPT-5.4. (openai.com)

OpenAI also compared GPT-5.5 with Anthropic’s Claude Opus 4.7 and Google’s Gemini 3.1 Pro on several tests. In OpenAI’s chart, GPT-5.5 led those rivals on Terminal-Bench 2.0, GDPval, FrontierMath Tier 1–3, FrontierMath Tier 4, and CyberGym, while trailing Gemini 3.1 Pro on BrowseComp. (openai.com)

The company said GPT-5.5 matches GPT-5.4’s per-token latency despite higher capability and uses fewer tokens on the same Codex tasks. Microsoft said the model would become generally available in Microsoft Foundry for enterprise customers building agents. (openai.com) (azure.microsoft.com)

The launch landed less than two months after GPT-5.4, according to CNBC, and days after Anthropic’s Claude Mythos Preview drew attention for advanced cybersecurity performance. OpenAI President Greg Brockman said GPT-5.5 can take “less guidance” and figure out what should happen next. (cnbc.com)

OpenAI’s system card said GPT-5.5 was tested under its Preparedness Framework, including targeted red-teaming for advanced cybersecurity and biology risks, with feedback from nearly 200 early-access partners. CNBC reported OpenAI classified the model as “High” risk for cybersecurity, below its “Critical” threshold. (openai.com) (cnbc.com)

The immediate contest is shifting from chatbot polish to whether models can complete real office and engineering work from vague instructions. GPT-5.5 is OpenAI’s latest bid to make that behavior a product, not a demo. (openai.com) (azure.microsoft.com)
