GPT‑5.5 cuts hallucinations 52.5%

- OpenAI rolled out GPT‑5.5 in ChatGPT and the API, then expanded Trusted Access for Cyber with a GPT‑5.5‑Cyber variant for vetted defenders.
- The clearest number is 52.5%: the drop in hallucinated claims for GPT‑5.5 Instant versus GPT‑5.3 Instant on high‑stakes prompts in medicine, law, and finance, according to OpenAI’s internal evaluations.
- This matters because OpenAI is pairing stronger agentic performance with tighter cyber gating, instead of shipping more capable models with looser access.

OpenAI’s latest model story is really two stories welded together. One is about reliability: fewer made-up claims, better tool use, and stronger performance on messy multi-step work. The other is about containment, especially in cybersecurity, where more capable models can help defenders but can also lower the cost of misuse. GPT‑5.5 is the product version of that tradeoff. GPT‑5.5‑Cyber is the policy version.

### What actually launched?

OpenAI introduced GPT‑5.5 on April 23, 2026, calling it its smartest general model so far, with rollout to ChatGPT Plus, Pro, Business, and Enterprise, plus API access for GPT‑5.5 and GPT‑5.5 Pro starting April 24. The pitch is not just “better answers.” It’s that the model can take fuzzier instructions, plan across tools, check its own work, and keep moving until a task is finished. Basically: more agent, less autocomplete. (openai.com)

### Where does the 52.5% number come from?

That number comes from OpenAI’s GPT‑5.5 Instant update on May 5, 2026. In OpenAI’s internal evaluations, GPT‑5.5 Instant produced 52.5% fewer hallucinated claims than GPT‑5.3 Instant on high-stakes prompts in medicine, law, and finance. It also cut inaccurate claims by 37.3% on especially difficult conversations that users had previously flagged for factual errors. So the headline is not “hallucinations are solved.” It’s that the day-to-day default model got noticeably more dependable where mistakes hurt most. (openai.com)

### Why is “agentic” the bigger deal?

A chatbot that answers cleanly is useful. A model that can navigate software, search, write code, analyze data, and finish a job is more like a junior operator. That’s the jump OpenAI is chasing with GPT‑5.5. The model card says gains are especially strong in agentic coding, computer use, knowledge work, and early scientific research, while keeping latency around GPT‑5.4 levels and using fewer tokens on some coding tasks.
The point is not just intelligence in the abstract; it’s whether the model can hold onto a goal long enough to deliver an outcome. (openai.com)

### So why bundle this with cyber controls?

Because cybersecurity is where “helpful” and “dangerous” sit right next to each other. OpenAI’s Trusted Access for Cyber program, launched in February, was built to give verified defenders more room to do legitimate work while keeping baseline safeguards in place for everyone else. In April, OpenAI started scaling that program with GPT‑5.4‑Cyber. In May, it extended the framework to GPT‑5.5 and GPT‑5.5‑Cyber. More capability, but with narrower doors. (openai.com)

### What is GPT‑5.5‑Cyber, exactly?

It’s not a public “uncensored cyber model.” It’s a variant meant for vetted defensive users inside Trusted Access for Cyber. OpenAI says approved defenders get lower classifier-based refusals for workflows like vulnerability identification, malware analysis, reverse engineering, detection engineering, and patch validation, while harmful requests are still supposed to stay blocked. That makes the model more usable for real security teams, but only after identity and trust checks. (openai.com)

### What’s the catch?

The catch is that every gain here cuts both ways. Better planning, better tool use, and fewer factual mistakes make a model more valuable for normal work, but they also make it more capable in sensitive domains. OpenAI’s answer is stronger safeguards, external red-teaming, and gated access for the sharpest cyber workflows. Whether that balance holds is the real test, not the benchmark graph. (openai.com)

### Bottom line?

GPT‑5.5 matters less because of one flashy benchmark and more because it shows the new shipping pattern. OpenAI is trying to make its default models more factual and more agentic, while moving the riskiest capabilities behind identity-based access. If that works, you get a model that is both more useful and less reckless.
If it doesn’t, the same improvements that make AI feel reliable will make the mistakes — and misuse — more consequential. (openai.com 1) (openai.com 2)
