GPT-5.5 matches Mythos in cyber tests

- OpenAI’s GPT-5.5 and Anthropic’s Claude Mythos Preview perform at similar levels on advanced software vulnerability tasks, according to U.K. AI Security Institute (AISI) results published in April and May 2026.
- AISI said GPT-5.5 scored 71.4% on expert-level cyber tasks versus 68.6% for Claude Mythos Preview, a difference within the error bars.
- Starting June 1, 2026, OpenAI will require Advanced Account Security for users accessing its most permissive cyber models.

OpenAI’s GPT-5.5 has reached roughly the same level as Anthropic’s Claude Mythos Preview on advanced cybersecurity evaluations, according to results published by the U.K. AI Security Institute (AISI) on April 30. The institute said GPT-5.5 was the second model it had seen complete one of its multi-step cyberattack simulations end-to-end, after Mythos. Separate reporting published May 13 by CyberScoop said Palo Alto Networks found a similar acceleration in model performance during its own testing. The result narrows what had looked, just weeks earlier, like a lead for Anthropic’s restricted cyber model.

### How close were GPT-5.5 and Mythos in the U.K. tests?

AISI said GPT-5.5 achieved an average pass rate of 71.4% on its expert-level advanced cyber tasks, compared with 68.6% for Claude Mythos Preview. The same chart put OpenAI’s earlier GPT-5.4 at 52.4% and Anthropic’s Claude Opus 4.7 at 48.6%, using a 50-million-token budget. AISI said the gap between GPT-5.5 and Mythos was small enough that GPT-5.5 “reaches a similar level of performance” and “may be the strongest model” it has tested on that measure. (aisi.gov.uk)

The April 30 AISI post said its advanced suite covers 48 tasks across practitioner and expert levels, aimed at vulnerability research and exploitation against realistic targets and modern mitigations. The institute said those tasks include reverse engineering, web exploitation, cryptography, heap overflows, type confusions and synthetic vulnerabilities inserted into real open-source software. (aisi.gov.uk)

### What changed in the longer attack simulations?

AISI said Anthropic’s Mythos was the first model to complete “The Last Ones,” its simulated corporate network attack, end-to-end. AISI estimates the 32-step exercise would take a human about 20 hours. CyberScoop reported on May 13 that a newer Mythos checkpoint solved that range in 6 of 10 attempts and also completed “Cooling Tower,” another range that had previously gone unsolved, in 3 of 10 attempts.
(aisi.gov.uk)

The same report said GPT-5.5 solved “The Last Ones” in 3 of 10 attempts. CyberScoop said AISI had been tracking an “80% reliability cyber time horizon” that, earlier this year, was doubling about every five months, down from an eight-month estimate in November 2025. The May 13 report said Mythos and GPT-5.5 had moved beyond those trend lines, though AISI said it was still unclear whether that represented a one-time jump or a new, faster trajectory. (aisi.gov.uk)

### What did Palo Alto Networks say it saw?

Palo Alto Networks said the latest models were “extraordinarily capable at finding vulnerabilities and changing them into critical exploit paths in near-real-time,” according to CyberScoop’s summary of the company’s findings. The company said it began testing Claude Mythos in April as a launch partner for Anthropic’s Project Glasswing and later tested Claude Opus 4.7 and OpenAI’s GPT-5.5-Cyber through OpenAI’s Trusted Access for Cyber program. (cyberscoop.com)

CyberScoop reported that Palo Alto released advisories covering 26 CVEs, representing 75 issues found through AI model scanning across more than 130 products, compared with a typical monthly volume of fewer than five CVEs. That figure is one of the clearest external signs that vendors are moving from benchmark claims to production-style vulnerability hunting. (cyberscoop.com)

### How is OpenAI trying to turn that capability into a product?

OpenAI said on May 7 that it was expanding Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber, with GPT-5.5-Cyber entering limited preview for defenders responsible for critical infrastructure. The company said verified users get lower refusal rates for authorized tasks such as vulnerability identification, malware analysis, reverse engineering, detection engineering and patch validation, while safeguards still block credential theft, stealth, persistence, malware deployment and exploitation of third-party systems.
(cyberscoop.com)

CyberScoop reported on May 13 that OpenAI had folded those model tiers into a broader initiative called Daybreak. The publication said Daybreak combines GPT-5.5, GPT-5.5 with Trusted Access for Cyber, GPT-5.5-Cyber and OpenAI’s Codex framework to help organizations identify, patch and validate vulnerabilities across the software lifecycle. (openai.com)

### Why does access matter as much as the benchmark?

Anthropic said in its Mythos technical write-up that it viewed the model as a “watershed moment” for security and had chosen a coordinated effort to reinforce cyber defenses rather than a broad release. (red.anthropic.com) CyberScoop reported that Anthropic has kept Mythos tightly restricted and has not made it commercially available, citing safety and national security concerns. OpenAI, by contrast, said its approach is to widen defensive access in tiers, with identity checks and graduated safeguards. (cyberscoop.com)

OpenAI said individual members of Trusted Access for Cyber who use its most cyber-capable and permissive models must enable Advanced Account Security beginning June 1, 2026. That requirement is the next concrete checkpoint for the company’s Daybreak and Trusted Access rollout. (openai.com)
