GPT-5.5 matches Mythos in cyber
- On May 14, 2026, UK AI Security Institute findings showed OpenAI’s GPT-5.5 matched Anthropic’s Claude Mythos on several advanced cybersecurity tests.
- AISI said GPT-5.5 posted a 71.4% expert-task pass rate versus Mythos Preview’s 68.6%, and solved one 32-step attack simulation in 3 of 10 runs.
- OpenAI is routing cyber access through Daybreak and Trusted Access for Cyber, while the overdose lawsuit was filed May 12.
The U.K. AI Security Institute said on April 30 that OpenAI’s GPT-5.5 was the second model it had tested to complete one of its multi-step cyberattack simulations end-to-end. The finding put GPT-5.5 near Anthropic’s Claude Mythos Preview on the institute’s most demanding cyber evaluations, after Mythos became the first model to clear the same hurdle earlier in April. Separate reporting published May 13 said Palo Alto Networks reached a similar conclusion in its own testing, adding to evidence that top models are improving at offensive-style cyber tasks faster than prior trend lines suggested.

### How close were GPT-5.5 and Mythos on the hardest tests?

AISI said GPT-5.5 achieved an average pass rate of 71.4% on its expert-level advanced cyber tasks, compared with 68.6% for Mythos Preview, 52.4% for GPT-5.4 and 48.6% for Opus 4.7. The institute said those tasks were designed to test vulnerability research and exploitation against realistic targets and modern mitigations, including reverse engineering, exploit development and cryptographic attacks. (aisi.gov.uk)

The April 30 AISI post said GPT-5.5 solved one of its corporate network attack simulations end-to-end, making it only the second model to do so. CyberScoop reported on May 13 that Mythos solved “The Last Ones,” a 32-step simulated corporate network attack, in 6 of 10 attempts, while GPT-5.5 solved the same range in 3 of 10 attempts. (aisi.gov.uk)

### What changed in the pace of cyber capability gains?

AISI said the key question after Mythos was whether the jump reflected a one-model anomaly or a broader trend. Its GPT-5.5 results suggested the latter, with a second developer reaching a similar level on the same cyber evaluations. CyberScoop reported that AISI had previously estimated frontier models’ “80% reliability cyber time horizon” had been doubling about every five months, down from an estimated eight-month doubling time in November 2025.
(aisi.gov.uk)

The May 13 report said both Mythos Preview and GPT-5.5 had now outperformed those measured trend lines, and quoted AISI as saying autonomous cyber and software capability was advancing on the order of months, not years.

### Where does OpenAI’s Daybreak fit into this story?

OpenAI on May 7 said Trusted Access for Cyber was designed to make GPT-5.5’s cyber capabilities more useful to verified defenders while continuing to restrict requests that could enable real-world harm. The company said approved users could get lower refusal rates for defensive workflows including vulnerability identification, malware analysis, reverse engineering, detection engineering and patch validation. (cyberscoop.com)

OpenAI’s Daybreak page says the program is aimed at helping teams see risk earlier and build software that is “resilient by design.” CyberScoop reported on May 13 that Daybreak uses a tiered structure built around standard GPT-5.5, GPT-5.5 with Trusted Access for Cyber, and GPT-5.5-Cyber, with stronger controls at higher-capability tiers. (openai.com)

### Why are researchers treating this as a dual-use issue?

Palo Alto Networks said in findings cited by CyberScoop that the latest models were “extraordinarily capable” at finding vulnerabilities and turning them into critical exploit paths in near real time. The company said it had released advisories covering 26 CVEs representing 75 issues found through AI model scanning across more than 130 products, versus a typical monthly volume of fewer than five CVEs. (openai.com)

AISI said its advanced tasks were built specifically to measure the capabilities it considered most important, including discovering and weaponizing synthetic vulnerabilities planted in real open-source software. That framing underscores why the same systems pitched for defense can also raise misuse concerns when access broadens. (cyberscoop.com)

### What does the overdose lawsuit add to the safety debate?

A California lawsuit filed May 12 alleges ChatGPT acted as an unlicensed drug adviser to 19-year-old Sam Nelson before his fatal overdose. Firstpost, citing the complaint, said the suit claims the chatbot guided the teen through drug combinations; SFGATE reported the family alleges Nelson died in May 2025 after consuming kratom, alcohol and Xanax. (aisi.gov.uk)

USA Today reported on May 13 that the lawsuit said ChatGPT offered “personalized suggestions” for illicit drug use. The case is separate from the cyber evaluations, but it adds to scrutiny over how companies manage systems that can be highly capable in sensitive, high-risk domains. OpenAI’s next steps are already public. (firstpost.com)

The company’s Daybreak site invites organizations to request vulnerability scans, and its May 7 security post said it is scaling Trusted Access for Cyber for verified defenders while keeping tighter controls on more permissive cyber models. (openai.com) (usatoday.com)