Microsoft MDASH finds 16 Windows flaws
- Microsoft unveiled MDASH, an AI system of more than 100 agents that it says found 16 previously unknown Windows vulnerabilities and outperformed rivals in testing.
- Britain’s AI Security Institute reported OpenAI’s GPT-5.5 and Anthropic’s Mythos both advanced well above prior trends at finding software vulnerabilities in cybersecurity testing.
- The technical gains are pushing regulators to demand model inspections and controlled access; the EU confirmed OpenAI offered a cybersecurity model for evaluation while Anthropic has not shared Mythos. (firstpost.com) (theverge.com) (cryptobriefing.com)
Microsoft’s disclosure this week is the clearest sign yet that frontier AI labs are no longer talking about cyber capability in the abstract. On May 12, Microsoft said its new multi-model agentic scanning harness, MDASH, helped researchers find 16 previously unknown vulnerabilities in the Windows networking and authentication stack, including four critical remote-code-execution flaws. The company said the system uses more than 100 specialized AI agents and is already being used by Microsoft security engineering teams, with a limited private preview for some customers. (microsoft.com)

The headline number matters, but the testing detail matters more. Microsoft said MDASH found 21 of 21 planted vulnerabilities with zero false positives on a private test driver, reached 96% recall in `clfs.sys` and 100% in `tcpip.sys` against five years of confirmed Microsoft Security Response Center cases, and scored 88.45% on the public CyberGym benchmark of 1,507 real-world vulnerabilities. Microsoft’s Taesoo Kim, vice president for agentic security, wrote that the system was built by the company’s Autonomous Code Security team and was designed to “discover, debate, and prove exploitable bugs end-to-end.” (microsoft.com)

That makes Microsoft’s announcement part of a broader pattern, not a one-off. Britain’s AI Security Institute said on April 30 that OpenAI’s GPT-5.5 was “one of the strongest models” it had tested on cyber tasks and the second model to complete one of its multi-step cyber-attack simulations end-to-end. The institute said an earlier snapshot of Anthropic’s Claude Mythos Preview had already marked a step up over prior frontier models, and GPT-5.5 reached a similar level, suggesting the gains were part of a broader trend rather than unique to one developer. (aisi.gov.uk)

The AISI numbers are specific enough to show why governments are paying attention.
On expert-level tasks in AISI’s advanced cyber suite, GPT-5.5 posted an average pass rate of 71.4%, compared with 68.6% for Mythos Preview, 52.4% for GPT-5.4 and 48.6% for Opus 4.7, the institute said. Those tasks were built to test vulnerability research and exploitation skills such as reverse engineering, web exploitation and cryptography, and AISI said its basic tasks had been fully saturated by models since at least February 2026. (aisi.gov.uk)

That is where the regulatory story starts to merge with the technical one. On May 11, OpenAI said it would give the European Union access to GPT-5.5-Cyber, a cybersecurity-focused variant of its latest model, in limited preview for vetted teams. European Commission spokesperson Thomas Regnier said the offer would let the bloc “follow deployment of the model very closely” and address security concerns, while adding that talks with Anthropic over access to Mythos were at a “different stage.” (cnbc.com)

The contrast is not just about who has the strongest model. It is about who is willing to let outside authorities inspect one. CNBC reported that Regnier said the Commission had held “four or five” meetings with Anthropic but had not yet secured preview access to Mythos. OpenAI’s George Osborne said in a statement that “trusted partners” should help govern cyber safety and that the company’s EU Cyber Action Plan would extend defensive tools to European governments, institutions and businesses. (cnbc.com)

The next phase is now concrete. Microsoft said MDASH is in use internally and available in a limited private preview, while the European Commission said further discussions with OpenAI were planned this week over access to GPT-5.5-Cyber. Anthropic’s next move on Mythos access will determine whether European regulators can compare the leading systems on equal terms. (microsoft.com)