Microsoft's new AI beats Mythos

- Microsoft said on May 12 that its new MDASH security system topped the CyberGym benchmark, while YouTube creators amplified the result this week.
- Microsoft reported an 88.45% CyberGym score for MDASH, about five points ahead of the next entry, and said it beat Anthropic’s Mythos.
- Microsoft is offering MDASH in limited private preview, while Anthropic continues gating Mythos Preview through Project Glasswing and Microsoft Foundry.

Microsoft’s claim that a new in-house AI security system “beats Mythos” traces back to a company blog post and follow-on YouTube videos, not to a new standalone consumer model launch. Microsoft said on May 12 that its multi-model agentic scanning system, codenamed MDASH, topped the public CyberGym cybersecurity benchmark with an 88.45% score. In the days that followed, English- and Spanish-language YouTube channels posted videos with titles including “Microsoft’s New AI Beats Mythos And Shocks OpenAI” and “La nueva IA de Microsoft vence a Mythos y sorprende a OpenAI.” The videos matched Microsoft’s benchmark claim, but the public material they cited did not include full transcript-level evidence or raw comparison data beyond the company and benchmark summaries.

### What, exactly, did Microsoft announce?

Microsoft said MDASH is a “multi-model agentic scanning harness” built by its Autonomous Code Security team, not a single frontier model positioned as a direct replacement for Copilot or ChatGPT-style assistants. The company said the system orchestrates more than 100 specialized AI agents across frontier and distilled models to find and validate exploitable software bugs. (microsoft.com)

The May 12 Microsoft Security Blog post said MDASH helped researchers find 16 new vulnerabilities in the Windows networking and authentication stack, including four critical remote-code-execution flaws. Microsoft also said the system is being used internally by its security engineering teams and tested with a small set of customers in a limited private preview. (microsoft.com)

### Where did the “beats Mythos” line come from?

Microsoft’s own materials made the comparison explicit. A Microsoft regional news post dated May 14 said CyberGym had announced that MDASH now led its sector ranking, “surpassing Mythos,” and described MDASH as the first multi-model system included in that benchmark.
(microsoft.com)

The English-language YouTube video published May 15 repeated that claim in its description, saying Microsoft had revealed MDASH and that it beat Anthropic’s Mythos Preview and OpenAI’s GPT-5.5 on CyberGym. The Spanish-language version used similar wording, saying Microsoft’s “MDash” beat Anthropic and OpenAI using their own models. (news.microsoft.com)

### Is Mythos an Anthropic model, and was Microsoft already using it?

Anthropic introduced Claude Mythos Preview on April 7 as a gated research preview for defensive cybersecurity work, saying it would not be made generally available. Anthropic said launch partners in Project Glasswing included Microsoft, Amazon Web Services, Apple, Google, Nvidia and others. Microsoft had already said on April 7 that Anthropic provided it early access to Claude Mythos Preview, and on April 22 Microsoft said it was working with Anthropic through Project Glasswing to test Mythos on security tasks. (youtube.com)

Microsoft also offers Claude Mythos Preview through Microsoft Foundry as a gated research preview, with access prioritized for defensive cybersecurity use cases and granted at Anthropic’s discretion. (anthropic.com)

### What public evidence supports Microsoft’s benchmark claim?

Microsoft’s May 12 post gave the most concrete public numbers. The company said MDASH scored 88.45% on CyberGym’s set of 1,507 real-world vulnerabilities, about five points ahead of the next entry, and reported 21 of 21 planted vulnerabilities found with zero false positives on a private test driver. It also reported 96% recall against five years of confirmed Microsoft Security Response Center cases in clfs.sys and 100% in tcpip.sys.
(microsoft.com)

Anthropic’s Mythos system card and related security material also reference CyberGym, but Anthropic has said Mythos Preview is strong enough that some older benchmarks are close to saturation and that many vulnerability details cannot yet be disclosed because more than 99% of the issues it found remain unpatched. That limits public side-by-side verification outside the summary figures released by the companies and benchmark operators. (microsoft.com)

### Why do the videos overstate what was shown?

The YouTube uploads described MDASH as “Microsoft’s new AI” and framed the result as a direct defeat of Anthropic and OpenAI. Microsoft’s published description is narrower: MDASH is a security system that combines many agents and models for vulnerability research, and Microsoft has separately said its security strategy is intentionally multi-model. (www-cdn.anthropic.com)

OpenAI was named in the video titles and descriptions, but the official Microsoft post highlighted CyberGym leadership and did not present a consumer-facing benchmark showdown. The available public record supports this much: Microsoft disclosed a benchmark result for MDASH on May 12, YouTube creators repackaged that result on May 15 and May 17, and both Mythos Preview and MDASH remain tied to restricted cybersecurity programs rather than open public release. (youtube.com) (microsoft.com)
