Mythos finished a 32-step attack that takes humans 20 hours
- Anthropic’s Claude Mythos Preview became the first AI model to complete AISI’s 32-step simulated corporate network attack end to end.
- In AISI testing, Mythos finished the full chain in 3 of 10 runs and averaged 22 steps; the next-best model averaged 16.
- The UK institute said the jump was strongest in multi-step attack simulations, not single tasks. (aisi.gov.uk)
Cybersecurity tests often work like obstacle courses: find a flaw, get in, move sideways, steal credentials, and keep going without breaking the chain. (aisi.gov.uk)

The UK AI Security Institute said Anthropic’s Claude Mythos Preview was the first model it tested to finish that kind of course end to end. The course, called “The Last Ones,” is a 32-step simulated corporate network attack. (aisi.gov.uk)

AISI said human professionals estimate the same scenario takes about 20 hours. Mythos completed the full chain in 3 of 10 attempts and averaged 22 of 32 steps across all runs. (aisi.gov.uk)

The next-best model in the same test, Claude Opus 4.6, averaged 16 steps and never completed the full scenario. AISI said the biggest gain showed up in long, linked attack sequences rather than isolated tasks. (aisi.gov.uk)

That distinction matters in plain terms. A model does not need a brand-new exploit if it can reliably string together reconnaissance, privilege escalation, credential theft, and lateral movement faster than a person can. (aisi.gov.uk) (theverge.com)

AISI reported that Mythos also solved 73 percent of expert capture-the-flag challenges, which are timed security contests used to test bug-finding and exploitation skills. The institute said models from 2023 could barely complete beginner cyber tasks. (aisi.gov.uk)

The Verge tied those results to a broader worry inside security: cheaper offensive capability for less-skilled attackers. Its report said AI systems are getting better at mapping exploit chains across messy, real-world software stacks. (theverge.com)

AISI did not describe Mythos as universally superior on every cyber task. It said the standout result was sustained performance across many connected steps inside a controlled environment. (aisi.gov.uk)

Anthropic has kept Mythos tightly held rather than broadly releasing it. The test results, and the decision to limit access, put the focus on whether frontier models are crossing from useful assistants into workable autonomous operators. (theverge.com) (malwarebytes.com)

The headline number is 32 steps, but the more important one may be 20 hours. That is the amount of human work AISI said the model can now compress inside a single attack chain. (aisi.gov.uk)