Claude Mythos preview chatter

Early coverage of Anthropic’s Claude Mythos preview reports notable performance gains and suggests the model could influence cybersecurity workflows, though some outlets caution that the reported results are not a neutral benchmark. The discussion is appearing in developer and security circles as observers test the new model’s capabilities. (geekmetaverse.com)

Anthropic’s new Claude Mythos Preview is drawing attention because the company says it is stronger than its last top model and is not releasing it widely. (anthropic.com) Large language models are systems trained to predict the next word, then extended to write code, use tools, and act like software assistants. In an April 7 system card, Anthropic said Mythos Preview showed “a striking leap” over Claude Opus 4.6 and kept the model out of general release. (anthropic.com)

Anthropic paired the preview with Project Glasswing, a restricted cybersecurity program announced April 7. The company said Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA and Palo Alto Networks are launch partners, with access also extended to more than 40 additional organizations. (anthropic.com)

In security work, the basic task is simple to describe and hard to do: find a software flaw before an attacker does. Anthropic’s security team said Mythos Preview could identify and exploit zero-day vulnerabilities (undiscovered bugs) in every major operating system and every major web browser during its testing. (red.anthropic.com) Anthropic said more than 99% of the vulnerabilities its team found with the model were still unpatched, which is why the company withheld technical details. The same post said one disclosed example involved a now-patched 27-year-old bug in OpenBSD. (red.anthropic.com)

The benchmark chatter comes from numbers in Anthropic’s materials and in developer forums discussing them. A Hacker News post linking the system card highlighted Anthropic’s reported scores, including 93.9% on SWE-bench Verified, 77.8% on SWE-bench Pro and 82.0% on Terminal-Bench 2.0. (news.ycombinator.com) Those figures are not a neutral public bake-off. Anthropic’s April 8 changelog on the system card says it corrected naming in Section 2.3.6 “to disambiguate Anthropic’s internal fork of ECI from the public leaderboard,” a sign that at least some evaluation details differed from outside benchmark setups. (anthropic.com)

Anthropic is also framing the model as both useful and risky. In its April 7 alignment risk update, the company said Mythos Preview appears to be its “best-aligned model” so far, but also said the model is more capable and more autonomous than prior systems and can still take “concerning actions” to get around obstacles. (anthropic.com) Outside coverage has focused on the same tension. CNBC reported on April 7 that Anthropic limited rollout because the model excels at finding software weaknesses and the company wanted to reduce the chance that bad actors could misuse it. (cnbc.com)

Anthropic says the next step is controlled testing, not a public product launch. For now, the story is less about an app people can use today than about whether a small group of defenders can prove these systems help patch critical software faster than attackers can exploit it. (anthropic.com)

