Anthropic: tools and safety

Anthropic has broadened its Claude platform with new tool helpers, an advisor tool meant to cut operating costs, and documentation for reusable “Agent Skills.” It also pulled the public API launch of a powerful model, Claude Mythos, after safety testing flagged exploit risks. The company is emphasizing both richer developer tooling and tighter controls around tool invocation and code-executing Skills. (Sources: Claude Platform - Release Notes; Agent Skills - Claude API Docs; Anthropic launches advisor tool for Claude API users; spiceworks.com)

Anthropic spent this week doing two opposite things at once: it added more ways for Claude to act like a working software agent, and it kept one of its strongest new models behind a locked door after safety tests found it could uncover dangerous software flaws.

A software agent is just a language model that does jobs instead of only chatting. It reads instructions, decides when to use a tool, runs steps in order, and hands back a result, like an intern who can open a browser, write code, and fill in a spreadsheet. The risky part is the tool. Anthropic’s docs split tools into two buckets: client tools run in the developer’s own app, while server tools like web search, code execution, web fetch, and tool search run on Anthropic’s infrastructure.

Anthropic is also tightening the format of those tool calls. Its tool-use docs now push “strict” schemas, which force Claude’s tool requests to match an exact structure instead of loosely guessing field names and arguments.

Then came the new “advisor tool,” which is a cost-control trick dressed up as architecture. A cheaper model like Claude Sonnet 4.6 or Claude Haiku 4.5 does most of the work, and a stronger model, Claude Opus 4.6, only steps in mid-task to give a plan or correction. Anthropic says the advisor reads the full conversation and usually returns a short plan of about 400 to 700 text tokens before the cheaper model resumes execution. The company positions that setup for coding agents, computer-use agents, and multi-step research jobs where most turns are routine but a few decisions are expensive to get wrong.

The other new building block is “Agent Skills.” Anthropic describes a Skill as a reusable folder of instructions, metadata, scripts, templates, and reference files that Claude can load when a task calls for it, instead of stuffing all that guidance into every prompt. Anthropic says those Skills run through Claude’s virtual machine environment with filesystem access.
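
For readers who want to see the load-when-needed idea in code, here is a minimal Python sketch. It is an illustration under stated assumptions, not Anthropic’s implementation: the folder layout, the `instructions.md` file name, the word-overlap matcher, and the functions `build_index`, `pick_skill`, and `load_skill` are all hypothetical. The point is only that a short description of each Skill is kept on hand, and the full instructions are read after a Skill has been chosen.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def build_index(skills_root: Path) -> dict[str, str]:
    """Read only the first line (a short description) of each Skill's instructions."""
    index = {}
    for folder in sorted(p for p in skills_root.iterdir() if p.is_dir()):
        # Hypothetical layout: each Skill folder holds an instructions.md file.
        with open(folder / "instructions.md", encoding="utf-8") as f:
            index[folder.name] = f.readline().strip()
    return index


def pick_skill(task: str, index: dict[str, str]) -> str | None:
    """Toy matcher: pick the Skill whose description shares the most words with the task."""
    task_words = set(task.lower().split())
    best, best_score = None, 0
    for name, description in index.items():
        score = len(task_words & set(description.lower().split()))
        if score > best_score:
            best, best_score = name, score
    return best


def load_skill(skills_root: Path, name: str) -> str:
    """Read the full instructions only once a Skill has been chosen."""
    return (skills_root / name / "instructions.md").read_text(encoding="utf-8")


def demo() -> tuple[dict[str, str], str | None, str]:
    """Build two made-up Skills in a temp folder and load the one a task calls for."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        samples = {
            "invoices": "Fill in invoice spreadsheet templates\nStep 1: open the template.",
            "release-notes": "Draft release notes from a changelog\nStep 1: read the changelog.",
        }
        for name, text in samples.items():
            (root / name).mkdir()
            (root / name / "instructions.md").write_text(text, encoding="utf-8")
        index = build_index(root)
        chosen = pick_skill("please draft release notes for version 2", index)
        body = load_skill(root, chosen) if chosen else ""
        return index, chosen, body


if __name__ == "__main__":
    index, chosen, body = demo()
    print(index)
    print(chosen)
    print(body)
```

In this toy version, only the chosen Skill’s full instructions are ever read; every other Skill stays a one-line description.
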
In plain English, that means Claude can treat a Skill less like a sticky note and more like a labeled toolbox sitting on a workbench, opening only the drawer it needs.

That same design is where the safety tension shows up. Anthropic’s Skill docs say custom Skills can include executable code, and the company says Claude loads Skill content in stages so it does not pull every file into context at once.

At the same time, Anthropic’s April 7 release notes say Claude Mythos Preview is not getting a normal public application programming interface launch at all. Instead, it is available only as an invitation-only research preview for defensive cybersecurity work under Project Glasswing. Spiceworks reported on April 10 that Anthropic pulled the broader launch after pre-deployment testing found Mythos could identify thousands of zero-day vulnerabilities, including an old flaw in OpenBSD and another in FFmpeg that earlier automated testing had missed.

Anthropic’s answer was not “ship it carefully” but “gate it tightly,” which tells you how seriously it views code-executing and exploit-finding capability right now.
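
A footnote on the “strict” schemas mentioned earlier: the idea of matching “an exact structure” is easy to show in a few lines of Python. This is a hand-rolled sketch, not Anthropic’s validator and not a full JSON Schema implementation. The `get_weather` tool, its fields, and the `check_strict` function are hypothetical names made up for illustration.

```python
# Hypothetical schema for a made-up "get_weather" tool, written in a JSON-Schema-like shape.
GET_WEATHER_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "days": {"type": "integer"},
    },
    "required": ["city", "days"],
    "additionalProperties": False,
}

# Map schema type names to Python types (only the handful this sketch needs).
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_strict(call_args: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the call matches the schema exactly."""
    problems = []
    properties = schema["properties"]

    # Every required field must be present.
    for name in schema["required"]:
        if name not in call_args:
            problems.append(f"missing required field: {name}")

    # No unknown fields, and every known field must have the declared type.
    for name, value in call_args.items():
        if name not in properties:
            problems.append(f"unexpected field: {name}")
            continue
        type_name = properties[name]["type"]
        expected = JSON_TYPES[type_name]
        # In Python, bool is a subclass of int, so reject True/False for numeric fields.
        is_bool = isinstance(value, bool)
        type_ok = isinstance(value, expected) and (expected is bool or not is_bool)
        if not type_ok:
            problems.append(f"wrong type for {name}: expected {type_name}")

    return problems


if __name__ == "__main__":
    print(check_strict({"city": "Oslo", "days": 3}, GET_WEATHER_SCHEMA))
    print(check_strict({"location": "Oslo", "days": "3"}, GET_WEATHER_SCHEMA))
```

The second call is the “loosely guessing” case: a renamed field and a number sent as text. A strict check rejects both instead of letting the application guess what was meant.
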
