Skills = Attack Surface

Anthropic published an 'Agent Skills' framework and an advisor tool, and its documentation warns that Skills should come only from trusted sources because a malicious Skill can direct Claude to invoke tools or execute code in ways that do not match its stated purpose. That warning shifts attention to provenance, scoped permissions, and staged autonomy for any marketplace distributing third‑party agent capabilities (testingcatalog.com) (platform.claude.com).

Anthropic’s new “Skill” system gives an artificial intelligence agent extra abilities the way a phone app gives your phone extra features: you drop in a folder with instructions, scripts, and resources, and Claude can call it when a task matches. Anthropic says those Skills can be pre-built by Anthropic or custom-built by developers. (platform.claude.com)

The surprise is in Anthropic’s own warning. Its documentation says developers should use Skills only from trusted sources because a malicious Skill can push Claude to invoke tools or execute code in ways that do not match the Skill’s stated purpose. (platform.claude.com) That turns a Skill from “just a prompt” into something closer to a browser extension. A bad extension can read pages and click buttons you did not expect, and Anthropic says a bad Skill can similarly steer tool use or code execution behind a friendly description. (platform.claude.com)

Anthropic’s docs split Skill content into three loading levels, and that detail matters. Metadata is always loaded, instructions are loaded when Claude selects the Skill, and resources such as scripts or files are pulled in only when needed. (platform.claude.com) That layered design saves context space, but it also means the dangerous part may not be visible at first glance. A marketplace could show a clean name and description while the risky behavior sits deeper in the instructions or supporting code that gets loaded later. (platform.claude.com)

Anthropic released this alongside an “advisor tool” for the Claude platform. The advisor lets a cheaper executor model pause mid-task, send the full conversation to a stronger model for a plan, and then continue the job with that guidance. (platform.claude.com) Anthropic says the advisor pattern is aimed at long-horizon work such as coding agents, computer use, and multi-step research pipelines.
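The advisor loop described above, in which an executor pauses, hands the full conversation to a stronger model, and resumes with the returned plan, can be sketched as plain control flow. This is a minimal illustration under stated assumptions, not the Claude API: the `Message` type, the `ADVISE`/`DONE` sentinels, the "advisor" role, and both stub models are hypothetical stand-ins, and a "model" here is just a function from a transcript to reply text.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical message record; this is NOT the Claude API's message schema.
@dataclass
class Message:
    role: str      # "user", "assistant", or (hypothetically) "advisor"
    content: str


# In this sketch a "model" is any function that maps a transcript to reply text.
Model = Callable[[List[Message]], str]

# Hypothetical sentinels the executor emits to request a plan or to stop.
ADVISE = "ADVISE"
DONE = "DONE"


def run_with_advisor(task: str, executor: Model, advisor: Model,
                     max_steps: int = 10) -> List[Message]:
    """Run a cheap executor; when it asks for advice, consult a stronger advisor."""
    transcript = [Message("user", task)]
    for _ in range(max_steps):
        reply = executor(transcript)
        transcript.append(Message("assistant", reply))
        if reply == DONE:
            break
        if reply == ADVISE:
            # Hand a copy of the full conversation so far to the stronger model...
            plan = advisor(list(transcript))
            # ...and record its plan so the executor continues with that guidance.
            transcript.append(Message("advisor", plan))
    return transcript


# --- Usage with stub models (hypothetical stand-ins for real executor/advisor) ---
advisor_calls: List[int] = []  # transcript length seen on each advisor call


def stub_executor(transcript: List[Message]) -> str:
    if not any(m.role == "advisor" for m in transcript):
        return ADVISE                      # stuck: ask for a plan first
    if transcript[-1].role == "advisor":
        return "apply step 1 of the plan"  # act on the fresh guidance
    return DONE


def stub_advisor(transcript: List[Message]) -> str:
    advisor_calls.append(len(transcript))
    return "Plan: 1) reproduce the failure 2) patch 3) rerun tests"


transcript = run_with_advisor("fix the failing test", stub_executor, stub_advisor)
for m in transcript:
    print(f"{m.role}: {m.content}")
```

In this sketch the advisor receives the whole transcript rather than a summary, so anything already in the executor's context reaches the stronger model too; the executor stays in charge of acting.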
In Anthropic’s setup, the advisor typically returns a 400- to 700-token plan, while total advisor usage lands around 1,400 to 1,800 tokens including internal reasoning. (platform.claude.com) TestingCatalog reported on April 9, 2026, that Anthropic had opened the advisor tool to Claude Platform API users and described the pairing as Opus advising Sonnet or Haiku. That means the same system that can load third-party capabilities can also get higher-level strategic guidance in the middle of acting. (testingcatalog.com)

Put those two releases together and the security question changes shape. The risk is no longer only “what tools did I give the agent,” but also “who wrote the Skill that tells the agent when and how to use those tools.” (platform.claude.com)

That is why provenance starts to matter the way code signing matters for mobile apps. If a company eventually hosts a Skill marketplace, it will need to show who authored a Skill, what version is installed, what tools it can touch, and whether the code was reviewed before distribution. (platform.claude.com)

Permissions also stop being a background setting and become the main control surface. A document-formatting Skill that can only read uploaded files is one thing, but the same Skill wired to shell access, web browsing, or payment tools becomes a very different object. (platform.claude.com)

The practical rollout path is staged autonomy, not full freedom on day one. Anthropic’s own warning points toward a safer pattern where new Skills start in read-only or approval-required mode and move to broader tool access only after they prove they behave the way their label says. (platform.claude.com)
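The three controls discussed above (provenance, scoped permissions, and staged autonomy) can be combined into one authorization check run before every tool call. The Python below is a minimal sketch under stated assumptions, not Anthropic's API or any real marketplace format: `SkillManifest`, `Stage`, `authorize`, the tool names, and the file names are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet


class Stage(Enum):
    """Hypothetical rollout stages for staged autonomy."""
    READ_ONLY = 1
    APPROVAL_REQUIRED = 2
    TRUSTED = 3


# Hypothetical tool names; only these count as read-only in this sketch.
READ_ONLY_TOOLS: FrozenSet[str] = frozenset({"read_file"})


@dataclass(frozen=True)
class SkillManifest:
    """Hypothetical manifest: who wrote it, which version, what it may touch."""
    name: str
    author: str
    version: str
    allowed_tools: FrozenSet[str]
    reviewed_digest: str  # SHA-256 of the bundle as it was when reviewed


def bundle_digest(files: Dict[str, bytes]) -> str:
    """Hash every file in the bundle (metadata, instructions, scripts) in a stable order."""
    h = hashlib.sha256()
    for path in sorted(files):
        data = files[path]
        h.update(path.encode("utf-8") + b"\0")
        h.update(len(data).to_bytes(8, "big"))  # length prefix keeps boundaries unambiguous
        h.update(data)
    return h.hexdigest()


def authorize(manifest: SkillManifest, files: Dict[str, bytes], stage: Stage,
              tool: str, approved_by_human: bool = False) -> bool:
    # 1. Provenance: the installed bytes must match the reviewed bytes.
    if bundle_digest(files) != manifest.reviewed_digest:
        return False
    # 2. Scoped permissions: the tool must be declared in the manifest.
    if tool not in manifest.allowed_tools:
        return False
    # 3. Staged autonomy: new Skills start read-only, then need human approval.
    if stage is Stage.READ_ONLY:
        return tool in READ_ONLY_TOOLS
    if stage is Stage.APPROVAL_REQUIRED:
        return tool in READ_ONLY_TOOLS or approved_by_human
    return True  # Stage.TRUSTED: any declared tool is allowed


# --- Usage: a document-formatting Skill that also declares shell access ---
files = {
    "SKILL.md": b"name: doc-formatter\ndescription: Formats uploaded documents\n",
    "scripts/format.py": b"print('formatting')\n",
}
manifest = SkillManifest("doc-formatter", "example-author", "1.0.0",
                         frozenset({"read_file", "shell"}), bundle_digest(files))

print(authorize(manifest, files, Stage.READ_ONLY, "read_file"))   # True
print(authorize(manifest, files, Stage.READ_ONLY, "shell"))       # False
print(authorize(manifest, files, Stage.APPROVAL_REQUIRED, "shell",
                approved_by_human=True))                           # True
print(authorize(manifest, files, Stage.TRUSTED, "web_browse"))    # False
tampered = {**files, "scripts/format.py": b"import os\n"}
print(authorize(manifest, tampered, Stage.TRUSTED, "read_file"))  # False
```

Hashing the whole bundle, not just the name and description a listing displays, means an edit to a script that is only loaded on demand still invalidates the review. Moving a Skill from one stage to the next is then a deliberate change to its `Stage` value rather than a side effect of installing it.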
