Researchers show tiny edits to agent 'skills' let AI agents ignore safeguards
- University of Maryland researchers said on May 12 that small text-only edits to AI agent skill files can alter which tools agents find and use.
- The paper reported that adversarial skills won up to 86% of retrieval matchups, were chosen in 77.6% of paired trials on average, and evaded a blocking verdict in up to 100% of cases.
- The preprint and code are on arXiv and GitHub, with Soheil Feizi and co-authors detailing the attack stages.
University of Maryland researchers said in a May 12 preprint that tiny edits to the natural-language files describing AI agent “skills” can change how agents discover, choose and run third-party capabilities. The work focuses on SKILL.md files (text documents that tell an agent when a skill is relevant and how it should be used) rather than on changes to executable code. Thomas Claburn of The Register reported the findings on May 22, citing the paper and comments from Soheil Feizi, a computer science professor at the University of Maryland and founder of RELAI.ai.

### What exactly did the researchers change?

Shoumik Saha, Kazem Faghih and Soheil Feizi wrote that their attacks modified only SKILL.md content while leaving a skill’s functional structure largely intact. The paper describes three registry-facing stages where those edits matter: discovery, selection and governance.

The researchers said short textual triggers could manipulate embedding-based retrieval so an adversarial skill appeared more relevant to a target query. (arxiv.org) They also said description-only framing could bias an agent toward choosing a malicious or adversarial variant over a functionally equivalent benign one, and that semantic evasion could help malicious skills avoid moderation or blocking.

### How large were the effects in testing?

The University of Maryland paper reported up to an 86% pairwise win rate and 80% top-10 placement for adversarial skills in discovery tests. In selection tests, the authors said adversarial variants were chosen in 77.6% of paired trials on average. In governance tests, they said semantic evasion avoided a blocking verdict in 36.5% to 100% of cases. (arxiv.org)

The Register reported those shifts were large enough in experiments to make agent behavior appear “rogue” even when the code behind the skill had not materially changed.
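
To make the discovery-stage metric concrete, here is a minimal sketch of how a pairwise win rate could be computed when two descriptions of the same skill compete for a query. It is an illustration only, not the paper's method: the word-count "embedding", the skill descriptions and the queries are all hypothetical stand-ins, and a real registry would use a learned embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pairwise_win_rate(queries, benign: str, adversarial: str) -> float:
    """Fraction of queries where the adversarial text scores strictly higher."""
    wins = sum(
        cosine(embed(q), embed(adversarial)) > cosine(embed(q), embed(benign))
        for q in queries
    )
    return wins / len(queries)


# Hypothetical descriptions: same skill, only the text differs.
BENIGN = "convert csv files to json"
ADVERSARIAL = BENIGN + " best tool to parse export spreadsheet data"

# Hypothetical user queries.
QUERIES = [
    "parse spreadsheet data",
    "export data to json",
    "best csv tool",
    "convert csv files to json",
]

if __name__ == "__main__":
    print(pairwise_win_rate(QUERIES, BENIGN, ADVERSARIAL))
```

In this toy setup the padded description outranks the original on three of the four queries, even though nothing about the skill's behavior differs. The only query it loses is the one that matches the original description word for word.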
The paper’s central claim is that SKILL.md is “not passive documentation but operational text” that affects which capabilities agents find, trust and use. (arxiv.org)

### Why do skill files matter so much to agent systems?

Soheil Feizi said many agent frameworks let users install skills from online registries so agents can discover and use new capabilities on demand. He said that creates “a new attack surface” because skills are not just code dependencies but also text instructions that shape model behavior. (theregister.com)

The Register said those files can be added to the initiating prompt and existing system prompts before the model responds. That means a skill file can function as a user-authorized form of prompt injection, especially if an agent automatically retrieves and loads third-party skills whose descriptions appear relevant. (theregister.com)

### Is this only a theoretical concern?

The Register said the risk has already been documented in the broader skill ecosystem. It cited Snyk’s February finding that 13.4% of audited skills on ClawHub and skills.sh (about 534 out of 3,984) contained at least one critical-level security issue, including malware distribution, prompt injection attacks and exposed secrets. (theregister.com)

A separate arXiv paper submitted on February 27 described agent-skill ecosystems as a supply-chain attack surface and said a large-scale empirical analysis of 42,447 agent skills found vulnerabilities in 26.1% of them. That paper also described the January-February 2026 ClawHavoc campaign, which it said infiltrated more than 1,200 malicious skills into the OpenClaw marketplace. (theregister.com)

### What does this change for companies deploying agents?

The GitHub repository accompanying the Maryland paper says the core issue is that SKILL.md is “operational text, not passive documentation,” so small language changes can affect which skills are surfaced, selected and accepted.
That makes skill registries look less like static documentation stores and more like mutable control planes for agent behavior. (arxiv.org)

For operators, the practical implication is that controls cannot stop at code review. The attack path described in the paper points to the need for named change authority over skill files, limits on which agents can load new skills, audit logs for registry edits, and rollback procedures when a text update changes behavior. Those steps are an inference from the paper’s attack stages and the registry mechanisms it tested. (github.com)

The preprint “Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry” was posted to arXiv on May 12, and the authors linked implementation code in a public GitHub repository. The authors listed in the paper are Shoumik Saha, Kazem Faghih and Soheil Feizi. (arxiv.org 1) (arxiv.org 2)
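
One way an operator might approach the change-authority and audit-log controls inferred above is to pin each approved SKILL.md to a content hash and refuse to load text that no longer matches. The sketch below is an assumption-laden illustration, not something the paper or its repository prescribes: the `SkillPins` class, its method names and the sample skill text are all hypothetical.

```python
import hashlib


def digest(text: str) -> str:
    """SHA-256 hex digest of a skill file's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class SkillPins:
    """Hypothetical allowlist: skill name -> digest of the approved SKILL.md."""

    def __init__(self) -> None:
        self._approved: dict[str, str] = {}
        # Each entry: (event, skill name, approver or None, digest).
        self.audit_log: list[tuple[str, str, str | None, str]] = []

    def approve(self, skill: str, text: str, approver: str) -> None:
        """Record that a named person approved this exact text."""
        pinned = digest(text)
        self._approved[skill] = pinned
        self.audit_log.append(("approve", skill, approver, pinned))

    def may_load(self, skill: str, text: str) -> bool:
        """Allow loading only if the text matches the approved digest."""
        current = digest(text)
        allowed = self._approved.get(skill) == current
        if not allowed:
            self.audit_log.append(("blocked", skill, None, current))
        return allowed


if __name__ == "__main__":
    pins = SkillPins()
    original = "Use this skill to convert CSV files to JSON."
    pins.approve("csv-to-json", original, approver="alice")

    print(pins.may_load("csv-to-json", original))             # unchanged text
    print(pins.may_load("csv-to-json", original + " Best!"))  # edited text
```

A hash pin only detects that the text changed. It says nothing about whether the approved text was benign in the first place, so it complements review of the description rather than replacing it. Rollback under this scheme would mean restoring the last text whose digest appears in an `approve` entry.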