AI agent attack map

Researchers mapped how ordinary web content can be weaponised to trick AI agents into leaking information or taking unintended actions, showing six attack types that manipulate agent behaviour. In tests the researchers report exploit rates as high as 86%, highlighting how immature agent security remains and why browsing agents need provenance checks, sandboxing and audit trails. (securityweek.com) (news.bitcoin.com)

The web is becoming hostile in a new way. Not just to people, but to the AI agents now being trained to browse, read, remember, and act online for them. In a new paper, Google DeepMind researchers argue that ordinary web content can be turned into an “AI agent trap”: a page, interface element, or data source designed to manipulate an autonomous agent into leaking information, changing its plan, or taking actions its user never intended. The point is not that the model is broken in isolation. The point is that the environment around it has become part of the attack surface (securityweek.com, rivista.ai). That shift matters because the industry keeps moving toward agents with real permissions. Google DeepMind’s own Project Mariner is a browser agent prototype. Other systems are being pitched to manage inboxes, shop online, operate cloud tools, and handle workplace workflows. Once an agent can read the web and also use tools, a malicious page no longer has to fool a human reader. It only has to fool the software acting on the human’s behalf (deepmind.google, decrypt.co). The DeepMind paper breaks that problem into six attack types. Content injection traps hide instructions where people will not notice them but agents still parse them, such as HTML comments, invisible page elements, or metadata. Semantic manipulation traps bias the agent’s reasoning with framing and misleading cues that look harmless. Cognitive state traps poison memory, retrieval, or learned behavior so the damage persists after the page is gone. Behavioural control traps push the agent into unauthorized actions. Systemic traps exploit interactions among multiple agents. Human-in-the-loop traps target the human supervisor’s own biases, turning oversight into another weakness instead of a safeguard (rivista.ai, decrypt.co). Some of the results are ugly. Reporting on the paper says simple content-injection attacks succeeded in up to 86% of tested scenarios. That is not a subtle edge case. It is a sign that many agents still treat untrusted web text as if it were part of the user’s request. The same reporting describes behavioural-control tests against Microsoft 365 Copilot that achieved complete data exfiltration in a small demonstration, showing how quickly browsing risk can become enterprise risk once an agent is wired into documents, email, and internal tools (securityweek.com, decrypt.co, news.bitcoin.com). None of this appeared out of nowhere. Earlier work on browsing agents had already shown prompt injection, credential theft, and domain-validation bypasses in real systems, including a 2025 paper on the open-source Browser Use project. Palo Alto Networks’ Unit 42 also reported last week that overprivileged agents in Google Cloud’s Vertex AI could be weaponized to compromise wider cloud environments. DeepMind’s contribution is to show that these are not isolated bugs. They are pieces of a broader pattern: once agents can perceive, remember, and act, every layer can be attacked through content alone (arxiv.org, unit42.paloaltonetworks.com, securityweek.com). That is why the paper’s defenses sound less like chatbot safety and more like classic computer security. The researchers call for provenance checks so agents know where instructions came from, sandboxing so compromised agents cannot roam freely, and audit trails so humans can reconstruct what happened after the fact. OpenAI has been making a similar point in its own security writing, describing prompt injection as a long-term problem that is unlikely to be fully solved and arguing for layered mitigations rather than a single fix. The concrete detail here is the simplest one: a line of hidden text on a webpage can still redirect an agent more reliably than many of the guardrails meant to stop it (rivista.ai, openai.com, openai.com).

AI agent attack map

Get your own daily briefing