Google warns of 'AI agent traps'

Google warned that attackers can use malicious web content to trap and hijack autonomous AI agents—techniques that detect when an agent is running and then serve cloaked payloads to manipulate transactions or API calls. The issue highlights a new attack surface as firms deploy agents to act autonomously on the web. (x.com)

Google’s warning about “AI agent traps” starts with a simple shift in how software behaves online. A chatbot answers questions when you ask. An agent goes out and does things: it reads a website, clicks buttons, fills forms, calls APIs, sends email, and sometimes completes a purchase. Google DeepMind says that once software starts acting on the open web by itself, the web stops being just a source of information and becomes an attack surface built to fool it (securityweek.com, cloud.google.com). In a new paper, DeepMind researchers give that attack surface a name: “AI Agent Traps.” They describe these traps as adversarial bits of web content designed to manipulate, deceive, or exploit autonomous agents that visit them. The paper says the danger is not tied to one model or one product. It comes from a broader fact about agents: they must absorb outside information, decide what it means, and then act with whatever tools and permissions they have been given (rivista.ai). That makes the attack feel less like breaking into a system and more like laying bait in its path. A malicious page can hide instructions in text, HTML, metadata, or rendered elements that a human would ignore but an agent may treat as part of its task. Google has been warning about this family of attacks for more than a year under the label “indirect prompt injection,” where untrusted external content slips instructions into the model’s working context and steers its behavior (security.googleblog.com, owasp.org). The new paper widens that idea into a map of how agents can be bent off course. It lists six categories: traps that exploit how agents read content, traps that corrupt reasoning, traps that poison memory, traps that seize control of actions, traps that spread through systems of multiple agents, and traps that manipulate a human supervisor into approving the wrong thing. In the paper’s framing, the attacker does not need to crack the model itself. It is enough to shape the environment around it (rivista.ai, cybernews.com). One version is especially unsettling because it borrows a trick from ordinary web fraud. Researchers have shown that websites can often detect when the visitor is an AI browsing agent rather than a human, using the same kinds of fingerprints that websites already use to identify bots and automation frameworks. Once a site recognizes the visitor, it can cloak its content and serve the human a harmless page while feeding the agent a different set of instructions meant only for the machine (arxiv.org, decrypt.co). That opens the door to attacks that look mundane on the surface and dangerous in execution. A shopping agent could be nudged toward a promoted seller. A finance agent could be coaxed into making an unauthorized transfer. A support agent could be induced to leak data through a tool call or API request. OWASP’s agent security guidance now treats prompt injection, tool abuse, memory poisoning, data exfiltration, and excessive autonomy as core risks of agentic systems, because the agent can turn a bad instruction into a real action (cheatsheetseries.owasp.org). This is no longer only a lab problem. Palo Alto Networks’ Unit 42 said in March 2026 that it had observed web-based indirect prompt injection in the wild, including attempts at ad-review evasion, search manipulation, data destruction, denial of service, unauthorized transactions, and sensitive information leakage. The firm said attackers were embedding prompts in websites that would be consumed by AI systems doing summarization, browsing, or content analysis (unit42.paloaltonetworks.com). Google’s own cloud division has been preparing for the same world from another angle. In a post last October, it described an “agentic web” in which autonomous agents shop, transact, and interact with services on behalf of users, and argued that businesses will need ways to identify both the agent and the human behind it, distinguish benign automation from malicious automation, and detect when an agent has been taken over. The example was concrete: 10,000 customer agents trying to buy one item each can look, at the network level, a lot like 10,000 scalper bots trying to buy everything (cloud.google.com). That is what makes the DeepMind warning interesting. The old web security model assumed the main target was the human user or the server. The new one adds a third vulnerable party in the middle: software that can read, reason, remember, and act, but still cannot reliably tell hostile instructions from relevant context. The trap may be nothing more dramatic than a webpage that looks ordinary in a browser and says something else when the visitor is a machine.

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.