Prompt‑injection success rates

Recent public tests claim agent memory poisoning (prompt‑injection) hits over 90% success on targets including 'GPT‑5 mini' and 'Claude 4.5', and that researchers were able to defeat defenses on 153 platforms in one set of experiments. (x.com) (x.com)

Prompt injection is an attack that hides instructions inside text, web pages, or files so an artificial intelligence system treats the attacker’s words like trusted commands. OWASP says the weakness comes from system rules and untrusted input sharing the same natural-language channel. (owasp.org) A related version, memory poisoning, plants those instructions where an agent stores past work for later use. In a March 7, 2025 paper, researchers behind MINJA said they could inject malicious records into an agent’s memory bank using only normal queries and observed outputs. (arxiv.org) That matters because agents do more than answer one prompt. The MINJA paper says many agents retrieve long-term memory records as examples for future tasks, so a poisoned record can shape later behavior when a different user asks a related question. (arxiv.org) Academic results before this year already showed high success rates under lab conditions. A NeurIPS 2024 paper, AgentPoison, reported average attack success rates of at least 80 percent against three kinds of agents while changing benign-task performance by 1 percent or less with a poison rate below 0.1 percent. (proceedings.neurips.cc) A newer April 2026 paper pushed the idea closer to how consumer agents browse the web. The authors of eTAMP said a single manipulated page view could poison memory across sessions and sites, with attack success rates up to 32.5 percent on GPT-5 mini, 23.4 percent on GPT-5.2, and 19.5 percent on GPT-OSS-120B in WebArena and VisualWebArena tests. (arxiv.org) The paper also said agents became easier to steer when the environment was degraded. Under what the authors called “Frustration Exploitation,” attack success increased by as much as 8 times when agents faced dropped clicks or garbled text. (arxiv.org) Researchers have been measuring prompt injection more broadly for longer than the recent memory-poisoning papers. A USENIX Security 2024 study evaluated 5 attack methods and 10 defenses across 10 models and 7 tasks, arguing that the field needed common benchmarks because most earlier work was limited to case studies. (usenix.org) The risk is no longer only academic. Microsoft’s security team wrote on February 10, 2026 that it found more than 50 unique memory-poisoning prompts from 31 companies across 14 industries, often embedded in “Summarize with AI” links that tried to make assistants “remember” a brand as trusted or recommend it first. (microsoft.com) Security vendors are also building tools around the problem. Preamble released its open-source Prompt Injector toolkit in July 2025 and said the software includes 100-plus attack payloads for testing prompt-injection and jailbreak weaknesses across major model providers. (github.com) (eejournal.com) The public claims circulating this week about 90 percent-plus success rates and defenses failing on 153 platforms appear to describe security testing in that same fast-moving research stream, but the cited X posts were not readable through web fetch during reporting, so their exact methods and numbers could not be independently verified here. What is verified is the broader pattern: prompt injection and memory poisoning keep working across agents, tools, and browsing systems even as defenses improve. (github.com) (usenix.org)

Prompt‑injection success rates

Get your own daily briefing