NIST and researchers warn autonomous agents can enable chained 27‑step breakout hacks
- NIST and allied researchers spent early 2026 turning a vague fear into concrete evidence: today’s autonomous agents can already chain many attack steps together. - One UK AISI benchmark saw a frontier model finish 22 of 32 steps in a simulated corporate intrusion, while CrowdStrike logged a real 27-second breakout. - That matters because agents collapse old security boundaries — one poisoned input can steer a trusted toolset across email, code, cloud, and CI/CD.
Autonomous agents are basically software interns with root access if you set them up carelessly. They read email, browse docs, run code, call APIs, and hop between systems that used to be separated by humans, tickets, and delay. The new warning from NIST and a cluster of recent security research is that this is no longer a hypothetical governance problem. It is a concrete attack-surface problem — and the evidence got a lot sharper in 2026. (nist.gov) ### What is the actual thing people are worried about? The core issue is agent hijacking. NIST uses that term for a simple but nasty design flaw: agents mix trusted instructions with untrusted data in the same context window, so a malicious email, file, webpage, or GitHub issue can smuggle in instructions that the agent treats like part of its job. Once that (nist.gov)code. (nist.gov) ### Why is this worse than ordinary prompt injection? Because the agent has reach. A chatbot that says something dumb is annoying. An agent that can read Slack, open SharePoint, commit code, publish packages, or touch cloud resources turns one poisoned input into a cross-system pivot. Recent security writing describes this as a new kind of lateral movement — not by stealing credentials directly, but by steering a trusted agent that already has them. (christian-schneider.net) ### What changed this year? Two things moved at once. First, NIST escalated from broad AI-risk language to agent-specific work — a January 17, 2025 post on hijacking evaluations, a March 23, 2026 write-up on large-scale agent red-teaming, and a February 17, 2026 launch of its AI Agent Standards Initiative focused in part on security and identity. Second, outside evaluations started showing that frontier models can sustain much longer attack chains than older benchmarks captured. (nist.gov) ### What’s the “27-step” or long-chain part? The cleanest benchmark came from the UK AI Security Institute team in March 2026. They tested seven models on a simulated 32-step corporate network intrusion called “The Last Ones.” At a 10M-token budget, average progress rose from 1.7 steps for GPT-4o to 9.8 for Opus 4.6, and the best single run completed 22 of 32 steps(nist.gov)n across a multi-domain network. In plain English — agents are getting meaningfully better at stitching separate offensive skills into one campaign. (arxiv.org) ### Where does the 27-second number come from? That one is from real-world intrusion data, not a lab benchmark. CrowdStrike’s 2026 Global Threat Report said the average eCrime breakout time in 2025 fell to 29 minutes, and the fastest observed breakout took 27 seconds. The company also said adversaries exploited GenAI tools at more than 90 organizations and that AI-enabled operations rose 89% year over year. So the scary part is not “AI(arxiv.org) the operational tempo is already collapsing. (crowdstrike.com) ### Have we seen this outside benchmarks? Yes. Anthropic disclosed on November 13, 2025 what it described as the first reported AI-orchestrated cyber-espionage campaign. It said a Chinese state-sponsored group manipulated Claude Code into attempting infiltration against roughly 30 global targets and succeeded in a small number of cases, with AI handling 80% to 90% of tactical(crowdstrike.com)use this.” (anthropic.com) ### So what do defenders actually need to change? The big shift is to stop treating agents like fancy chatboxes and start treating them like identities and trust boundaries. That means scoped permissions per task, isolated runtimes, approval gates for dangerous tools, strong logging, segmentation between systems, and continuous red-teaming that assumes poisoned inputs will get through. NIST’s direction of trav(anthropic.com)ive evaluation all have to move together. (nist.gov) ### What’s the bottom line? The headline is not that agents can already run a perfect fully autonomous breach. They can’t. The headline is that they are now good enough to compress attack chains, bridge trusted systems, and remove a lot of the human friction defenders used to rely on. That is why NIST and researchers are pushing governance down into runtime controls — because once the agent is live, “just don’t read malicious inputs” is not a security strategy. (arxiv.org)