Prompt injection and tool-chain risks

Prompt injection remains a live threat: both direct attacks that manipulate user input and indirect injection through secondary pathways such as logs (the OpenClaw examples) have been documented, underlining the need for input validation and defensive orchestration. Practitioners are embedding security reviews and validators at the planner/runtime boundary to stop malicious payloads from reaching tools and backends.

OpenClaw’s CVE-style GitHub advisory shows that versions <=2026.2.12 logged unsanitized WebSocket headers; the issue was patched in 2026.2.13 with header sanitization and truncation (github.com). Security writeups reproduced attacker-controlled header injection and reported pushing roughly 15 KB of attacker text into logs in testing, turning diagnostics into indirect prompt-injection vectors (penligent.ai). MITRE’s ATLAS investigation mapped multiple OpenClaw attack chains and documented incidents on January 25–26, 2026 in which internet-facing control interfaces were exposed, credentials were harvested, and a poisoned “skill” supply-chain proof-of-concept was demonstrated on ClawdHub (mitre.org). China’s CNCERT issued warnings about insecure defaults and data-exfiltration risks in OpenClaw deployments, prompting operational restrictions in some government environments (thehackernews.com).

Academic and industry defenses now focus on runtime enforcement rather than post-hoc filtering. ByteDance’s AgentArmor converts agent runtime traces into program graphs and reported a 95.75% true-positive rate with 3.66% false positives on the AgentDojo benchmark for detecting prompt-injection patterns (arxiv.org). Open-source projects like Sentinel-Runtime implement declarative policy gating, dynamic risk scoring, manual approval pauses, and structured JSONL audit logs to prevent risky tool invocations at execution time (github.com). Microsoft’s Defender research describes webhook-based runtime checks inside Copilot Studio that inspect each proposed tool action and block or allow execution in real time, surfacing those decisions to security teams for oversight (microsoft.com). Platform vendors and tooling providers are also publishing hardened runtime guidance: Docker’s runtime security post (Sep 10, 2025) recommends hardened images, runtime policies, and treating agent-executed artifacts as untrusted input when designing agent workflows (docker.com).
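
As a rough illustration of the sanitize-and-truncate approach the advisory describes, the following Python sketch neutralizes control characters and caps the length of header values before they are logged. It is not OpenClaw's actual patch: the function names, the log-line format, and the 256-character limit are assumptions chosen for the example.

```python
import re

MAX_LOGGED_HEADER_CHARS = 256  # assumed limit, not taken from the advisory

# Control characters (CR, LF, ESC, and similar) that enable log-line forgery
# or terminal escape sequences.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f-\x9f]")


def sanitize_for_log(value: str, limit: int = MAX_LOGGED_HEADER_CHARS) -> str:
    """Return a single-line, length-bounded copy of an untrusted value."""
    cleaned = _CONTROL_CHARS.sub(" ", value)
    if len(cleaned) > limit:
        dropped = len(cleaned) - limit
        cleaned = f"{cleaned[:limit]}...[truncated {dropped} chars]"
    return cleaned


def format_header_log(headers: dict[str, str]) -> str:
    """Build one log line from request headers, sanitizing names and values."""
    parts = [
        f"{sanitize_for_log(name, 64)}={sanitize_for_log(value)!r}"
        for name, value in headers.items()
    ]
    return "ws_handshake " + " ".join(parts)
```

With a bound like this, a 15 KB header value shrinks to a few hundred characters on a single line. Sanitized log text is still untrusted data, so anything later passed to an LLM would need the same treatment as other external content.
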

Operational hardening guidance converges on concrete controls. As immediate mitigations, the vendor advisory recommends treating logs as untrusted input and sanitizing or truncating them before any LLM consumption, applying reverse-proxy header size limits, and upgrading affected agent frameworks (OpenClaw to 2026.2.13 or later) (github.com). OWASP’s LLM Prompt Injection cheat sheet prescribes separation of system prompts from user data, canonicalization of external content, and runtime validators as standard engineering patterns for production GenAI deployments (cheatsheetseries.owasp.org). Enterprise-grade GenAI platform patterns emerging from these incidents include a declarative security authority (policy.yaml), a runtime risk engine that weights tools by sensitivity, and human-in-the-loop gates for high-risk actions. These features are explicitly demonstrated in Sentinel-Runtime and advocated by Microsoft’s runtime defense guidance, and they are paired with immutable telemetry (JSONL audit trails, real-time dashboards) for ASOC workflows and forensic analysis (github.com).
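
The policy-gating pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not Sentinel-Runtime's or Microsoft's implementation: the tool names, weights, thresholds, the untrusted-context penalty, and the Gate class are all hypothetical, and the policy is inlined as a dict where a real deployment might load it from policy.yaml.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical policy, inlined here; a deployment might load it from policy.yaml.
POLICY = {
    "tool_weights": {"search_docs": 1, "send_email": 5, "run_shell": 9},
    "default_weight": 10,      # unknown tools get the highest sensitivity
    "approval_threshold": 5,   # score >= 5: pause for a human decision
    "deny_threshold": 9,       # score >= 9: block outright
}


@dataclass
class Gate:
    policy: dict
    approver: Callable[[str, dict], bool]   # human-in-the-loop callback
    sink: Callable[[str], None] = print     # receives one JSONL line per decision

    def risk(self, tool: str, untrusted_context: bool) -> int:
        weights = self.policy["tool_weights"]
        score = weights.get(tool, self.policy["default_weight"])
        # Assumed heuristic: untrusted content in context raises the score.
        return score + (2 if untrusted_context else 0)

    def decide(self, tool: str, args: dict, untrusted_context: bool = False) -> str:
        score = self.risk(tool, untrusted_context)
        if score >= self.policy["deny_threshold"]:
            decision, reason = "deny", "score at or above deny threshold"
        elif score >= self.policy["approval_threshold"]:
            approved = self.approver(tool, args)
            decision = "allow" if approved else "deny"
            reason = "human approved" if approved else "human rejected"
        else:
            decision, reason = "allow", "below approval threshold"
        record = {"ts": time.time(), "tool": tool, "args": args,
                  "score": score, "decision": decision, "reason": reason}
        # Every decision, including denials, is written to the audit sink.
        self.sink(json.dumps(record, sort_keys=True, default=str))
        return decision
```

In use, the orchestrator would call decide() on each tool call the planner proposes and execute it only on "allow". The sink could be a function that appends each line to an append-only file, which approximates the JSONL audit trail mentioned above.
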
