New Attack Vector Uses CSS for Prompt Injection
Security researchers have demonstrated a new prompt injection attack vector named 'GhostCSS' that uses standard, hidden CSS patterns to manipulate AI browser agents. The technique leverages accessibility classes like `.sr-only`, common in frameworks like Bootstrap and Tailwind, to deliver malicious prompts that are invisible to human users but parsed by LLMs. Seven working proof-of-concept attacks were released, targeting agents that scrape or summarize web content.
- The "GhostCSS" research was conducted by the security firm Bountyy Oy to demonstrate that CSS is a viable and often overlooked vector for prompt injection attacks against AI browser agents. The core issue is the divergence between what a human sees rendered by CSS and what the AI agent parses from the Document Object Model (DOM), creating a blind spot for security. - This attack vector is particularly insidious because it leverages legitimate accessibility features, such as the `.sr-only` class in frameworks like Bootstrap and Tailwind, which are designed to provide context to screen readers but can be repurposed to hide malicious prompts from sighted users. - The vulnerability is not theoretical; security researchers have demonstrated that AI browsers like Perplexity's Comet and OpenAI's Atlas are susceptible to various forms of indirect prompt injection, where hidden instructions on a webpage can cause the agent to perform unintended actions. In one instance, researchers showed how a malicious prompt hidden in a Google Doc could manipulate browser settings. - For enterprise search systems that use Retrieval-Augmented Generation (RAG), this type of vulnerability is a significant threat. If a RAG system ingests and processes a document from a corporate knowledge base containing a hidden malicious prompt, it could be tricked into leaking sensitive data or executing unauthorized API calls. - In response to the growing threat of prompt injection, major players in the enterprise search market are actively developing and advertising their defense mechanisms. Glean, for instance, reports a 97.8% accuracy rate in detecting direct prompt injection and 90% for indirect attacks using a multi-layered AI security approach. Cohere has also partnered with security firms to establish new LLM security standards and has implemented its own classifiers and prompt injection guard filters. - The fundamental challenge, acknowledged by OpenAI's Chief Information Security Officer, is that prompt injection remains an unsolved security problem at the frontier of AI research. This is because LLMs do not inherently distinguish between trusted developer instructions and untrusted data from external sources when both are provided in natural language. - Mitigation strategies for ML engineers building RAG systems include implementing strict input validation and sanitization, using clear delimiters to separate user input from system prompts, and employing AI-based security models to detect and flag potential injection attempts in both user queries and retrieved documents. - The UK's National Cyber Security Centre (NCSC) has warned that prompt injection attacks are a fundamental issue that will likely never be entirely eliminated. They advise developers to focus on mitigating the risks and consequences rather than attempting to find a perfect prevention method, treating it as a persistent threat class similar to social engineering.