ChatGPT Gets 'Lockdown Mode' for Security

OpenAI has introduced a "Lockdown Mode" and "Elevated Risk" labels for ChatGPT to enhance security when interacting with external systems. The features aim to mitigate risks such as prompt injection and data exfiltration. The new controls are intended to help developers prevent data leaks when integrating the AI into applications with sensitive workflows.

- "Lockdown Mode" is an optional setting primarily for high-risk users like executives or security teams, not for the average user. It is available on ChatGPT Enterprise, Edu, Healthcare, and for Teachers, with plans for a consumer release in the coming months. - When activated, Lockdown Mode restricts ChatGPT's interactions with external systems to prevent data exfiltration. For instance, web browsing is limited to cached content, meaning no live network requests leave OpenAI's controlled environment. - The mode disables several features where data safety cannot be deterministically guaranteed, including the generation of images in responses, Deep Research, and Agent Mode. While users can still upload files for analysis, the AI cannot download them on its own. - "Elevated Risk" labels are being standardized across ChatGPT, ChatGPT Atlas, and Codex to flag features that might introduce additional security risks, such as granting network access. These labels are intended to be temporary and will be removed as security mitigations improve. - Prompt injection, the primary threat these features address, occurs when malicious instructions are embedded in user inputs to trick the AI into bypassing its safety protocols. This is possible because large language models don't inherently distinguish between developer instructions and user-provided data. - Attackers can use indirect prompt injection by hiding malicious commands in external data that the LLM might process, such as websites, documents, or emails. For example, a hidden command in a PDF could instruct the AI to send the document's contents to an external server. - The core security challenge with AI agents is that they can take actions, not just provide answers, turning a successful prompt injection from a content issue into a potential security breach. This is known as the "confused deputy problem," where an attacker tricks the agent into misusing its legitimate permissions. - Best practices for securing AI agents, which developers are increasingly building, involve applying the principle of least privilege by default, strictly scoping the agent's tools and database access, and implementing deterministic input filtering to catch malicious patterns before they reach the LLM.

ChatGPT Gets 'Lockdown Mode' for Security

Get your own daily briefing