Anthropic's safety push

- Anthropic released Claude Opus 4.7 alongside a 'Constitutional 2.0' update focused on agent self‑correction.
- The releases aim to improve dynamic safety and allow agents to iteratively correct their outputs at runtime.
- Early reports also flagged privacy concerns about Claude Desktop browser extensions.

Anthropic is pairing a new Claude model with a rewritten rulebook that tells its AI agents to catch and correct more of their own mistakes while they work. (anthropic.com)

Anthropic said on April 16 that Claude Opus 4.7 is now generally available across Claude, its application programming interface, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. The company said the model improves on Opus 4.6 in advanced software engineering and is better at “long-running tasks” and at checking its own work before replying. (anthropic.com)

To understand the change, start with the “constitution.” Anthropic uses that document as a written set of values and instructions for Claude, and the company said the January 22 version is now a more central part of training and is written primarily for the model itself. (anthropic.com)

Anthropic said the constitution helps Claude weigh tradeoffs among values such as honesty, compassion, and the protection of sensitive information, and that Claude also uses it to generate synthetic training data and rank possible responses. Anthropic released the full text under a Creative Commons CC0 1.0 dedication. (anthropic.com)

The push comes as AI companies try to turn chatbots into agents that browse websites, fill forms, read documents, and take actions on a user’s behalf. Anthropic said in a November 24 research post that every webpage an agent visits can carry hidden instructions, making prompt injection one of the biggest security problems for browser-based agents. (anthropic.com)

Anthropic has been expanding the software around those agents at the same time. In June 2025, it introduced Claude Desktop Extensions, packaged tools that let Claude Desktop connect to local applications, files, databases, and other private data on a user’s machine through the Model Context Protocol. (anthropic.com)

That broader access has drawn scrutiny this week. The Register reported on April 20 that privacy consultant Alexander Hanff found a Native Messaging manifest on macOS that he said Claude Desktop installed without disclosure, pre-authorizing certain Chromium browser extension IDs, including for browsers not yet installed on the device. (theregister.com)

Hanff said the setup could let a browser extension talk to a local executable outside the browser sandbox with the user’s privileges. Anthropic’s own browser-use research says browser agents face a large attack surface because they can navigate, click, fill forms, and download files after processing untrusted web content. (theregister.com, anthropic.com)

Anthropic has also been emphasizing formal safety disclosures. Its Transparency Hub says model reports summarize capabilities, evaluations, and deployment safeguards, and the company says recent Claude models are trained with techniques that include traits highlighted in Claude’s Constitution. (anthropic.com)

The immediate test is whether Anthropic can make self-correcting agents more useful without widening the privacy and security risks that come with giving those agents more ways to act. (anthropic.com, anthropic.com)
