Anthropic: Constitutional 2.0

- Anthropic rolled out a Constitutional 2.0 upgrade that adds dynamic safety rules to model behavior.
- The update is described as making safety constraints adaptive rather than fixed across interactions.
- Company posts framed this as part of its enterprise product evolution and internal governance conversations. (x.com)

Anthropic has rolled out a “Constitutional 2.0” update that shifts Claude’s safety behavior toward rules that can change with context instead of staying fixed. (anthropic.com) Anthropic published Claude’s new constitution on January 22, 2026 and said the document is a “foundational” part of training, shaping how the model balances helpfulness, safety, honesty, privacy, and compliance. (anthropic.com) The company also said the constitution now plays a more central role in training than it did in earlier Claude systems, including generating synthetic training data and ranking possible responses during model development. (anthropic.com)

A constitution in this context is a written set of behavioral principles for the model, not a filter bolted on after the fact. Anthropic said it is written primarily for Claude itself, so the model can use it when handling tradeoffs such as honesty versus compassion or openness versus protection of sensitive information. (anthropic.com)

Anthropic has been using Constitutional AI since 2023, but its February 24, 2026 update to the Responsible Scaling Policy described a broader move toward “conditional” safeguards that tighten when model capabilities or risks change. (anthropic.com) That same policy update said large language models have moved from basic chat interfaces to systems that can browse the web, run code, use computers, and take autonomous multi-step actions, which is the backdrop for more adaptive controls. (anthropic.com)

Anthropic has already tied stronger safeguards to stronger models. On May 22, 2025, it said Claude Opus 4 launched with Artificial Intelligence Safety Level 3 protections, including tighter security around model weights and narrower deployment controls aimed at chemical, biological, radiological, and nuclear misuse. (anthropic.com)

The company has also been building more automation into alignment work. A March 11, 2026 research paper on Anthropic’s A3 system said the agent automatically generates safety data, fine-tunes models, and adapts its strategy to reduce failures such as sycophancy, political bias, and jailbreaks. (alignment.anthropic.com)

Anthropic is presenting that safety work alongside a larger enterprise push. Its Transparency Hub says Claude Sonnet 4.6 is available through Claude.ai, the Anthropic application programming interface, Amazon Bedrock, Google Vertex AI, and Microsoft Azure AI Foundry, and Anthropic’s enterprise materials emphasize audit logs, retention controls, and governance features. (anthropic.com 1) (anthropic.com 2)

Anthropic said the constitution is a “living document” and released it under a Creative Commons CC0 license, which lets outsiders inspect, reuse, and critique the rules that are supposed to guide Claude’s behavior as those rules keep changing. (anthropic.com)
