AI Labs Dial Back Public Safety Pledges

Anthropic and OpenAI are reportedly dialing back the public language around their safety and responsible scaling commitments. The shift is said to be driven by pressure from government and military clients, including the Pentagon, as the AI arms race intensifies and commercial pressures mount.

Anthropic has updated its Responsible Scaling Policy, a framework for mitigating catastrophic AI risks. The company will now only delay developing more advanced AI if it believes it has a "significant lead" over competitors, a shift from a more cautious stance. The change reflects a policy environment that prioritizes AI competitiveness and economic growth over safety concerns. It comes as the Pentagon threatens to pull contracts if Anthropic's technology isn't available for all legal military purposes, though the company states the two issues are unrelated. The Department of Defense has awarded contracts of up to $200 million each to Anthropic, OpenAI, Google, and xAI to accelerate the use of AI. OpenAI's CEO has stated that the company shares Anthropic's "red lines" against using AI for mass surveillance or autonomous weapons.

Labs use Reinforcement Learning from Human Feedback (RLHF) to align models with human values, a process that relies on high-quality, human-labeled data to train a reward model. The technique is considered crucial for teaching models complex and subjective tasks. Anthropic's Constitutional AI is an alternative approach that uses a set of principles and AI-generated feedback (RLAIF) to guide the model, aiming for greater scalability and consistency than human feedback alone.

The choice between synthetic data and human annotation is a central debate in training AI models. Synthetic data offers speed and scalability, generating thousands of labeled examples in hours, but it can lack the nuance and accuracy that human labelers bring to context-sensitive tasks. A hybrid approach, using synthetic data for scale and human annotation to refine complex cases, is often considered the most effective solution.

As models become more "agentic" (capable of autonomous, multi-step actions), new evaluation methods are required. Benchmarks like AgentBench, WebArena, and GAIA are emerging to test agent capabilities in areas like web navigation, reasoning, and tool use. These evaluations focus on task success, cost-efficiency, and reliability, moving beyond traditional text-quality metrics.

The fundraising climate for AI startups has seen massive capital concentration, with foundation model companies raising tens of billions in 2025. OpenAI and Anthropic alone captured 14% of global venture investment. However, investors are now looking beyond innovation to cost control and a clear path to profitability, with the cost of training a single model like GPT-4 reportedly exceeding $78 million.
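
To make the agent evaluation criteria mentioned above (task success, cost-efficiency, and reliability) more concrete, here is a minimal Python sketch of how such scores could be computed from a log of agent runs. The `Run` record, the `summarize` function, the task names, and the metric definitions are all illustrative assumptions; they are not the scoring rules used by AgentBench, WebArena, or GAIA.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    """One attempt by an agent at one task (hypothetical record format)."""
    task_id: str     # illustrative task name
    success: bool    # whether the agent completed the task
    cost_usd: float  # assumed cost of the attempt (API calls, compute)


def summarize(runs):
    """Return three illustrative metrics for a list of runs.

    success_rate     : fraction of all runs that succeeded
    cost_per_success : total spend divided by number of successful runs
    reliability      : fraction of tasks that succeeded on every attempt
    """
    if not runs:
        raise ValueError("no runs to summarize")

    successes = sum(r.success for r in runs)
    total_cost = sum(r.cost_usd for r in runs)

    # Group outcomes by task so repeated attempts can be compared.
    outcomes_by_task = defaultdict(list)
    for r in runs:
        outcomes_by_task[r.task_id].append(r.success)
    always_solved = sum(all(v) for v in outcomes_by_task.values())

    return {
        "success_rate": successes / len(runs),
        "cost_per_success": total_cost / successes if successes else float("inf"),
        "reliability": always_solved / len(outcomes_by_task),
    }


# Made-up example data: two tasks, two attempts each.
runs = [
    Run("book_flight", True, 0.5),
    Run("book_flight", True, 0.5),
    Run("fill_form", True, 0.25),
    Run("fill_form", False, 0.75),
]
summary = summarize(runs)
print(summary)
```

Under these assumed definitions, an agent can post a respectable success rate while still being unreliable on individual tasks, which is one reason such evaluations report more than a single accuracy number.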
