Anthropic Drops Safety Pledge Amid Pentagon Pressure

Anthropic has overhauled its Responsible Scaling Policy (RSP), dropping its core public safety pledge and separating its internal commitments from industry-wide recommendations. The change reportedly follows pressure from the Pentagon for more flexible AI guardrails. Executives cited government and military demands as a key reason for the shift in the company's approach to AI risk management.

The original RSP was built on "if-then" commitments tied to AI Safety Levels (ASLs): for instance, a model crossing a capability threshold for biological misuse would trigger stricter safeguards. Anthropic credits this internal framework with advancing its development of classifiers that reduce chemical and biological risks, and it activated ASL-3 safeguards in May 2025. The revised RSP 3.0, however, separates Anthropic's direct company commitments from broader, more ambitious recommendations for the entire AI industry.

The dispute with the Pentagon centers on the military's desire to use AI for "all lawful purposes," including domestic surveillance and autonomous weapons, uses that Anthropic has resisted. The Pentagon set a deadline for the company to remove these guardrails, threatening either to blacklist Anthropic as a "supply chain risk" or to compel compliance under the Defense Production Act. The high-stakes negotiation involves a $200 million contract for Anthropic to deploy its Claude model on the Pentagon's classified networks through a partnership with Palantir.

Anthropic's original alignment technique, Constitutional AI, trains models to critique and revise their own outputs against a set of principles (a "constitution"), reducing the need for humans to label harmful content directly. This contrasts with Reinforcement Learning from Human Feedback (RLHF), used heavily by OpenAI, in which humans rank different model responses for helpfulness and harmlessness, a reward model is trained on those rankings, and the language model is then optimized against that reward. RLHF excels at capturing nuanced human preferences, but collecting the rankings can be labor-intensive and expensive.

For data labeling businesses, the shift toward model-generated feedback signals a complex market. Synthetic data can be generated up to 50 times faster than human-labeled data and avoids privacy issues, but on context-sensitive tasks it can be as much as 35% less accurate.
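To make the RLHF ranking step more concrete, here is a minimal sketch of how a reward model can learn from human rankings using the Bradley-Terry pairwise loss, a common choice for RLHF reward modeling. It is an illustration under stated assumptions, not OpenAI's or Anthropic's implementation: each response is assumed to be already summarized as a small feature vector, the reward model is linear, and the feature meanings, learning rate, and function names (`reward`, `pairwise_loss`, `train_reward_model`) are hypothetical. It covers only the reward-modeling step, not the later policy optimization.

```python
import math
from typing import List, Sequence, Tuple

# One human preference: (features of the preferred response, features of the rejected one).
# The feature vectors are hypothetical stand-ins for a real model's representation of a response.
Pair = Tuple[Sequence[float], Sequence[float]]


def reward(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear reward model: a scalar score for one response."""
    return sum(w * x for w, x in zip(weights, features))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def pairwise_loss(weights: Sequence[float], pair: Pair) -> float:
    """Bradley-Terry loss: -log sigmoid(r(chosen) - r(rejected))."""
    chosen, rejected = pair
    margin = reward(weights, chosen) - reward(weights, rejected)
    return softplus(-margin)  # equals -log(sigmoid(margin))


def train_reward_model(
    pairs: List[Pair], dim: int, lr: float = 0.5, epochs: int = 200
) -> List[float]:
    """Fit the linear reward model with full-batch gradient descent on the pairwise loss."""
    weights = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d(loss)/d(margin) = -sigmoid(-margin)
            coeff = -sigmoid(-margin)
            for i in range(dim):
                grad[i] += coeff * (chosen[i] - rejected[i])
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    # Hypothetical features: [helpfulness score, 1.0 if the response is harmful else 0.0]
    prefs = [
        ([0.9, 0.0], [0.4, 0.0]),  # labeler preferred the more helpful answer
        ([0.5, 0.0], [0.6, 1.0]),  # labeler preferred the harmless answer
        ([0.8, 0.0], [0.8, 1.0]),
    ]
    w = train_reward_model(prefs, dim=2)
    print("weights:", w)
    print("mean loss:", sum(pairwise_loss(w, p) for p in prefs) / len(prefs))
```

With these toy rankings, the learned weights reward helpfulness and penalize harmful content, which is the kind of signal the language model is later optimized against. Each ranked pair is one unit of human labeling work, which is why RLHF data collection scales with labeler effort.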
Human labeling remains critical for nuance, domain expertise, and bias mitigation, and hybrid approaches that combine synthetic data's scale with human verification often perform best. Poor data quality is a primary cause of ML project failure, creating bottlenecks as data science teams are forced to clean and reconcile data instead of building models.

Evaluating the next generation of agentic AI systems requires benchmarks beyond traditional LLM tests. Frameworks such as AgentBench, WebArena, and GAIA test agents on multi-step, open-ended tasks in realistic environments such as operating systems and web browsers. These benchmarks assess complex reasoning and the ability to identify errors across long chains of actions, a critical area for new data labeling and evaluation services.

For AI infrastructure startups, the fundraising climate is robust but concentrated among a few key players. In 2025, infrastructure fundraising more than doubled the previous year's total, driven by massive demand for data centers to power AI. Venture capital investment in AI startups reached approximately $131.5 billion in the last cycle, a third of all VC dollars, and Series A rounds for AI companies commanded median valuations above $50 million.

A successful go-to-market strategy for selling to AI labs embeds technical experts, such as presales engineers, early in the sales cycle to build trust with highly educated buying committees. AI can help define ideal customer profiles and tailor messaging, but it cannot fix underlying gaps in a company's revenue process. Success requires tying AI initiatives directly to deal movement and revenue, not just activity metrics.
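The hybrid labeling approach described above, where synthetic data supplies scale and humans supply verification, can be sketched as a simple routing step. This is a minimal illustration under stated assumptions, not any specific vendor's pipeline: it assumes the synthetic generator reports a confidence score for each label, and the 0.8 threshold, the 10% audit rate, and all names (`LabeledItem`, `route_for_review`, `merge_labels`, `synthetic_error_rate`) are hypothetical. Low-confidence labels always go to human reviewers, a random sample of high-confidence labels is spot-checked, and human verdicts override synthetic ones.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LabeledItem:
    item_id: str
    text: str
    label: str          # label proposed by a synthetic generator (hypothetical)
    confidence: float   # generator's confidence in [0, 1] (assumed to be available)


def route_for_review(
    items: List[LabeledItem],
    threshold: float = 0.8,
    audit_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[List[LabeledItem], List[LabeledItem]]:
    """Split synthetic labels into auto-accepted and human-review queues.

    Low-confidence items always go to humans; a random fraction of the
    high-confidence items is also sent for a spot-check audit.
    """
    rng = random.Random(seed)
    accepted: List[LabeledItem] = []
    review: List[LabeledItem] = []
    for item in items:
        if item.confidence < threshold or rng.random() < audit_rate:
            review.append(item)
        else:
            accepted.append(item)
    return accepted, review


def merge_labels(
    accepted: List[LabeledItem], human_labels: Dict[str, str]
) -> Dict[str, str]:
    """Build the final dataset: human verdicts override synthetic labels."""
    final = {item.item_id: item.label for item in accepted}
    final.update(human_labels)
    return final


def synthetic_error_rate(
    reviewed: List[LabeledItem], human_labels: Dict[str, str]
) -> float:
    """Fraction of reviewed items where the human disagreed with the synthetic label."""
    checked = [it for it in reviewed if it.item_id in human_labels]
    if not checked:
        return 0.0
    wrong = sum(1 for it in checked if human_labels[it.item_id] != it.label)
    return wrong / len(checked)
```

The disagreement rate on the review queue gives a rough signal of where synthetic labels fall short. Because that queue over-represents low-confidence items, it tends to overstate the error rate of the auto-accepted labels; the random audit sample is the less biased check.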
