Opus 4.7 trades long context for agents

- Anthropic’s Claude Opus 4.7 launched on April 16 as a same-price upgrade to Opus 4.6, aimed at harder coding jobs and longer autonomous runs.
- Anthropic says Opus 4.7 clears 70% on CursorBench versus 58% for Opus 4.6, and adds self-verification before reporting results.
- Opus 4.6 still leads Anthropic’s own long-context retrieval tests, including MRCR v2’s 8-needle benchmark. (anthropic.com)

Anthropic’s Claude Opus 4.7 is a coding-first upgrade that improves autonomous software work, even as Opus 4.6 remains the company’s standout long-context model. (anthropic.com 1) (anthropic.com 2)

Anthropic released Opus 4.7 on April 16, 2026, and kept pricing unchanged at $5 per million input tokens and $25 per million output tokens. The model is available in Claude, the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. (anthropic.com)

The company’s pitch is specific: Opus 4.7 handles “complex, long-running tasks,” follows instructions more precisely, and “devises ways to verify its own outputs before reporting back.” Anthropic also says developers can hand off harder coding work with less supervision than before. (anthropic.com)

In plain terms, long context is the model’s ability to keep track of useful details across a huge amount of text, code, or documents without losing the thread. Agentic work is different: the model plans steps, uses tools, checks its own work, and keeps going across a longer task. (platform.claude.com 1) (platform.claude.com 2)

Anthropic is leaning harder into the second category with Opus 4.7. Its documentation calls the model “highly autonomous,” adds a new “xhigh” effort setting for harder reasoning, and introduces task budgets so developers can cap the token spend for a full agent loop. (platform.claude.com)

The headline benchmark gains are in coding. Anthropic says Opus 4.7 reaches 70% on CursorBench, up from 58% for Opus 4.6, and markets the release as stronger on “advanced software engineering” and “long-horizon agentic work.” (anthropic.com 1) (anthropic.com 2)

That does not mean Opus 4.7 replaced Opus 4.6 on every dimension. When Anthropic launched Opus 4.6 on February 5, it highlighted a 1 million-token context window and said the model scored 76% on the 8-needle, 1 million-token variant of MRCR v2, a benchmark for retrieving multiple facts buried in massive text. (anthropic.com)

Anthropic’s Opus 4.6 system card went further and described the model as state-of-the-art on long-context comprehension and precise sequential reasoning. The company’s newer Opus 4.7 launch materials emphasize autonomy, coding, vision, and memory tasks, but they do not make the same long-context leadership claim. (anthropic.com) (anthropic.com)

That leaves developers with a more practical choice than a simple version upgrade. Teams building coding agents may prefer Opus 4.7’s verification and orchestration features, while teams that depend on exact retrieval across giant documents may keep Opus 4.6 in production. (anthropic.com) (anthropic.com)

Anthropic’s own docs hint at that split in another way. Opus 4.7 still supports a 1 million-token context window, but the company’s context engineering guidance says long-context performance depends on what stays in context and pushes developers toward compaction, memory, and tool-clearing strategies for long-running systems. (platform.claude.com) (platform.claude.com) (platform.claude.com)

The result is not a clean handoff from one Opus release to the next. Opus 4.7 looks like Anthropic’s preferred model for autonomous coding agents, while Opus 4.6 still carries the clearest public case for long-document retrieval at scale. (anthropic.com) (anthropic.com)
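
For readers weighing either model, the list prices quoted above ($5 per million input tokens and $25 per million output tokens) can be turned into a rough per-run estimate. The sketch below is only illustrative arithmetic: the function name and the token counts are hypothetical, and it ignores any discounts, surcharges, or other billing rules that may apply in practice.

```python
# List prices quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one run at the quoted list prices.

    Hypothetical helper for illustration; not part of any Anthropic SDK.
    """
    total = (input_tokens * INPUT_PRICE_PER_MTOK
             + output_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / 1_000_000


# Hypothetical agent run: 800,000 input tokens and 40,000 output tokens.
cost = estimate_cost_usd(800_000, 40_000)
print(f"${cost:.2f}")  # prints "$5.00"
```

In this made-up run the input side accounts for $4.00 and the output side for $1.00, which shows how quickly a long context can dominate the bill even though output tokens cost five times as much each.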

