Claude Code adds built-in /goals evaluator to catch agents that stop early
- Anthropic added the /goal command to Claude Code by May 12, 2026, giving the coding agent a built-in evaluator that checks completion after each turn. - Claude Code docs say /goal requires version 2.1.139 or later and uses “a small fast model” after each turn to test completion. - Anthropic documents the feature in Claude Code’s goal guide, while VentureBeat reported its production use cases on May 14.
Anthropic’s Claude Code has added a built-in way to keep coding agents from stopping before a job is actually finished. Claude Code’s `/goal` command, documented by Anthropic and available in version 2.1.139 or later, lets a user set a completion condition and then keeps the agent working across turns until that condition is met. VentureBeat reported on May 14 that the feature addresses a common failure mode in production agent pipelines: the model decides it is done even when tests, compilation steps or acceptance criteria have not been satisfied. The publication described `/goal` as a built-in evaluator layer that separates task execution from the decision to stop. (code.claude.com) Anthropic’s own documentation describes the mechanism in similar terms. After each turn, the company says, “a small fast model” checks whether the stated condition holds; if it does not, Claude starts another turn instead of handing control back to the user. ### How does the new evaluator actually work inside Claude Code? Anthropic says `/goal` sets a completion condition for the current session and starts work immediately with that condition as the directive. (venturebeat.com) The goal remains active until the condition is met, at which point it clears automatically. The Claude Code docs say the check happens after every turn, not just at the end of a single tool call. (code.claude.com) That matters because Anthropic distinguishes `/goal` from “auto mode,” which can approve tool calls within one turn but does not itself start a new turn. In Anthropic’s description, `/goal` removes per-turn prompting by adding a separate evaluator, while auto mode removes per-tool prompts. VentureBeat reported that Anthropic uses Haiku by default as the evaluator model. The publication said the evaluator makes a narrow binary judgment — whether the goal has been met or not — while the main model continues doing the coding work. ### What problem is Anthropic trying to solve with `/goal`? (code.claude.com) VentureBeat said enterprises using agent pipelines have found that some failures come less from model capability than from early termination. In the publication’s example, a migration run appeared complete even though some parts had never been compiled. (venturebeat.com) Anthropic’s documentation frames `/goal` around “substantial work with a verifiable end state.” The examples it gives include migrating a module until every call site compiles and tests pass, implementing a design document until acceptance criteria hold, splitting a large file until each module fits a size budget, or working through an issue backlog until the queue is empty. (venturebeat.com) A January Anthropic engineering post on agent evaluations said that, when evaluating an agent, the company evaluates “the harness and the model working together.” That framing helps explain why Anthropic is adding evaluator logic into Claude Code itself rather than treating completion as only a model-level judgment. ### How is this different from hooks and other agent controls? (code.claude.com) Anthropic’s docs say `/goal`, `/loop` and Stop hooks all keep a session running between prompts, but they trigger differently. `/goal` starts a new turn when the previous turn finishes and stops only when a model confirms the condition is met; `/loop` starts another turn on a time interval; Stop hooks let a user’s own script or prompt decide whether to continue. (anthropic.com) The company describes `/goal` as session-scoped, while a Stop hook lives in settings and can apply more broadly. Anthropic also says the two can be used together, because both fire after every turn. VentureBeat contrasted Anthropic’s approach with other agent frameworks that allow independent evaluation but often require developers to define critic nodes, termination logic or observability plumbing themselves. (code.claude.com) ### When did the feature land, and where is it documented? Anthropic’s documentation says `/goal` requires Claude Code version 2.1.139 or later. (code.claude.com) The company’s docs also say users can check their installed version with `claude --version`. Anthropic’s public materials show version 2.1.139 in the Claude Code release stream and document `/goal` on the product docs site. (venturebeat.com) VentureBeat’s story on May 14 added outside reporting on how developers are using the feature to catch agents that stop early in production-style workflows. Anthropic’s next public updates on the feature are likely to appear in the Claude Code changelog and docs pages, which the company updates alongside new releases such as the current 2.1.141 version listed on GitHub. (code.claude.com) (github.com 1) (github.com 2)