Karpathy cuts Claude error rates

- Forrest Chang’s viral `andrej-karpathy-skills` repo turned Andrej Karpathy’s complaints about Claude Code into four standing rules developers can drop into `CLAUDE.md`. - The four rules target silent assumptions, overbuilt code, collateral edits, and vague success criteria — and later testers claimed 41% error rates fell to 11%. - The bigger shift is from “better prompting” to repo-level behavior contracts that persist across sessions and tools.

Claude Code prompting has been drifting toward something more like software configuration. That’s the real story here. A short `CLAUDE.md` file — popularized by Forrest Chang from Andrej Karpathy’s observations — is being used as a standing rulebook for how Claude should behave while coding, not just what task it should do. The pitch is simple: fewer silent guesses, less bloated code, fewer stray edits, more verifiable work. The repo built around that idea has exploded on GitHub, and the claimed error-rate drop is why people care. ### What is `CLAUDE.md` actually doing? Basically, it’s a repo-local instruction file that Claude Code reads as part of the project context. Instead of re-explaining your preferences every session, you keep durable rules in the codebase itself. That matters because coding failures are often behavioral, not factual — the model guesses when it should ask, wanders into refactors nobody requested, or declares success without proving anything. (github.com) ### Where did these four rules come from? The four-rule version came from Forrest Chang’s GitHub repo, which says it was derived from Karpathy’s observations about LLM coding pitfalls. The repo frames the original problems in plain language: models make wrong assumptions and keep going, overcomplicate code and APIs, and sometimes damage adjacent code they didn’t understand well enough to touch. Chang then turned those complaints into an installable behavioral template. (github.com) ### What are the four rules? They’re straightforward. First, think before coding — state assumptions, surface ambiguity, and ask when confused. Second, simplicity first — write the minimum code that solves the asked problem. Third, surgical changes — only touch what the request requires, and don’t “clean up” unrelated code. Fourth, goal-driven execution — define success criteria in a testable way and verify them. Those aren’t style tips. They’re guardrails matched to specific failure modes. (github.com) ### Why would this help more than a better prompt? Because prompts are ephemeral, but behavior contracts persist. A normal chat prompt can steer one task. A `CLAUDE.md` file shapes every task in that repo. Turns out that’s a better fit for recurring engineering mistakes, which are usually systematic. If Claude tends to over-abstract or silently assume API shapes, the fix is not “be smarter” — it’s “never do that without surfacing it.” (github.com) ### What about the 41% to 11% number? That metric does not appear in the core GitHub repo itself. It shows up in later community writeups and derivative rule sets, especially around an expanded 12-rule version tested across 30-plus codebases. Those posts credit the original four rules for a big chunk of the improvement, then claim the added eight rules pushed error rates lower still — in some tellings, down to 3% on narrower tasks. So the headline number is best read as downstream practitioner evidence, not an official benchmark from Karpathy, Anthropic, or the repo maintainer. (github.com) ### Why did this spread so fast? Because it’s tiny, legible, and easy to test. The repo describes itself as a single `CLAUDE.md` file, and GitHub shows it at roughly 126,000 stars now. That’s unusually large for a text-file workflow tweak. But the virality makes sense — developers can paste it into a real project in minutes and see whether diffs get cleaner. ### Is this just for Claude? Not really. (github.com) The file is branded around Claude Code, but the underlying idea generalizes to Cursor, Copilot CLI, and other coding agents. The useful abstraction is not “Claude likes markdown files.” It’s that coding agents need stable operating rules, and those rules should be attached to the repo, not trapped in one engineer’s memory. That’s why this feels bigger than one viral post. (github.com) ### What’s the catch? The catch is that rules compete for context. Some community writeups argue that once these files get too long, compliance drops and the strongest instructions get diluted. So the lesson is not “stuff every preference into `CLAUDE.md`.” It’s the opposite — keep only rules that block known failure modes. Think of it less like a wishlist and more like a circuit breaker panel. The bottom line is that Karpathy’s influence here wasn’t a magic prompt. (github.com) It was a diagnosis. Chang’s repo turned that diagnosis into a reusable file, and the community layered on its own measurements and extensions after that. If the reported gains hold up in more teams, the important shift won’t be Claude getting mysteriously better — it’ll be developers getting much more explicit about how they want coding agents to behave. (zone.ci)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.