AI coding: harnesses & agents
AI coding is moving away from ad-hoc terminal chats toward "harnesses" that separate planning from execution: models produce structured plans, humans review them, and controlled code edits then run under test gates. Media coverage and recent videos describe multi-phase workflows and multi-agent pipelines (planner, critic, executor), suggesting the next gains in developer productivity will come from orchestration and review UX, not just bigger models. (youtube.com)
A coding model used to be like a very fast intern in your terminal: you pasted a bug, it guessed a fix, and you hoped the patch did not break three other files. The new setup looks more like a small software team, where one model writes a plan, another checks it, and only then does a tool edit code and run tests. (openai.com)

The word people are using for that setup is “harness.” It means the software around the model: the rules, file access, approval steps, test gates, and review screens that keep a strong model from acting like a strong model with no brakes. (anthropic.com)

This shift is happening because the old “reason, act, observe” loop was expensive and clumsy for long jobs. LangChain’s planning-agent writeup says a plan-first system can be faster and cheaper because the big model does the planning once, while smaller steps run without asking the expensive brain for permission every time. (langchain.com)

OpenAI put unusually concrete numbers on the new approach in February 2026. It said three engineers used Codex to build an internal beta with about 1 million lines of code and roughly 1,500 pull requests in five months, with humans steering the system instead of writing code by hand. (openai.com) The important detail in that post was not “the model wrote a lot of code.” It was that the repository had to be shaped for the agent first, with repository structure, continuous integration checks, formatting rules, and an instructions file called AGENTS.md so the model knew how work was supposed to happen. (openai.com)

Anthropic describes the same pattern from the other side. It says Claude Code's loop has three phases: gather context, take action, and verify results. The harness is the layer that gives the model tools to read files, edit code, run commands, and feed those results back into the next step. (anthropic.com)

OpenAI’s Codex command line tool now exposes that orchestration directly. Its docs say you can run a separate Codex agent for local code review, use subagents to parallelize complex tasks, launch cloud tasks, and choose approval modes before the agent edits files or runs commands. (openai.com)

Google is pushing the same idea into remote work instead of local terminal sessions. Google’s Jules is described as an asynchronous coding agent that connects to repositories, handles tasks like writing tests and fixing bugs, and works in a cloud virtual machine that can open pull requests while the developer does something else. (developers.googleblog.com)

GitHub has moved Copilot in the same direction. Its product page says Copilot can validate files in agent mode, and Microsoft’s .NET team described the Copilot coding agent as a cloud tool that analyzes a repository, plans multi-step tasks, and creates issues and pull requests instead of just suggesting the next line. (github.com) (microsoft.com)

So the race is no longer just “who has the smartest model.” The race is becoming “who has the best workbench”: the cleanest plan screen, the safest approval flow, the best test feedback, the clearest diff review, and the best way to split one coding job into planner, executor, and critic without wasting tokens or human attention. (openai.com) (anthropic.com) (openai.com)

That is why AI coding in 2026 feels less like chatting with a bot and more like supervising a production line. The model is still the engine, but the gains are increasingly coming from the rails around it. (openai.com)
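
To make the planner, critic, executor split concrete, here is a minimal sketch in Python of what a harness loop could look like. It is an illustration only, not how Codex, Claude Code, Jules, or Copilot are implemented. Every name in it (`Step`, `Plan`, `Result`, `run_harness`, and the stub functions) is hypothetical, the "repository" is just an in-memory dictionary, and the planner is a hard-coded stub standing in for a model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    path: str          # file the step wants to rewrite
    new_content: str   # full replacement content for that file
    rationale: str     # why the planner proposes this change


@dataclass
class Plan:
    goal: str
    steps: List[Step]


@dataclass
class Result:
    applied: bool
    reason: str
    files: Dict[str, str]


def run_harness(
    goal: str,
    files: Dict[str, str],
    planner: Callable[[str, Dict[str, str]], Plan],
    critic: Callable[[Plan], List[str]],
    approve: Callable[[Plan], bool],
    test_gate: Callable[[Dict[str, str]], bool],
) -> Result:
    """Hypothetical harness: plan once, review, then edit behind a test gate."""
    plan = planner(goal, files)          # the expensive planning call happens once
    objections = critic(plan)            # a second reviewer checks the plan
    if objections:
        return Result(False, "critic: " + "; ".join(objections), files)
    if not approve(plan):                # human approval step
        return Result(False, "human rejected plan", files)
    candidate = dict(files)              # edit a copy, never the original
    for step in plan.steps:
        candidate[step.path] = step.new_content
    if not test_gate(candidate):         # edits are kept only if tests pass
        return Result(False, "tests failed; edits discarded", files)
    return Result(True, "applied", candidate)


# Stubs for illustration. A real planner and critic would call a model.
def stub_planner(goal: str, files: Dict[str, str]) -> Plan:
    fixed = "def add(a, b):\n    return a + b\n"
    return Plan(goal, [Step("app.py", fixed, "use + instead of -")])


def stub_critic(plan: Plan) -> List[str]:
    # Assumed policy: the agent may not touch anything under secrets/.
    return [f"{s.path} is off limits" for s in plan.steps
            if s.path.startswith("secrets/")]


def stub_tests(files: Dict[str, str]) -> bool:
    # Toy test gate. A real harness would run the test suite in a sandbox.
    scope: dict = {}
    exec(files["app.py"], scope)
    return scope["add"](2, 3) == 5


repo = {"app.py": "def add(a, b):\n    return a - b\n"}
result = run_harness("fix add()", repo, stub_planner, stub_critic,
                     lambda plan: True, stub_tests)
print(result.applied, result.reason)  # prints: True applied
```

The design choice worth noticing is that the model never edits files directly. The plan is data that a critic and a human can inspect, the edits land on a copy, and the copy replaces the original only after the test gate passes. Swapping `lambda plan: True` for a function that always returns `False` leaves `repo` untouched, which is roughly what a strict approval mode would mean in this sketch.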