Coding agents need real context

A recent analysis argues that coding agents work only when they have tool access, memory, and repository context, not just a model prompt. The breakdown shows that agents that can query code history, use build/test tools, and store state outperform one-shot LLM demos (Components of A Coding Agent). For platform teams, that implies any assistant or automation tied to APIs must expose reference docs, schema history, examples, and logs to be reliable rather than merely impressive in demos (Components of A Coding Agent).

A clear new argument about coding agents landed this month: the model by itself is rarely the thing that makes an agent useful. (vuink.com) The argument starts simple. When a developer shows a demo in which a prompt is typed into a chat window and code appears, the demo hides three other pieces doing the real work: tool access (build, test, repository queries), a memory layer that remembers past interactions and decisions, and repository-level context that tells the agent where the code lives and how it is built. (vuink.com)

“Tools” means concrete executables the agent can call: run the test suite, grep history, open a pull request, or run the project’s build. Agents that reach into a repo and execute those actions behave very differently from an LLM that only answers in prose. Microsoft’s agent features and GitHub’s custom-agent files show teams exposing build and tool hooks to agents explicitly so the agent can act inside the repository. (devblogs.microsoft.com)

Memory is not metaphorical. It is a persistent store of the agent’s past actions, saved notes about the codebase, and policy choices that survive across sessions. Agents wired to memory can avoid repeating failed experiments and can apply preferences consistently; editor vendors are shipping memory primitives so agents keep workspace-level state rather than start from scratch each chat. (code.visualstudio.com)

Repository context is the third axis, and it is subtle. Teams began adding files like AGENTS.md to describe repo structure, required tools, and test commands so agents don’t guess. But a rigorous evaluation of AGENTS.md practices found mixed effects: context files can encourage more careful exploration, yet overly prescriptive context sometimes lowers success rates and raises inference cost. The point isn’t “always more context”; it is “the right, actionable context delivered in the right way.” (arxiv.org)

For platform teams building APIs, gateways, or an “AI layer” for developers, the lesson is architectural and operational. Treat agents as systems that need first-class metadata: machine-readable reference docs, schema/version history, canonical examples, tool manifests, and execution logs that agents can query and use. Vendors and open-source projects are already positioning tracing and evaluation tools to capture agent steps, so you can measure task success, latency, costs, and where hallucinations or permission errors happen. (gravitee.io)

Practically, that changes platform work. Design your APIs not only for human ergonomics but for agent consumption: include an action registry, a test harness endpoint, example payloads, and a compact “agent README.” Add agent-aware observability, meaning traces that link an agent prompt to the build/test run and to the resulting PR, so you can ask “which agent action caused this regression?” and answer it with data. (github.blog)

For the individual contributor choosing to be an architect, this is a new design domain: system contracts for agents, tool sandboxing, and cost/latency tradeoffs in context provisioning. For the manager, it’s a product problem: hiring a mix of docs, platform, and security talent, and measuring platform success by agent task success rates, developer time saved, and incident reduction rather than just API uptime.

If you run an API or platform, start small and concrete: publish an agent-focused repo descriptor, expose a tool registry and test endpoint, and capture execution traces. Tools from the agent ecosystem already plug into those primitives; once you wire them together, agents stop being impressive demos and become reliable automation you can measure and productize. (devblogs.microsoft.com)
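To make the "tool registry plus execution traces" idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor's API: the names ToolSpec, TraceEvent, and ToolRegistry are hypothetical, and the registered commands are stand-ins that only invoke the Python interpreter so the example runs anywhere. A real registry would point at the project's own build and test commands.

```python
import json
import subprocess
import sys
import time
import uuid
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class ToolSpec:
    """A registry entry: a named command an agent is allowed to run."""
    name: str
    description: str
    argv: List[str]


@dataclass
class TraceEvent:
    """One recorded tool call, keyed by the trace (agent task) it belongs to."""
    trace_id: str
    tool: str
    exit_code: int
    duration_s: float
    output_tail: str


class ToolRegistry:
    """Hypothetical registry: lists tools, runs them, and records each run."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolSpec] = {}
        self.events: List[TraceEvent] = []

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def manifest(self) -> str:
        """Machine-readable tool list an agent could fetch before acting."""
        return json.dumps([asdict(s) for s in self._tools.values()], indent=2)

    def run(self, trace_id: str, name: str) -> TraceEvent:
        """Run a registered tool and record the outcome under trace_id."""
        spec = self._tools[name]  # raises KeyError for unregistered tools
        start = time.monotonic()
        proc = subprocess.run(spec.argv, capture_output=True, text=True)
        event = TraceEvent(
            trace_id=trace_id,
            tool=name,
            exit_code=proc.returncode,
            duration_s=time.monotonic() - start,
            output_tail=(proc.stdout + proc.stderr)[-500:],
        )
        self.events.append(event)
        return event

    def failures(self, trace_id: str) -> List[TraceEvent]:
        """Return the failed tool calls recorded for one trace."""
        return [e for e in self.events
                if e.trace_id == trace_id and e.exit_code != 0]


if __name__ == "__main__":
    registry = ToolRegistry()
    # Stand-in commands (assumptions): one succeeds, one exits non-zero.
    registry.register(ToolSpec("build", "Build the project",
                               [sys.executable, "-c", "print('build ok')"]))
    registry.register(ToolSpec("test", "Run the test suite",
                               [sys.executable, "-c", "raise SystemExit(1)"]))
    trace_id = uuid.uuid4().hex  # one id per agent task
    for tool in ("build", "test"):
        registry.run(trace_id, tool)
    print(registry.manifest())
    print([e.tool for e in registry.failures(trace_id)])
```

The design choice worth noting is the shared trace id: every tool call made for one agent task is recorded under the same id, so failures can be looked up per task. Attaching that id to the originating prompt and to the resulting PR is the linking step the article describes; it is left out here because it depends on the platform.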
