Agents shift from demos to systems
- Cloudflare engineer Matt Carey used an April 25 talk to argue AI agents should be built like software systems, not prompt demos, with APIs exposed on demand instead of hand-picking tools.
- Carey said Cloudflare’s OpenAPI spec is more than 2.3 million tokens, while its new MCP server exposes the whole API with two tools, search and execute, in about 1,000 tokens.
- The shift tracks a wider agent-building focus on reliability, tracing and runtime control rather than prompt tweaks alone. (blog.cloudflare.com)
AI agent builders are starting to talk less about prompts and more about systems: permissions, retries, tracing, and tool design. (blog.cloudflare.com) (vellum.ai)

At Cloudflare, Matt Carey made that case directly in a talk published April 25, saying the company’s REST OpenAPI spec is more than 2.3 million tokens and too large to expose as thousands of separate agent tools. (youtube.com) (blog.cloudflare.com)

Cloudflare’s answer was an MCP server that exposes the whole API with two tools, `search` and `execute`, using what it calls Code Mode. The company said that keeps the context footprint near 1,000 tokens and cuts input-token use by 99.9% versus a native MCP server. (blog.cloudflare.com)

Model Context Protocol, or MCP, is the format many agent systems use to let a model call outside software. The tradeoff is simple: more tools give an agent more reach, but every tool description also eats into the model’s context window. (blog.cloudflare.com)

Carey’s argument is that this is not mainly a prompt-writing problem. In the talk description, he said teams first cherry-picked “important endpoints,” shipped separate MCP services, and covered only a small fraction of the API because context limits forced those choices. (youtube.com)

Cloudflare’s blog pushes the same point in engineering terms. Instead of listing every operation as a tool, the model writes code against a typed software development kit and runs it in a Dynamic Worker Loader, so the full API schema never has to sit in the prompt. (blog.cloudflare.com)

That changes what product teams have to design. If an agent can search an API, compose calls, and execute actions, then schema quality, authentication, rate limits, and safe execution paths become part of the user-facing product, not just backend plumbing. (blog.cloudflare.com)

The same shift showed up at the AI Engineer World’s Fair in 2025, where one recap said the biggest themes were agent reliability, infrastructure, evaluation, and MCP. The writeup described “agent engineering” as a runtime problem with intent, memory, planning, authority, control flow, and tools. (vellum.ai)

Cloudflare also said it is working on the MCP TypeScript SDK to make stateless servers the default, another sign that the conversation is moving toward operating model choices rather than demo polish. (youtube.com)

The result is a plainer definition of an agent: less a chatbot with clever prompting, more a distributed program that needs bounded permissions, compact tool access, and logs good enough to explain what it just did. (blog.cloudflare.com) (vellum.ai)
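
To make the two-tool idea concrete, here is a minimal Python sketch of a `search` and `execute` pair over a small in-memory catalog. It is an illustration under stated assumptions, not Cloudflare's implementation: the `Operation` class, the `CATALOG` entries, the handler functions, and the `allowed` permission set are all hypothetical. It also swaps in a simpler technique: Cloudflare's `execute` is described as running model-written code in a sandbox, while this sketch only dispatches a list of named calls.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Operation:
    """One API operation in a hypothetical catalog (not Cloudflare's schema)."""
    name: str
    summary: str
    handler: Callable[..., Any]


# Hypothetical stand-in for a large API surface. A real catalog would hold
# thousands of operations, which is why it is searched rather than listed.
CATALOG = {
    op.name: op
    for op in [
        Operation("zones.list", "List zones in the account",
                  lambda: ["example.com"]),
        Operation("dns.records.create", "Create a DNS record in a zone",
                  lambda zone, name, content: {
                      "zone": zone, "name": name, "content": content}),
        Operation("dns.records.list", "List DNS records in a zone",
                  lambda zone: []),
    ]
}


def search(query: str, limit: int = 5) -> list[dict[str, str]]:
    """Tool 1: return only operations whose name or summary matches every term."""
    terms = query.lower().split()
    hits = [
        {"name": op.name, "summary": op.summary}
        for op in CATALOG.values()
        if all(term in f"{op.name} {op.summary}".lower() for term in terms)
    ]
    return hits[:limit]


def execute(plan: list[dict[str, Any]], allowed: set[str]) -> list[Any]:
    """Tool 2: run a sequence of calls, rejecting anything outside the allow-list."""
    results = []
    for step in plan:
        name = step["op"]
        if name not in allowed or name not in CATALOG:
            raise PermissionError(f"operation not permitted: {name}")
        results.append(CATALOG[name].handler(**step.get("args", {})))
    return results
```

In this sketch the model only ever sees the two tool descriptions plus whatever `search` returns for a given query, so the prompt cost does not grow with the size of the catalog. The `allowed` set is one simple way to express the bounded permissions the article mentions. A production system would also need authentication, rate limiting, and logging around `execute`.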