Agents produce 14,400 runtime paths

- Engineers building production AI agents are zeroing in on a simple but ugly fact: a few model, tool, and tenant choices explode fast.
- The headline number is 14,400 paths, from 3 LLMs × 8 tools × 50 customers × 12 possible tool/model combinations per run.
- That matters because agent teams now need runtime guardrails, tracing, approvals, and staged rollouts just to stay in control.
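As a rough check on the headline figure, the multiplication in the summary above can be reproduced in a few lines. This is only a sketch: the dictionary keys and the `paths_after_adding` helper are illustrative names, and the factor of 12 is taken at face value from the framing quoted above.

```python
# Illustrative only: reproduces the headline arithmetic from the summary.
from math import prod

# Factors as quoted in the summary; the key names are made up for this sketch.
factors = {
    "llms": 3,
    "tools": 8,
    "customers": 50,
    "combos_per_run": 12,  # tool/model combinations per run, as quoted
}

total_paths = prod(factors.values())
print(total_paths)  # 14400


def paths_after_adding(factors, name, extra=1):
    """Return the path count if one factor grows by `extra`."""
    grown = dict(factors)
    grown[name] = grown[name] + extra
    return prod(grown.values())


# Adding a single tool (8 -> 9) adds 1,800 paths under the same assumptions.
print(paths_after_adding(factors, "tools"))  # 16200
```

Because the factors multiply, one more tool adds 1,800 paths under these assumptions, not one.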

AI agents look simple in demos. A model gets a prompt, calls a tool, and returns an answer. But production agents are not one clean path: they are branching systems with different models, different tools, different customers, and different failure modes. That is why engineers keep landing on numbers like 14,400 runtime paths. The point is not the exact arithmetic. The point is that the path count gets big much faster than most teams expect.

### What is a runtime path?

A runtime path is the specific route an agent takes while doing work: which model it used, which tools it called, in what order, under which tenant or customer context, and where it paused, retried, or asked for approval. Modern agent stacks are built around exactly that loop: plan, call tools, observe results, continue. Once you own orchestration yourself, you also own the branching. (langchain.com)

### Where does 14,400 come from?

The viral framing uses 3 LLMs, 8 tools, and 50 customers, which gives 3 × 8 × 50 = 1,200 customer-specific execution contexts. If each run can take one of 12 model-and-tool path variants, that becomes 12 × 1,200, or 14,400 possible paths. But the deeper lesson is broader: even if your exact multiplier differs, every extra model, tool, approval step, or tenant boundary multiplies the number of situations you may need to test and monitor. (langchain.com) That is the combinatorial explosion people are reacting to.

### Why do customers multiply the problem?

Because “the same agent” is rarely the same in production. One customer may have different permissions, data sources, MCP servers, rate limits, compliance rules, or human-review thresholds than another. LangChain’s production runtime guide calls out multi-tenancy as a core runtime requirement, right next to memory, observability, and human oversight. Salesforce is making the same bet from the enterprise side: centralized governance exists because different teams and tenants do not share one risk profile.
(anthropic.com)

### Why are tools such a big source of branching?

Tools turn a chatbot into an actor. But tools also introduce non-determinism at the edges: bad inputs, partial results, API errors, stale permissions, and ambiguous tool descriptions. Anthropic’s tool-writing guide makes the key distinction: normal software contracts are deterministic, but agent behavior is not. An agent might call the tool, skip the tool, misuse the tool, or ask a clarifying question first. (langchain.com) That means every tool is not just a capability; it is another decision surface.

### Why can’t prompts solve this?

Because prompts shape behavior, but they do not inspect the actual path being taken. The March 2026 paper on runtime governance argues that the execution path is the real object you have to govern. Static access control and prompt instructions only cover slices of the problem. If the risk depends on what already happened in the run (what data was accessed, which action is next, which customer is involved), you need runtime checks, not just better prompting. (anthropic.com)

### So what do teams add in practice?

They add tracing, resumable execution, human checkpoints, guardrails, and budget-aware routing. OpenAI’s Agents SDK explicitly separates agent definitions from running agents, orchestration, handoffs, tools, and guardrails. LangChain emphasizes durable execution and observability. Salesforce is pushing “guided determinism” and centralized LLM governance. Different vendors use different language, but they are all circling the same problem: too many possible paths to trust blindly. (arxiv.org)

### Does this mean agents are a bad idea?

No. It means the hard part has moved. The challenge is no longer just getting an agent to work once. It is getting thousands of slightly different runs to behave well under real permissions, real customers, and real costs. A demo proves capability. Runtime governance proves operability. (langchain.com)

### Bottom line?
The 14,400 figure is best read as a warning label, not a law of nature. Once agents mix multiple models, tools, and tenants, the number of runtime paths balloons. That is why the conversation has shifted from “can the agent do the task?” to “can we see, constrain, and safely ship every path it might take?” (langchain.com)
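
To make the idea of a runtime check concrete, here is a minimal sketch of a guard that looks at the path taken so far before allowing the next tool call. Everything in it is hypothetical: the tenant name, tool names, policy fields, and the `check_next_step` function are invented for illustration and do not come from any vendor SDK or from the paper mentioned above.

```python
# Hypothetical sketch of a path-aware runtime check; not any vendor's API.
from dataclasses import dataclass, field


@dataclass
class Step:
    model: str  # which model made the call (illustrative label)
    tool: str   # which tool was called


@dataclass
class RunPath:
    tenant: str
    steps: list = field(default_factory=list)


# Assumed per-tenant policy: a step budget, tools that read sensitive data,
# and tools that send data outside the system.
POLICIES = {
    "acme": {
        "max_steps": 5,
        "sensitive_tools": {"read_crm"},
        "outbound_tools": {"send_email"},
    },
}


def check_next_step(path, step):
    """Return 'allow', 'approve', or 'deny' based on the path so far."""
    policy = POLICIES.get(path.tenant)
    if policy is None:
        return "deny"  # unknown tenant: fail closed
    if len(path.steps) >= policy["max_steps"]:
        return "deny"  # step budget exhausted for this run
    touched_sensitive = any(
        s.tool in policy["sensitive_tools"] for s in path.steps
    )
    if step.tool in policy["outbound_tools"] and touched_sensitive:
        return "approve"  # pause for a human checkpoint
    return "allow"


path = RunPath(tenant="acme")
email = Step(model="model-a", tool="send_email")
print(check_next_step(path, email))  # allow

path.steps.append(Step(model="model-a", tool="read_crm"))
print(check_next_step(path, email))  # approve
```

The same `send_email` call gets a different verdict depending on what the run did earlier, which is the kind of decision a static prompt or access list cannot make on its own.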
