Datadog finds 69% of agent input tokens come from system prompts

- Datadog’s 2026 AI engineering report says the hard part of shipping agents is no longer model choice. It’s the hidden machinery around them.
- In customer telemetry, 69% of input tokens in agent traces came from system prompts, not user messages, and median token volume more than doubled.
- That shifts the bottleneck from model IQ to operations, cost control, and debugging across multi-model, multi-step AI systems.

AI observability sounds like plumbing. But this Datadog report is really about where the money and fragility in modern AI systems actually live. The surprise is that it’s often not the user prompt, and not even the model itself. It’s the orchestration layer: the long system prompts, tool instructions, routing logic, retries, and policy text wrapped around every call. Datadog’s 2026 State of AI Engineering report puts a sharp number on that: in customer agent traces, 69% of input tokens came from system prompts, while teams also sent far more tokens per request than a year ago. (datadoghq.com)

### What is the hidden part here?

A lot of AI products no longer make one clean model call. They run an agent loop. That means a request can include a giant system prompt, tool schemas, retrieval context, formatting rules, safety instructions, and follow-up calls after failures. Users see one box. The system underneath is doing distributed-systems work with language [...] (datadoghq.com) [...] from demos to production engineering, where prompt changes and retrieval changes can move cost, latency, and failure rates without any obvious code deploy. (datadoghq.com)

### Why does 69% matter so much?

Because input tokens cost money and add latency. If most of the prompt budget is being burned by hidden instructions, teams can underestimate where spend is coming from. The expensive part may be the scaffolding, not the actual user request.

That also makes optimization less obvious. You can’t just tell users to “write shorter prompts” [...] (datadoghq.com) [...] instructions the user never sees. The report pairs that with another warning sign: average token counts more than doubled for median-use teams and quadrupled for heavy users. (newswire.telecomramblings.com)

### Why are agents making this worse?

Agents multiply everything. One model call becomes several. One prompt becomes a chain of prompts. One failure can trigger retries, fallback models, or extra tool calls. Datadog says agent framework adoption nearly doubled year over year, [...] (newswire.telecomramblings.com) [...] with language glued through it. (datadoghq.com)

### So what breaks first?

Not necessarily model quality. Datadog’s bigger point is that operational limits are starting to dominate. Around 5% of AI model requests fail in production, and nearly 60% of those failures come from capacity limits. In plain English: even if the model is smart enough, the system around it can still slow down, error out, or collapse under load. (newswire.telecomramblings.com)

### Why can’t normal monitoring catch that?

Traditional app monitoring is good at latency, uptime, and error rates. But agent systems need lineage. You need to know which system prompt version ran, which retrieval chunk got injected, which tool call failed, which fallback model [...] (newswire.telecomramblings.com) That’s the core case Datadog is making for AI observability as its own discipline, not just regular APM with LLM labels on top. (datadoghq.com)

### Is this really a Datadog-specific pitch?

Yes, but the underlying pattern is real too. Datadog sells observability, so the report naturally points toward more instrumentation. But the metrics it highlights line up with what production AI teams keep running into: multi-model sprawl, token growth, retries, tool-call complexity, and hard-to-explain failures. Even if you ignore the vendor angle, the operational story lands. (datadoghq.com)

### What should teams take from it?

Treat prompts like infrastructure. Version them. Measure them. Trim them. Track per-step cost, not just per-request cost. And watch the hidden context layer first when bills jump or latency drifts. The big lesson here is simple: in agentic AI, the words around the model can matter more than the words from the user. (datadoghq.com)

The flashy part of AI is still the model. But the bill, the bugs, and the bottlenecks are moving into orchestration. Datadog’s 69% number makes that visible. (datadoghq.com)
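
To make the "track per-step cost" advice concrete, here is a minimal sketch of what that accounting could look like. Everything in it is an assumption for illustration: the `StepRecord` fields, the step names, the token counts, and the price constant are hypothetical and do not come from the Datadog report or from any Datadog API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StepRecord:
    """One model call inside an agent trace (all field names are hypothetical)."""
    request_id: str
    step: str             # e.g. "plan", "tool_call", "retry"
    prompt_version: str   # label of the system prompt version that ran
    system_tokens: int    # input tokens from system prompt, tool schemas, policy text
    user_tokens: int      # input tokens from the user message
    failed: bool = False


# Illustrative price only; real per-token prices vary by model and provider.
PRICE_PER_1K_INPUT_TOKENS = 0.003


def system_share(records):
    """Fraction of all input tokens that came from system-side text."""
    system = sum(r.system_tokens for r in records)
    total = system + sum(r.user_tokens for r in records)
    return system / total if total else 0.0


def cost_per_step(records):
    """Input-token cost grouped by step name rather than by request."""
    costs = defaultdict(float)
    for r in records:
        tokens = r.system_tokens + r.user_tokens
        costs[r.step] += tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
    return dict(costs)


# A made-up trace: one user request that fans out into three model calls.
trace = [
    StepRecord("req-1", "plan", "v12", system_tokens=1800, user_tokens=200),
    StepRecord("req-1", "tool_call", "v12", system_tokens=1500, user_tokens=0, failed=True),
    StepRecord("req-1", "retry", "v12", system_tokens=1500, user_tokens=0),
]

print(f"system share: {system_share(trace):.0%}")
print(cost_per_step(trace))
```

Recording the prompt version and a failure flag on every step is the design choice that matters here: it lets a cost or latency jump be traced back to a specific prompt change or retry pattern instead of showing up only as a larger per-request total.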
