OpenTelemetry maps agent traces
- Grafana Labs and Red Hat both published April 2026 walkthroughs showing OpenTelemetry tracing across agent workflows, including tool calls, handoffs, MCP servers, and Llama Stack.
- The key shift is standardization: OpenTelemetry now has dedicated GenAI agent span conventions, but they remain in development behind opt-in stability flags.
- That matters because teams can finally debug agent failures with existing observability stacks instead of treating multi-step AI behavior like a black box.

OpenTelemetry is basically the plumbing layer that lets engineers see where software spent time and where it broke. That used to mean web requests, databases, and microservices. Now the same idea is getting applied to AI agents: systems that reason in steps, call tools, hand work to other agents, and sometimes wander off in weird directions. The news is that this has moved from a loose hack into something much more concrete: OpenTelemetry now has dedicated GenAI span conventions, and multiple vendors have published fresh walkthroughs showing full agent traces in production-style setups.

### What is a trace here?

A trace is the full path of one request through a system, broken into spans: the root operation and all its child steps. In classic software, that might be an API request hitting a service, then a cache, then a database. In an agent system, the same structure can represent a user prompt, the model’s reasoning step, a tool invocation, a guardrail check, and a handoff to another agent. That is why tracing fits agents so well: agents are already multi-step workflows. (opentelemetry.io)

### Why are agents harder to debug?

Because the failure usually is not one bad response. The failure is the path. An agent may choose the wrong specialist, call a tool twice, send a huge payload to a slow service, or get stuck bouncing between steps. If all you log is the final answer, you miss the actual mistake. Red Hat’s April 6 walkthrough frames this as an end-to-end visibility problem across routing agents, specialist agents, knowledge bases, MCP servers, and external systems. (opentelemetry.io)

### What changed recently?

Two things moved at once. First, OpenTelemetry’s docs now include GenAI-specific semantic conventions, including agent spans, model spans, events, exceptions, and MCP-related conventions. Second, vendors started wiring those ideas into real examples. Grafana’s February post shows the OpenAI Agents SDK exporting traces into Grafana Cloud so teams can inspect generations, tool calls, guardrails, and handoffs in one place. (developers.redhat.com)

### What do the new agent spans add?

They give names and structure to operations that used to be ad hoc. OpenTelemetry’s GenAI agent spec defines agent operations as first-class spans, extending the broader GenAI span model. That means a tool call or agent creation step can show up in a standard shape instead of every framework inventing its own labels. The catch is that these conventions are still marked “Development,” so teams often need to opt into the latest experimental version with `OTEL_SEMCONV_STABILITY_OPT_IN`. (opentelemetry.io)

### Why does MCP matter so much?

Because MCP is where traces often break. Once an agent crosses a process boundary to call a tool server, you need context propagation or the trace turns into disconnected fragments. Red Hat’s example explicitly traces across application workloads, MCP servers, and Llama Stack. That is the important part: not just seeing one model call, but preserving one continuous trace across the whole tool chain. (opentelemetry.io)

### Where do these traces go?

Into the same observability backends teams already use. Grafana pushes them into Tempo-backed Grafana Cloud Traces. Langfuse can ingest OTLP traces directly and map them into its own LLM observability model. That means agent telemetry does not have to live in a separate debugging universe. You can correlate traces with logs, metrics, latency spikes, and infrastructure issues. (developers.redhat.com)

### So what is the real payoff?

You stop treating agent behavior like magic. A bad answer becomes inspectable: which model ran, which tool fired, how long each step took, where the handoff happened, and where the path started to drift. It is like going from a final exam grade to the full scratch work.

### Bottom line?

OpenTelemetry is turning agent observability into normal software observability. The standards are still settling, but the direction is clear: agent traces are becoming something teams can instrument once and analyze with the tooling they already trust. (grafana.com) (opentelemetry.io)
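
For readers who want to see why a missing context handoff fragments a trace, here is a minimal sketch in plain Python. It is a toy model, not the OpenTelemetry SDK: the `Span` class, the `start_span`, `inject`, `extract`, and `handle_tool_request` helpers, and the span names are all hypothetical and exist only for illustration. The one real-world detail is the header layout, which follows the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`) that OpenTelemetry propagators commonly use.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Toy span: one step in a trace (not the OpenTelemetry SDK class)."""
    name: str
    trace_id: str                     # 32 hex chars, shared by every span in a trace
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))  # 16 hex chars
    parent_id: Optional[str] = None   # None marks a root span


def start_span(name: str, parent: Optional[Span] = None) -> Span:
    """Start a child of `parent`, or a new root span with a fresh trace id."""
    if parent is None:
        return Span(name=name, trace_id=secrets.token_hex(16))
    return Span(name=name, trace_id=parent.trace_id, parent_id=parent.span_id)


def inject(span: Span, headers: dict) -> None:
    """Write the span's context into outgoing headers as a W3C traceparent."""
    # Layout: version-traceid-parentid-flags; "01" marks the trace as sampled.
    headers["traceparent"] = f"00-{span.trace_id}-{span.span_id}-01"


def extract(headers: dict) -> Optional[Span]:
    """Rebuild the remote parent from incoming headers, if one was sent."""
    value = headers.get("traceparent")
    if value is None:
        return None
    parts = value.split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None  # malformed header: behave as if no context arrived
    return Span(name="<remote parent>", trace_id=parts[1], span_id=parts[2])


def handle_tool_request(headers: dict) -> Span:
    """Hypothetical tool-server entry point: continue the caller's trace if possible."""
    return start_span("execute_tool lookup_order", parent=extract(headers))


# Agent process: a root span for the agent run and a child for the outgoing tool call.
agent = start_span("invoke_agent support_router")
tool_call = start_span("tool call (client side)", parent=agent)

# With propagation: the server-side span joins the agent's trace.
headers: dict = {}
inject(tool_call, headers)
linked = handle_tool_request(headers)

# Without propagation: the server starts an unrelated trace, a disconnected fragment.
orphan = handle_tool_request({})

print(linked.trace_id == agent.trace_id, linked.parent_id == tool_call.span_id)  # True True
print(orphan.trace_id == agent.trace_id, orphan.parent_id)                       # False None
```

In a real deployment the OpenTelemetry SDK and its instrumentation libraries would normally handle the inject and extract steps; the sketch only shows what has to cross the process boundary for the tool server's spans to land in the same trace.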