OpenTelemetry thread proposes concrete tracing and agent‑hierarchy patterns for LLM observability
- OpenTelemetry’s generative artificial intelligence documentation now lays out concrete span patterns for model calls, tool execution, and agent workflows, giving developers a standard way to trace large language model applications end to end.
- The specs and examples say traces should capture prompt inputs, output messages, token usage, model names, tool names, and parent-child span links, so teams can follow one request across agents, retrieval, and tools.
- The push reflects a broader effort to standardize artificial intelligence telemetry beyond HTTP latency and status codes, with agent observability and evaluation events still marked as in development. (opentelemetry.io)
A successful HTTP request does not tell you whether a large language model answered correctly, used the right tool, or burned through tokens. OpenTelemetry’s generative artificial intelligence specs aim to fill that gap with traces that follow the full request path. (opentelemetry.io 1) (opentelemetry.io 2)

OpenTelemetry is the open-source framework many infrastructure teams already use to collect traces, metrics, and logs across software systems. Its generative artificial intelligence work extends that model to chat completions, embeddings, tool calls, retrieval, and multi-agent workflows. (opentelemetry.io 1) (opentelemetry.io 2)

In OpenTelemetry, a trace is the full path of a request, and a span is one unit of work inside it. Parent and child spans let engineers see whether a model call triggered a retrieval step, a database lookup, or a tool execution, and in what order. (opentelemetry.io)

The generative artificial intelligence span conventions say a model-inference span should record details such as provider name, model name, operation name, token counts, finish reasons, and errors. The examples also show optional capture of system, user, and assistant messages when content capture is enabled. (opentelemetry.io 1) (opentelemetry.io 2)

That is the practical argument behind the recent OpenTelemetry-focused thread. Ordinary application performance monitoring can show a 200 response and low latency while missing a bad answer, an unnecessary tool call, or a costly retry chain. OpenTelemetry’s own generative artificial intelligence blog makes the same case, framing observability around performance, cost, and safety rather than transport success alone. (opentelemetry.io)

The newer agent conventions go further by defining spans for agent operations and tool use. OpenTelemetry’s agent spec describes models that can plan tasks, call external tools, and act across multiple steps. That multi-step behavior is where trace hierarchies become more useful than flat request logs. (opentelemetry.io)

The conventions also make room for the structure of multi-agent systems. An OpenTelemetry proposal on agentic systems describes attributes for tasks, actions, agents, teams, artifacts, and memory, aiming to preserve the relationships among them inside one trace. (github.com)

Tool execution is treated as its own span type, with attributes such as `gen_ai.tool.name` and `gen_ai.tool.type`. The semantic-conventions repository says developers should manually instrument any tool calls that their automatic instrumentation libraries do not already capture. (github.com) (opentelemetry.io)

Evaluation is part of the same push. OpenTelemetry’s generative artificial intelligence events spec includes a `gen_ai.evaluation.result` event. The idea is that correctness checks, safety scores, and human-review outcomes belong in the same telemetry stream as prompts and tool calls. (github.com)

The standards are not finished. OpenTelemetry labels the generative artificial intelligence semantic conventions as “Development,” and its 2025 agent observability post says the ecosystem is still fragmented across frameworks and vendors. (opentelemetry.io) (opentelemetry.io)

But the direction is clear. If teams want to explain why an agent answered, failed, spent money, or called a tool, they need traces that describe the model interaction itself, not just the web request wrapped around it. (opentelemetry.io) (opentelemetry.io)
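The minimal Python sketch below shows what one agent request might look like as a span tree under these conventions. It is dependency-free and does not use the OpenTelemetry SDK; a real application would create spans through the SDK, for example with `tracer.start_as_current_span`. Here spans are plain dataclasses so their shape is easy to inspect. The following names reflect our reading of the current Development-status conventions and may change:

- the span names, such as `chat example-model` and `execute_tool get_weather`;
- the attribute keys `gen_ai.operation.name`, `gen_ai.provider.name`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.finish_reasons`, `gen_ai.tool.name`, `gen_ai.tool.type`, and `gen_ai.agent.name`;
- the evaluation attributes `gen_ai.evaluation.name` and `gen_ai.evaluation.score.value`.

The agent, provider, model, tool, and evaluator values are hypothetical, as is the `chat_span` helper. Message content is left out because the conventions treat capturing it as optional.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Optional

# Toy, dependency-free stand-in for an OpenTelemetry span (not the SDK's class).
_ids = count(1)


@dataclass
class Span:
    name: str
    parent_id: Optional[int] = None
    attributes: dict[str, Any] = field(default_factory=dict)
    events: list[tuple[str, dict[str, Any]]] = field(default_factory=list)
    span_id: int = field(default_factory=lambda: next(_ids))


def chat_span(parent: Span, input_tokens: int, output_tokens: int, finish_reason: str) -> Span:
    """Hypothetical helper: a model-inference span named '{operation} {model}'."""
    return Span(
        "chat example-model",
        parent_id=parent.span_id,
        attributes={
            "gen_ai.operation.name": "chat",
            "gen_ai.provider.name": "example-provider",  # hypothetical provider
            "gen_ai.request.model": "example-model",  # hypothetical model
            "gen_ai.usage.input_tokens": input_tokens,
            "gen_ai.usage.output_tokens": output_tokens,
            "gen_ai.response.finish_reasons": [finish_reason],
        },
    )


def build_example_trace() -> list[Span]:
    """One hypothetical agent request: plan with a model, run a tool, answer."""
    agent = Span("invoke_agent support-agent", attributes={
        "gen_ai.operation.name": "invoke_agent",
        "gen_ai.agent.name": "support-agent",  # hypothetical agent
    })
    plan = chat_span(agent, 120, 30, "tool_calls")  # model asks for a tool
    tool = Span("execute_tool get_weather", parent_id=agent.span_id, attributes={
        "gen_ai.operation.name": "execute_tool",
        "gen_ai.tool.name": "get_weather",  # hypothetical tool
        "gen_ai.tool.type": "function",
    })
    answer = chat_span(agent, 200, 50, "stop")  # model produces the final answer
    # Attach an evaluation outcome to the agent span as an event.
    agent.events.append(("gen_ai.evaluation.result", {
        "gen_ai.evaluation.name": "relevance",  # hypothetical evaluator
        "gen_ai.evaluation.score.value": 0.9,
    }))
    # List order doubles as start order in this toy model.
    return [agent, plan, tool, answer]


def render(spans: list[Span], parent_id: Optional[int] = None, depth: int = 0) -> list[str]:
    """Indent each span under its parent, preserving list order among siblings."""
    lines: list[str] = []
    for span in spans:
        if span.parent_id == parent_id:
            lines.append("  " * depth + span.name)
            lines.extend(render(spans, span.span_id, depth + 1))
    return lines


def total_tokens(spans: list[Span]) -> int:
    """Sum input and output token counts across every span in the trace."""
    return sum(
        s.attributes.get("gen_ai.usage.input_tokens", 0)
        + s.attributes.get("gen_ai.usage.output_tokens", 0)
        for s in spans
    )


if __name__ == "__main__":
    trace = build_example_trace()
    print("\n".join(render(trace)))
    print("total tokens:", total_tokens(trace))
```

Running it prints the two chat spans and the tool span indented under the agent span, followed by a total of 400 tokens. Every span carries its parent's ID, so cost can be rolled up per agent request. A low evaluation score can also be traced back to the specific model and tool calls beneath it. Placing the tool span as a sibling of the model calls is one plausible shape, not a layout the spec mandates.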