Agent observability rules
New guidance is converging on what production agent observability actually looks like: track prompt sensitivity, I/O coverage, token and latency metrics, and semantic signals, not just stack traces. The push comes from LangChain’s new observability guide, Respan’s call to move beyond retrospective traces, and OpenRouter’s zero-code Broadcast feature for broad LLM telemetry.
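What "more than stack traces" means for a single model call can be sketched in dependency-free Python. The sketch wraps a call, captures the prompt and completion (I/O coverage), wall-clock latency and token usage, and keeps any failure as data on the record. Only the model and token attribute names (`gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`) are modeled on the OpenTelemetry GenAI semantic conventions, which are still evolving. `SpanRecord`, `traced_call`, `summarize`, `fake_model`, `broken_model` and the `app.input`/`app.output` keys are hypothetical stand-ins, not any vendor's API.

```python
import statistics
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SpanRecord:
    """Span-like record for one model call (hypothetical schema, not an OpenTelemetry SDK type)."""
    name: str
    attributes: dict = field(default_factory=dict)
    duration_ms: float = 0.0
    error: Optional[str] = None


def fake_model(prompt: str) -> dict:
    """Hypothetical model client: reverses the prompt and reports word counts as 'tokens'."""
    words = prompt.split()
    completion = " ".join(reversed(words))
    return {
        "text": completion,
        "usage": {"input_tokens": len(words), "output_tokens": len(completion.split())},
    }


def broken_model(prompt: str) -> dict:
    """Hypothetical client that always fails, to show error capture."""
    raise TimeoutError("upstream timeout")


def traced_call(model_fn: Callable[[str], dict], model_name: str,
                prompt: str, sink: List[SpanRecord]) -> Optional[str]:
    """Call model_fn and append a SpanRecord with I/O, token usage, latency and any error."""
    span = SpanRecord(name=f"chat {model_name}")
    span.attributes["gen_ai.request.model"] = model_name
    span.attributes["app.input"] = prompt  # hypothetical key: captured input
    start = time.perf_counter()
    try:
        result = model_fn(prompt)
        span.attributes["app.output"] = result["text"]  # hypothetical key: captured output
        span.attributes["gen_ai.usage.input_tokens"] = result["usage"]["input_tokens"]
        span.attributes["gen_ai.usage.output_tokens"] = result["usage"]["output_tokens"]
        return result["text"]
    except Exception as exc:
        # Keep the failure as data on the span instead of only surfacing a stack trace.
        span.error = repr(exc)
        return None
    finally:
        # Runs on success and failure alike, so every call yields a record.
        span.duration_ms = (time.perf_counter() - start) * 1000.0
        sink.append(span)


def summarize(spans: List[SpanRecord]) -> dict:
    """Aggregate call counts, token totals and median latency across recorded spans."""
    ok = [s for s in spans if s.error is None]
    latencies = [s.duration_ms for s in spans]
    return {
        "calls": len(spans),
        "errors": len(spans) - len(ok),
        "input_tokens": sum(s.attributes["gen_ai.usage.input_tokens"] for s in ok),
        "output_tokens": sum(s.attributes["gen_ai.usage.output_tokens"] for s in ok),
        "median_latency_ms": statistics.median(latencies) if latencies else 0.0,
    }


if __name__ == "__main__":
    spans: List[SpanRecord] = []
    traced_call(fake_model, "demo-model", "track tokens and latency", spans)
    traced_call(broken_model, "demo-model", "capture inputs", spans)
    print(summarize(spans))
```

The plain dict-based record keeps the sketch self-contained. In a real deployment the same attributes would more likely be set on OpenTelemetry spans so that any OTLP-compatible backend can aggregate them.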
LangChain’s LangSmith tracing captures full execution trees: every agent run, tool call and model interaction is recorded as a trace for later inspection. (docs.langchain.com) LangSmith’s prebuilt dashboards surface trace counts, error rates, per-trace token breakdowns and LLM-call latency, with token and cost charts available out of the box. (docs.langchain.com/langsmith/dashboards)

Respan positions itself as a “self-driving observability” control plane that links observability, automated and human evals, and an adaptive gateway to close the loop on agent failures. (respan.ai/blog/introducing-respan) Respan’s public profile states that it processes 1B+ logs and 2T+ tokens per month for roughly 6.5M end users, and the company announced a $5M seed round to scale the platform. (ycombinator.com/companies/respan) (parsers.vc)

OpenRouter’s Broadcast feature forwards OpenTelemetry (OTLP) traces from routed LLM requests to observability sinks such as Parseable without code changes, enabling vendor-agnostic LLM telemetry. (parseable.com/blog/openrouter-broadcast-parseable-llm-observability) Sentry’s integration docs describe OpenRouter Broadcast as a beta drain that can ship LLM traces to Sentry via OTLP. (docs.sentry.io/product/drains/openrouter/)

The OpenTelemetry GenAI effort and community libraries (OpenLIT, OpenLLMetry) are formalizing semantic conventions and attributes for LLM spans and model metadata, so that token, prompt and completion telemetry is standardized across vendors. (opentelemetry.io/blog/2024/llm-observability/) (github.com/traceloop/openllmetry)

LangChain and other vendor write-ups explicitly recommend turning production traces into repeatable test cases and evals, so that incidents become reproducible signals rather than one-off artifacts.
(langchain.com/articles/llm-monitoring-observability)

Current platform trade-offs: LangSmith provides ready-made dashboards but notes that the prebuilt dashboards cannot yet be modified by users, which limits platform teams that need custom cross-brand SLO views. (docs.langchain.com/langsmith/dashboards) Gateways that remove client-side instrumentation, such as OpenRouter Broadcast, shift the monitoring requirement to the routing plane. OpenRouter’s own site documents outages on Feb 17 and 19, 2026, which platform teams will want surfaced in upstream SLAs. (openrouter.ai) (openrouter.ai/announcements)

The adoption pattern emerging for enterprise platforms: instrument once with OpenTelemetry semantic conventions, export spans and metrics to a central observability plane (gateway or vendor), and convert captured traces into automated eval test cases for CI. (freecodecamp.org/news/build-end-to-end-llm-observability-in-fastapi-with-opentelemetry/) (langfuse.com/blog/2024-10-opentelemetry-for-llm-observability)
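The last step of that adoption pattern, turning captured traces into repeatable eval cases for CI, can be sketched in stdlib Python. The sketch assumes a hypothetical trace export in which each record has `trace_id`, `input`, `output`, an optional `error` and an optional reviewer-supplied `expected` answer; real LangSmith, Langfuse or OTLP exports use their own schemas. `EvalCase`, `cases_from_traces`, `normalize` and `run_evals` are illustrative names, and the normalized string match stands in for a real evaluator such as an LLM judge or semantic scorer.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class EvalCase:
    """One regression case derived from a production trace (hypothetical schema)."""
    case_id: str
    prompt: str
    expected: str


def cases_from_traces(traces: List[dict]) -> List[EvalCase]:
    """Keep traces that errored or disagreed with a known-good answer."""
    cases = []
    for t in traces:
        expected: Optional[str] = t.get("expected")
        if expected is None:
            continue  # no reference answer yet, so nothing to assert against
        if t.get("error") or t.get("output") != expected:
            cases.append(EvalCase(case_id=t["trace_id"], prompt=t["input"], expected=expected))
    return cases


def normalize(text: str) -> str:
    """Case- and whitespace-insensitive comparison key (stand-in for a real evaluator)."""
    return " ".join(text.lower().split())


def run_evals(agent: Callable[[str], str], cases: List[EvalCase]) -> dict:
    """Replay each case against the candidate agent and collect failures."""
    failures = []
    for case in cases:
        try:
            got = agent(case.prompt)
        except Exception as exc:
            failures.append({"case_id": case.case_id, "error": repr(exc)})
            continue
        if normalize(got) != normalize(case.expected):
            failures.append({"case_id": case.case_id, "got": got, "expected": case.expected})
    return {"total": len(cases), "passed": len(cases) - len(failures), "failures": failures}


if __name__ == "__main__":
    captured = json.loads("""[
      {"trace_id": "t1", "input": "capital of France?", "output": "Lyon", "expected": "Paris"},
      {"trace_id": "t2", "input": "2+2?", "output": null, "error": "timeout", "expected": "4"},
      {"trace_id": "t3", "input": "hello", "output": "hi"}
    ]""")
    suite = cases_from_traces(captured)
    answers = {"capital of France?": "paris", "2+2?": "4"}
    report = run_evals(lambda q: answers[q], suite)
    print(json.dumps(report, indent=2))
```

Only traces with a reference answer become cases, because a case without an expected result cannot fail meaningfully. In CI, exiting non-zero whenever `report["failures"]` is non-empty would turn the replay into a regression gate.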