Datadog devs warn that AI agents drift
- Datadog’s developer account circulated a thread arguing that AI agents usually degrade gradually in production, not through obvious crashes, and need continuous observability.
- The company’s June 10, 2025 launch tied that pitch to AI Agent Monitoring, which maps decision paths, tool calls, loops, latency, cost and errors.
- Datadog has been expanding LLM Observability as agent frameworks spread across production software. (datadoghq.com)
Datadog’s developer account pushed a simple warning: AI agents usually do not break in one obvious moment; they drift in production over time. (x.com) The company has been building that argument into its product launches for months.
On June 10, 2025, Datadog announced AI Agent Monitoring, LLM Experiments and an AI Agents Console inside its LLM Observability suite. (datadoghq.com) Datadog said the monitoring product maps each agent’s decision path, including inputs, tool invocations, calls to other agents and outputs, in an interactive graph. It also lets engineers inspect latency spikes, incorrect tool calls, infinite loops, security signals and cost metrics. (datadoghq.com)
An AI agent is software that uses a large language model to choose actions, call tools and hand work to other agents, rather than following one fixed script. Datadog says that flexibility creates dynamic decision graphs that are harder to debug than ordinary application flows. (datadoghq.com)
In Datadog’s documentation, each request in an LLM application is represented as a trace, with a span for each model call, tool call or agent step. The company says those traces can be used to track latency, token usage, errors, privacy issues and output quality over time. (docs.datadoghq.com)
That “over time” element is central to the drift argument. Datadog’s docs say its Insights feature surfaces outliers from the past week and is meant to catch regressions, performance drift and unexpected behavior before they turn into larger failures. (docs.datadoghq.com)
The company has also tied the same pitch to specific agent frameworks. In a January 23, 2026 post with Google Cloud, Datadog said systems built with Google’s Agent Development Kit can become unpredictable as they plan, loop, collaborate and call tools dynamically. (cloud.google.com) That post listed the kinds of slow-burn problems operators worry about: incomplete outputs, unexpected costs, security risks, bad multi-agent handoffs and planners stuck retrying the same tool. Datadog framed automatic instrumentation as a way to catch those issues without adding manual setup to every workflow. (cloud.google.com)
Datadog has been broadening that message beyond a single launch. In a December 2, 2025 company post, it said organizations need “mature monitoring tools for every level of the AI stack” as AI-native companies move toward enterprise scale. (datadoghq.com)
The thread landed as observability vendors were trying to define the next production problem in AI. Datadog’s answer is that agents rarely announce failure with a crash; instead, they leave a trail in traces, costs, retries and slightly worse decisions. (x.com) (docs.datadoghq.com)
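Datadog’s docs describe each request as a trace made of spans for model calls, tool calls and agent steps. To illustrate how a “trail” of retries, costs and errors might be read from that kind of data, here is a minimal sketch. The `Span` schema (`kind`, `name`, `duration_ms`, `cost_usd`, `error`) and the helpers `longest_tool_streak` and `summarize_trace` are hypothetical and are not Datadog’s API or data model. The retry threshold of 3 is likewise an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Span:
    """One step in an agent trace. Hypothetical schema, not Datadog's."""
    kind: str                   # "llm", "tool" or "agent"
    name: str                   # model, tool or agent name
    duration_ms: float
    cost_usd: float = 0.0
    error: Optional[str] = None


def longest_tool_streak(spans: List[Span]) -> Tuple[Optional[str], int]:
    """Return (tool_name, run_length) for the longest run of calls to the
    same tool, skipping non-tool spans (e.g. planner LLM calls) in between."""
    best_name, best_len = None, 0
    cur_name, cur_len = None, 0
    for span in spans:
        if span.kind != "tool":
            continue  # planner/model spans do not break a retry streak
        if span.name == cur_name:
            cur_len += 1
        else:
            cur_name, cur_len = span.name, 1
        if cur_len > best_len:
            best_name, best_len = cur_name, cur_len
    return best_name, best_len


def summarize_trace(spans: List[Span], retry_threshold: int = 3) -> dict:
    """Roll up latency, cost and errors, and flag a suspected retry loop."""
    tool, streak = longest_tool_streak(spans)
    return {
        "total_ms": sum(s.duration_ms for s in spans),
        "total_cost_usd": round(sum(s.cost_usd for s in spans), 6),
        "errors": sum(1 for s in spans if s.error is not None),
        "suspected_retry_loop": tool if streak >= retry_threshold else None,
    }


# Hypothetical trace: a planner calling the same tool three times in a row.
trace = [
    Span("llm", "planner", 820, 0.004),
    Span("tool", "search_orders", 300),
    Span("llm", "planner", 790, 0.004),
    Span("tool", "search_orders", 310, error="timeout"),
    Span("llm", "planner", 805, 0.004),
    Span("tool", "search_orders", 295),
]
print(summarize_trace(trace))
# → {'total_ms': 3320, 'total_cost_usd': 0.012, 'errors': 1, 'suspected_retry_loop': 'search_orders'}
```

Counting consecutive calls to one tool while skipping the planner’s model calls in between is only one simple heuristic for the “planner stuck retrying the same tool” pattern. Drift would show up more clearly by comparing such per-trace summaries against a baseline over days or weeks, which is closer to the week-long view that Datadog’s Insights feature is described as providing.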