Last9 and Datadog push LLM observability
- Last9 on April 28 launched last9-genai, an OpenTelemetry extension for Python apps that groups multi-turn chats, tool calls and workflow costs into one trace.
- Datadog said its 2026 AI engineering report drew on thousands of production environments and found 69% of companies now run three or more models.
- Vendors are moving past token counts toward agent and workflow tracking as AI systems sprawl. (datadoghq.com)
Observability tools are starting to treat an artificial intelligence session like a whole workflow, not a pile of isolated model calls. (last9.io) (datadoghq.com)

Last9 said on April 28 that it launched last9-genai, a Python software development kit that extends OpenTelemetry for large language model apps. The company said the package adds conversation grouping, workflow tracking, and prompt and completion capture. (last9.io 1) (last9.io 2)

OpenTelemetry is the common plumbing many teams use to collect traces, metrics, and logs across software systems. Last9 said that standard GenAI instrumentation records spans and token counts, but does not show whole conversations, rolled-up workflow costs, or prompts in dashboards. (last9.io)

Datadog is making a similar argument from a larger customer base. In its 2026 State of AI Engineering report, the company said it analyzed data from thousands of AI agent environments and more than a thousand customers using large language model telemetry. (datadoghq.com 1) (datadoghq.com 2)

That report says AI systems in production now span model fleets, orchestration frameworks, tool calls, retries, long prompts, and multiple service boundaries. Datadog said 69% of companies now use three or more models alongside more complex agent workflows. (datadoghq.com 1) (datadoghq.com 2)

In plain terms, older observability looks at one request the way a shipping company tracks one package scan. The newer pitch is to follow the entire trip: user message, prompt version, tool hops, retries, model switches, and final outcome. (last9.io) (datadoghq.com)

Datadog has already been building product around that idea. The company said last year that its AI Agent Monitoring maps an agent’s decision path, including inputs, tool invocations, calls to other agents, and outputs, and its current docs position the feature for frameworks such as OpenAI Agents SDK, LangGraph, and CrewAI. (datadoghq.com) (docs.datadoghq.com)

Last9’s release is narrower, but more explicit about what teams are missing in today’s dashboards. Its documentation says developers can track multi-turn conversations, tool executions, token usage, and an entire user session from first message to final response. (last9.io) (last9.io)

The shift reflects where AI deployment has moved since 2024, when many products still wrapped a single model call in an application and called that production. Datadog now describes customers managing fleets, agents, and service boundaries, which turns cost and reliability into workflow-level problems instead of request-level ones. (datadoghq.com)

The practical result is that teams are being pushed to measure outcomes such as cost per successful run, failure points in tool handoffs, and which prompt version produced a bad answer. The more agents call tools and other agents, the less useful a raw token count becomes on its own. (last9.io) (datadoghq.com)

The new sales pitch from observability vendors is no longer just “watch the model.” It is “trace the whole job,” because that is where production AI systems are now breaking and spending money. (last9.io) (datadoghq.com)
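
To make a workflow-level metric such as cost per successful run concrete, here is a minimal Python sketch. It is not taken from last9-genai or any Datadog product. The record fields, the function name, and the rule that a run counts as successful only if every call in it succeeded are all assumptions made for illustration; a real system might instead judge success by the final outcome after retries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One model or tool call. All field names are hypothetical."""
    workflow_id: str  # groups the calls that belong to one user-facing job
    kind: str         # "llm" or "tool"
    cost_usd: float   # cost attributed to this single call
    ok: bool          # whether this call succeeded


def cost_per_successful_run(records):
    """Roll per-call records up to workflows, then divide total spend
    (including spend on failed runs) by the number of successful runs.

    Assumption: a run is successful only if every call in it succeeded.
    Returns None when no run succeeded.
    """
    cost = defaultdict(float)
    succeeded = defaultdict(lambda: True)
    for r in records:
        cost[r.workflow_id] += r.cost_usd
        succeeded[r.workflow_id] = succeeded[r.workflow_id] and r.ok
    total_cost = sum(cost.values())
    successes = sum(1 for wf in cost if succeeded[wf])
    return total_cost / successes if successes else None


# Two runs: wf-1 finishes cleanly, wf-2 fails at a tool handoff.
records = [
    CallRecord("wf-1", "llm", 0.50, True),
    CallRecord("wf-1", "tool", 0.25, True),
    CallRecord("wf-2", "llm", 0.75, True),
    CallRecord("wf-2", "tool", 0.00, False),
]
print(cost_per_successful_run(records))  # 1.5
```

A request-level view of the same data would show four calls totaling $1.50. The workflow-level view shows that only one of the two runs delivered a result, so each successful run effectively cost $1.50.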