Datadog Temper framework discussion

- Datadog’s “Temper” framework surfaced publicly in May 2026 through a conference session and social discussion about constrained AI tooling for production engineering.
- Sesh Nalla, Datadog’s VP of Engineering, said 90% of engineers adopted AI coding tools in four months, with Claude Code driving two-thirds.
- The next primary source is Datadog and Anthropic’s May 6, 2026 “Code w/ Claude” session featuring Nalla.

Datadog’s “Temper” framework entered wider public discussion this month through a social thread that tied it to a broader push toward constrained, auditable AI systems in software operations. The thread pointed readers to a May 6 conference session in which Sesh Nalla, Datadog’s vice president of engineering, described Temper as “a constrained framework” for producing “secure, reusable tools” from AI coding sessions rather than letting one-off agent outputs accumulate.

Datadog has not, in the material reviewed, published a standalone Temper manifesto or product page. But its recent writing and talks describe the same operating problem: AI systems can generate code and workflows faster than teams can verify them, pushing the bottleneck from creation to trust. In a March 9 Datadog post, Alp Keles, Jai Menon, Sesh Nalla and Vyom Shah wrote that “the bottleneck has moved from writing code to trusting what was written,” and said the company’s answer is “harness-first engineering.” (claude.com)

### Where did the Temper discussion come from?

A May 6 “Code w/ Claude 2026” session in San Francisco is the clearest primary source tying Datadog to Temper by name. The session page says Nalla explained how Datadog built Temper after reusable AI-produced tools for “verification, debugging, orchestration” began to “sprawl into unmaintainable one-offs.” It describes Temper as a framework meant to make those tools secure, reusable and compounding across teams. (datadoghq.com)

The social discussion that followed appears to be summarizing that talk rather than quoting a formal Datadog paper. That matters because some phrases circulating online, including references to state machines and policy gates, are better understood as a synthesis of Datadog’s published approach than as language directly lifted from a Temper specification. (claude.com)

### What is Datadog actually saying about how these systems should work?

Datadog’s March 9 blog post lays out the company’s most detailed public framework for verifying AI-built systems. The authors say the pattern is: “The agent generates code, the harness verifies it, production telemetry validates it,” and then feedback updates the harness for the next iteration. They list deterministic simulation testing, formal specifications, shadow evaluation and observability-driven feedback loops as the main verification methods. (claude.com)

Datadog’s 2026 State of AI Engineering report places that approach in a production context. The report says teams are moving beyond single model calls into systems with orchestration frameworks, tool calls, retries and multiple service boundaries, and argues that the gap between a demo and a dependable system is closed by “effective evaluation and operational discipline.” It says more than 70% of organizations in Datadog’s dataset now use three or more models. (datadoghq.com)

### How do “policy gates” fit into this?

Datadog’s February 3 post on AI Guard shows how the company is implementing runtime controls around agents. The post says AI Guard evaluates prompts, responses and tool calls in real time to determine whether an action aligns with organizational intent and policy, and can block requests before they reach sensitive systems. Each allow-or-block decision is tagged with reasons such as data exfiltration or indirect prompt injection. (datadoghq.com)

That is the clearest public evidence for the “policy gate” framing around Temper. Datadog does not use that exact phrase in the AI Guard post, but the mechanism it describes (runtime checks on tool use and outputs, with explicit reasons for enforcement) matches how engineers often describe policy gating in production systems. (datadoghq.com)

### Why are people connecting Temper to state machines and constrained agents?

Datadog’s own research defines agents as systems with “multi-step control flow, tool execution, or multiple service calls.” That is the class of software where teams often replace open-ended autonomy with explicit stages, allowed transitions and verification at each step. The company’s writing repeatedly favors bounded workflows over free-form execution: harnesses before trust, telemetry before rollout, and runtime checks before tool access. (datadoghq.com)

The result is that Temper is being read as part of a larger industrial pattern in AI engineering: less emphasis on unconstrained “agent” behavior, more emphasis on reusable components, formal checks, policy enforcement and audit trails. The next public source to watch is Datadog or Anthropic material from the May 6 session, where Nalla presented Temper directly. (claude.com) (datadoghq.com)
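
For readers who want a concrete picture of what “explicit stages, allowed transitions and verification at each step” can look like, the following is a minimal Python sketch of a bounded workflow with a policy gate in front of tool calls. It is an illustration only: it is not Temper, it does not use any Datadog or AI Guard API, and every name in it (`Stage`, `ALLOWED`, `PERMITTED_TOOLS`, `policy_gate`, `Workflow`) is a hypothetical chosen for this example. The stage names loosely follow the generate, verify, shadow-evaluate sequence described in Datadog’s March 9 post, but the mapping is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical stages; not taken from any Temper specification.
    GENERATED = "generated"
    VERIFIED = "verified"
    SHADOWED = "shadowed"
    RELEASED = "released"
    REJECTED = "rejected"


# Allowed transitions. Any move not listed here is refused.
ALLOWED = {
    Stage.GENERATED: {Stage.VERIFIED, Stage.REJECTED},
    Stage.VERIFIED: {Stage.SHADOWED, Stage.REJECTED},
    Stage.SHADOWED: {Stage.RELEASED, Stage.REJECTED},
    Stage.RELEASED: set(),
    Stage.REJECTED: set(),
}

# Hypothetical allowlist standing in for an organization's tool policy.
PERMITTED_TOOLS = {"run_tests", "read_metrics"}


@dataclass
class Decision:
    allowed: bool
    reason: str


def policy_gate(tool: str) -> Decision:
    """Allow or block a tool call, always attaching a reason."""
    if tool in PERMITTED_TOOLS:
        return Decision(True, "tool on allowlist")
    return Decision(False, f"tool '{tool}' not on allowlist")


@dataclass
class Workflow:
    stage: Stage = Stage.GENERATED
    audit: list = field(default_factory=list)  # (from, to, outcome) tuples

    def advance(self, target: Stage, check_passed: bool) -> bool:
        """Move to `target` only if the transition is allowed and its check passed."""
        if target not in ALLOWED[self.stage]:
            self.audit.append((self.stage.value, target.value, "blocked: transition not allowed"))
            return False
        if target is not Stage.REJECTED and not check_passed:
            self.audit.append((self.stage.value, target.value, "blocked: verification failed"))
            return False
        self.audit.append((self.stage.value, target.value, "ok"))
        self.stage = target
        return True

    def call_tool(self, tool: str) -> Decision:
        """Run the policy gate and record the decision in the audit trail."""
        decision = policy_gate(tool)
        self.audit.append((self.stage.value, f"tool:{tool}", decision.reason))
        return decision
```

In this sketch, a call such as `Workflow().advance(Stage.RELEASED, check_passed=True)` is refused because it skips the verification stages, and `call_tool("delete_database")` returns a blocked decision with a stated reason. Both outcomes land in the `audit` list, which is the kind of trail the constrained-agent framing emphasizes.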
