LangSmith ships Engine for evals

- LangChain said on May 13, 2026 that it released LangSmith Engine in public beta, adding automated issue clustering, root-cause analysis and fix proposals.
- Ben Tannyhill wrote that Engine watches production traces, clusters failures into named issues, drafts pull requests and proposes evaluators to prevent regressions.
- LangChain says teams can try LangSmith Engine now through LangSmith, with repo connection optional and public-beta access available today.

LangChain released LangSmith Engine in public beta on May 13, adding a new layer on top of its LangSmith platform that automates parts of the agent debugging and evaluation cycle. The company said the product watches production traces, groups recurring failures into named issues, diagnoses likely causes against a connected codebase and proposes fixes and evaluators for review. The launch was detailed in a company blog post by Ben Tannyhill and in a YouTube video featuring LangChain chief executive Harrison Chase and Tannyhill.

LangSmith had already offered tracing, offline evaluation and online evaluation, according to the product documentation. Engine adds a workflow that starts from live production behavior rather than a manually assembled test plan, LangChain said. The company described the product as working with existing LangSmith tracing projects, evaluator results and repositories, without requiring separate infrastructure. (langchain.com)

### What exactly did LangChain ship on May 13?

LangChain said on May 13 that LangSmith Engine is available “today in public beta.” In the launch post, Tannyhill wrote that the product “replaces the manual cycle of reading traces, spotting patterns, and writing fixes” by continuously clustering production failures, diagnosing root causes and drafting pull requests and evaluators for human review. The YouTube launch video described the same workflow in shorter form. (langchain.com)

The video's description says Engine investigates traces for explicit errors, online evaluation failures, negative user feedback and new behaviors an agent does not yet handle well, then clusters matching cases into a single issue.

### How does this differ from LangSmith’s existing eval tooling?

LangSmith’s documentation says the platform already separates evaluation into two modes: offline evaluation for curated datasets before deployment, and online evaluation for production monitoring on live traffic. (langchain.com) Offline evaluation is used for benchmarking, regression testing, unit testing and backtesting, while online evaluation is used to monitor quality in real time and identify issues from production traces. (youtube.com)

Engine ties those two modes together more directly. Tannyhill wrote that when Engine surfaces a fix, it also proposes a custom online evaluator and pulls failing traces into an offline evaluation dataset, so the same problem can be checked again after a change ships.

### What kinds of failures is Engine meant to catch?

LangChain’s evaluation concepts page says teams should measure specific components such as retrieval steps, tool invocations, output formatting and full agent trajectories, not just final answers. (docs.langchain.com) The documentation gives examples including relevant document retrieval in retrieval-augmented generation systems, correct tool selection and argument formatting in agents, and response quality in chatbots. (langchain.com)

The launch post uses a customer-support agent as its example. In that scenario, Engine detects a cluster of traces involving subscription-cancellation requests, sees that online evaluations are failing and user feedback is negative, and surfaces the pattern as a single named issue rather than as isolated bad runs.

### Why is tracing central to this product?

LangSmith’s main documentation describes the platform as framework-agnostic and built around tracing, evaluation, prompt testing and deployment in one system. (docs.langchain.com) The tracing layer records application behavior, while online evaluations score live interactions and offline evaluations compare versions on datasets.

Engine depends on that trace history. Tannyhill wrote that the product plugs into current tracing projects and evaluator results, then starts surfacing issues from production automatically once a team connects a project and, optionally, a repository. (langchain.com)

### Where does this leave teams already using LangSmith?

LangSmith’s docs say users can run the platform in managed cloud, self-hosted or hybrid setups, and that the product includes observability, evaluation, prompt engineering and deployment features. (docs.langchain.com) Based on LangChain’s description, Engine is being introduced into that broader stack rather than as a separate standalone service.

Access is immediate. LangChain’s launch post says LangSmith Engine can be tried now in public beta, and the company’s video points users to the LangSmith product page and the May 13 launch materials for access and setup details. (langchain.com) (docs.langchain.com)
