Agent frameworks: real success rates
A 47-deployment snapshot shows wide variance in how agent frameworks perform in enterprise settings. LangGraph reportedly reached a 73% success rate across deployments while AutoGen reached 39%, suggesting that orchestration maturity matters more than demo polish. The data come from Intuz’s survey of enterprise rollouts and point to integration and operability as the main failure modes. (x.com)
The flashy part of an artificial intelligence agent demo is the conversation. The expensive part in a real company is everything around it: saving state, calling tools, waiting for approval, retrying after errors, and logging what happened when something breaks. (intuz.com) (docs.langchain.com)

That is why one 47-deployment snapshot from Intuz is getting attention. In that sample, LangGraph reportedly cleared a 73% enterprise success rate while Microsoft AutoGen landed at 39%, which is a gap big enough to suggest the framework choice changes whether a pilot survives contact with real systems. (x.com) (intuz.com)

An agent framework is the scaffolding around a large language model. It handles memory, tool use, orchestration, human review, and observability so a team is not hand-building the plumbing every time it wants an agent to do more than answer one question. (intuz.com)

LangGraph is built around stateful workflows, which means it keeps track of where a task is in the same way a package tracker knows whether a box is packed, shipped, or delayed. Its documentation emphasizes durable execution, persistence, human-in-the-loop pauses, and graph-based control over each step. (docs.langchain.com 1) (docs.langchain.com 2)

AutoGen started with a different center of gravity. Microsoft described it as a multi-agent conversation framework, and its docs still present agent chat as the high-level entry point, which makes it powerful for collaboration patterns but can leave production teams doing more work around control, tracing, and deployment shape. (microsoft.github.io 1) (microsoft.github.io 2)

That difference sounds abstract until you put it inside a company. A customer support agent might need to read a policy file, call a billing system, pause for a manager approval above $500, resume two hours later, and leave an audit trail for compliance. (techcommunity.microsoft.com) (docs.langchain.com)

The reason orchestration keeps showing up is that enterprise failures usually happen at the seams. The model may answer well in a sandbox, but the rollout dies when identity, permissions, retries, latency, or human approval paths are bolted on after the demo instead of designed in from day one. (learn.microsoft.com) (docs.langchain.com)

Observability is one of those unglamorous seams. Microsoft Foundry’s observability docs focus on tracing model calls, tool invocations, latency, token usage, error rates, and task completion because once agents touch live systems, teams need the equivalent of a flight recorder, not just a chat window. (learn.microsoft.com) (github.com)

The survey is also landing at a moment when enterprises are deploying more generative artificial intelligence but still struggling to prove value. Gartner said 29% of surveyed organizations had deployed and were using generative artificial intelligence in late 2023, while IBM said 42% of large enterprises were actively using artificial intelligence and another 40% were still exploring or experimenting. (gartner.com) (newsroom.ibm.com)

So the useful read on the Intuz numbers is not “one framework wins forever.” It is that enterprise agent work is drifting away from clever chat patterns and toward boring reliability work, where state management, approvals, deployment paths, and monitoring decide whether an agent becomes software or stays a demo. (x.com) (docs.langchain.com)
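
To make the support-agent scenario concrete (read a policy, pause for approval above $500, resume later, leave an audit trail), here is a minimal framework-agnostic sketch in plain Python. It is not LangGraph's or AutoGen's API. Every name in it (`RefundTask`, `run`, `save`, `load`, the step names) is hypothetical, the policy and billing calls are stubs, and the JSON file stands in for whatever durable store a real deployment would use.

```python
import json
import tempfile
import time
from dataclasses import dataclass, field, asdict
from pathlib import Path
from typing import Optional

APPROVAL_THRESHOLD = 500.0  # dollars; the figure used in the example above


@dataclass
class RefundTask:
    """Hypothetical task record: the state a workflow must keep between steps."""
    task_id: str
    amount: float
    step: str = "read_policy"        # where the task currently is
    approved: Optional[bool] = None  # None means no human decision yet
    audit: list = field(default_factory=list)


def log(task, event):
    """Append a timestamped entry to the task's audit trail."""
    task.audit.append({"ts": time.time(), "step": task.step, "event": event})


def save(task, directory):
    """Persist the whole task so a later process can pick it up."""
    path = Path(directory) / f"{task.task_id}.json"
    path.write_text(json.dumps(asdict(task)))
    return path


def load(task_id, directory):
    """Rebuild a task from its saved JSON checkpoint."""
    data = json.loads((Path(directory) / f"{task_id}.json").read_text())
    return RefundTask(**data)


def run(task):
    """Advance the task until it finishes, is rejected, or must wait for a human."""
    while task.step not in ("done", "rejected"):
        if task.step == "read_policy":
            log(task, "policy checked (stub)")
            task.step = "check_approval"
        elif task.step == "check_approval":
            needs_human = task.amount > APPROVAL_THRESHOLD
            if needs_human and task.approved is None:
                log(task, "paused for manager approval")
                return "waiting"  # caller should save() and stop here
            if needs_human and not task.approved:
                log(task, "rejected by manager")
                task.step = "rejected"
            else:
                task.step = "call_billing"
        elif task.step == "call_billing":
            log(task, "billing system called (stub)")
            task.step = "done"
    return task.step


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as store:
        task = RefundTask(task_id="t-1", amount=750.0)
        print(run(task))             # pauses at the approval gate
        save(task, store)

        resumed = load("t-1", store)  # could be hours later, in another process
        resumed.approved = True
        print(run(resumed))          # continues from the saved step
        for entry in resumed.audit:
            print(entry["step"], "-", entry["event"])
```

The point of the sketch is that the pause is a saved state, not a sleeping process: the task is written to storage at the approval gate and resumed from the recorded step once a decision arrives. That is the kind of plumbing the article describes frameworks taking over.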