Reliability over capability

- Microsoft warned that most enterprises have AI agents but that almost none run them reliably in production.
- The company outlined 'three tiers of agentic AI' and said some workflows should not use agents at all.
- That blunt assessment matches community surveys tracing failures to handoff logic, scalability, and alignment rather than base-model capability. (techcommunity.microsoft.com) (x.com)

An AI agent is software that can plan steps, call tools, and pass work to other programs. Microsoft said on April 22 that most companies have them, but “almost none” run reliably in production. (techcommunity.microsoft.com)

In that post, Microsoft’s Sameer Gangaramani said enterprises are mistaking a deployment problem for a model problem. He wrote that tool calling is now standard across major models, while frameworks such as LangGraph, CrewAI, and Microsoft Agent Framework already handle much of the orchestration work. (techcommunity.microsoft.com)

Microsoft split agentic systems into three tiers: deterministic workflows, bounded agents, and multi-agent systems. It also said some jobs should use no agent at all and stay in a traditional workflow engine, because the work is fixed enough to map in advance. (techcommunity.microsoft.com)

A deterministic workflow is the software equivalent of a checklist: the system follows prewritten steps and only uses a model for narrow tasks like extraction or classification. Microsoft said that pattern fits processes with stable rules, clear approvals, and low tolerance for drift. (techcommunity.microsoft.com)

A bounded agent gets more freedom inside guardrails, like a customer-support worker who can search records and draft actions but cannot improvise outside policy. Microsoft said that tier works when tasks vary, but the system still needs hard limits on tools, memory, and escalation. (techcommunity.microsoft.com)

A multi-agent system is closer to a relay team, with separate agents handling research, planning, execution, or review and then handing work off. Microsoft said that setup is useful only when one agent cannot hold all the context or skills, because every handoff adds failure points. (techcommunity.microsoft.com)

Microsoft tied that warning to survey data from OutSystems. OutSystems said on April 7 that 96% of organizations are using AI agents in some capacity, but only one in nine has them operating in production at scale; 94% said agent sprawl is increasing complexity, technical debt, and security risk, and 12% said they have a centralized management approach. (outsystems.com) (techcommunity.microsoft.com)

Other industry surveys point to the same bottleneck from a different angle. LangChain said in its 2026 survey of more than 1,300 professionals that 57.3% have agents in production, but 32% named quality as a top barrier and 89% said they had added observability, the monitoring layer teams use to see where an agent went wrong. (langchain.com)

Cleanlab’s 2025 production survey was even narrower: out of 1,837 engineering and AI leaders, only 95 said they had live agents in production. Cleanlab said fewer than one in three teams were satisfied with observability and guardrail tools, and 63% planned to increase spending on evaluation and monitoring over the next year. (cleanlab.ai)

The infrastructure around agents is still moving fast. Google’s Agent2Agent protocol was placed under Linux Foundation governance on June 23, 2025, and Databricks said its 2026 report covers more than 20,000 organizations, including over 60% of the Fortune 500, as companies push from single agents toward multi-agent systems. (linuxfoundation.org) (databricks.com)

Microsoft’s point was narrower than the usual race over bigger models. The company said the hard part is deciding when to use an agent, how much freedom to give it, and how to keep the system observable once it leaves the demo and starts handling real work. (techcommunity.microsoft.com)
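For readers who think in code, the choice between no agent, a deterministic workflow, a bounded agent, and a multi-agent system can be read as a short decision order. The Python sketch below is only an illustration of that reading. The three yes/no traits, the `TaskProfile` fields, and the `choose_tier` function are hypothetical names invented here, not anything taken from Microsoft's post, and real decisions would weigh more factors than three booleans.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    NO_AGENT = "traditional workflow engine"
    DETERMINISTIC_WORKFLOW = "deterministic workflow"
    BOUNDED_AGENT = "bounded agent"
    MULTI_AGENT = "multi-agent system"


@dataclass(frozen=True)
class TaskProfile:
    # Hypothetical traits, chosen to mirror the article's descriptions.
    steps_known_in_advance: bool       # work is fixed enough to map in advance
    needs_model_for_narrow_step: bool  # e.g. extraction or classification
    fits_in_one_agent: bool            # one agent can hold the context and skills


def choose_tier(task: TaskProfile) -> Tier:
    """Pick the least autonomous tier that still fits the task."""
    if task.steps_known_in_advance:
        # Fixed steps: use a model only if a narrow step needs one.
        if task.needs_model_for_narrow_step:
            return Tier.DETERMINISTIC_WORKFLOW
        return Tier.NO_AGENT
    # Tasks vary: prefer a single agent inside guardrails, because
    # every handoff between agents adds a failure point.
    if task.fits_in_one_agent:
        return Tier.BOUNDED_AGENT
    return Tier.MULTI_AGENT


if __name__ == "__main__":
    invoice_approval = TaskProfile(True, True, True)
    print(choose_tier(invoice_approval).value)
```

The ordering is the design choice worth noting: the function returns the first tier that fits and reaches the multi-agent option only when nothing simpler does, which follows the article's point that more autonomy and more handoffs should be justified rather than assumed.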
