Uber runs 60,000 agent tasks weekly

- Uber engineers said at MCP Dev Summit that the company now runs 60,000-plus AI agent executions each week across more than 1,500 active agents.
- The telling detail is the plumbing: one MCP gateway auto-generates tool definitions from 10,000-plus internal service IDLs, then enforces auth, logging, and redaction.
- That matters because Uber’s bottleneck is no longer model access. It is governance, discovery, and reliability at enterprise scale.
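
The gateway mechanism in the summary above works in two steps. First it derives tool definitions from service IDLs. Then it wraps every call in authorization, logging, and redaction. The Python sketch below shows both steps and makes several assumptions. It is not Uber’s code. A hand-written dict stands in for a parsed proto/Thrift service, and descriptions come from a template rather than an LLM. Mutable endpoints are detected by a made-up method-name prefix convention. `TripService`, `generate_tools`, `is_mutating`, `redact`, and `Gateway` are all hypothetical names. Only the `name`/`description`/`inputSchema` shape of each tool follows the MCP tool-definition format.

```python
import json
import logging
import re
from dataclasses import dataclass, field
from typing import Any, Callable

log = logging.getLogger("mcp_gateway")  # hypothetical audit logger

# Hypothetical stand-in for one parsed proto/Thrift service definition.
TRIP_IDL = {
    "service": "TripService",
    "methods": [
        {"name": "GetTrip", "params": {"trip_id": "string"}},
        {"name": "CancelTrip", "params": {"trip_id": "string", "reason": "string"}},
    ],
}

IDL_TO_JSON_TYPE = {"string": "string", "int32": "integer", "int64": "integer", "bool": "boolean"}
# Assumed convention: these method-name prefixes mark mutable (write) endpoints.
MUTATING_PREFIXES = ("Create", "Update", "Delete", "Cancel")


def generate_tools(idl: dict) -> list[dict]:
    """Turn each IDL method into an MCP-style tool definition."""
    tools = []
    for method in idl["methods"]:
        params = method["params"]
        tools.append({
            "name": f"{idl['service']}_{method['name']}",
            # Template text; the talk describes LLM-generated descriptions instead.
            "description": f"Calls {method['name']} on {idl['service']}.",
            "inputSchema": {
                "type": "object",
                "properties": {p: {"type": IDL_TO_JSON_TYPE[t]} for p, t in params.items()},
                "required": list(params),
            },
        })
    return tools


def is_mutating(tool_name: str) -> bool:
    """Apply the assumed prefix convention to the method part of the tool name."""
    return tool_name.rsplit("_", 1)[-1].startswith(MUTATING_PREFIXES)


EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d -]{7,}\d")


def redact(text: str) -> str:
    """Mask simple email/phone patterns (a stand-in for a real PII detector)."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))


@dataclass
class Gateway:
    tools: list[dict]
    backends: dict[str, Callable[..., Any]]                    # tool name -> service client
    grants: dict[str, set[str]] = field(default_factory=dict)  # agent -> granted tool names
    writers: set[str] = field(default_factory=set)             # agents allowed to mutate

    def call(self, agent: str, tool: str, args: dict) -> str:
        spec = next((t for t in self.tools if t["name"] == tool), None)
        # Central authorization: unknown or ungranted tools are rejected.
        if spec is None or tool not in self.grants.get(agent, set()):
            raise PermissionError(f"{agent} is not granted {tool}")
        # Block mutable endpoints unless the agent is explicitly a writer.
        if is_mutating(tool) and agent not in self.writers:
            raise PermissionError(f"{tool} is a mutable endpoint; {agent} is read-only")
        missing = [p for p in spec["inputSchema"]["required"] if p not in args]
        if missing:
            raise ValueError(f"missing arguments: {missing}")
        log.info("agent=%s tool=%s args=%s", agent, tool, json.dumps(args))  # audit trail
        return redact(str(self.backends[tool](**args)))  # redact before the model sees it


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    gw = Gateway(
        tools=generate_tools(TRIP_IDL),
        backends={
            "TripService_GetTrip": lambda trip_id: {"trip_id": trip_id, "rider": "rider@example.com"},
            "TripService_CancelTrip": lambda trip_id, reason: "cancelled",
        },
        grants={"support-bot": {"TripService_GetTrip", "TripService_CancelTrip"}},
    )
    print(gw.call("support-bot", "TripService_GetTrip", {"trip_id": "t1"}))
    # → {'trip_id': 't1', 'rider': '[EMAIL]'}
    # Calling TripService_CancelTrip would raise PermissionError: support-bot is not a writer.
```

The design point is that every check lives in one place. An agent team never writes auth or redaction code. It only receives the tools the gateway generated and granted.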

Uber’s story is not really “look, we have a lot of agents.” The interesting part is the infrastructure behind them. At MCP Dev Summit, Uber said it is running more than 60,000 AI agent executions a week, with more than 1,500 active agents and monthly AI usage across 90% of its 5,000-plus engineers. But the talk was really a case study in a different problem: once agents are real, the hard part becomes routing, permissions, observability, and cleanup, not prompting.

### What actually scaled at Uber?

The scale number is big, but it helps to unpack it. Uber described three “surfaces” for agents: a no-code Agent Builder, a code-first Agent SDK for teams like grocery and customer support, and coding agents inside tools like Claude Code, Cursor, and its own internal systems. So this is not one chatbot doing 60,000 things. It is a platform feeding many different agents, built by many different teams, across very different workflows. (youtube.com)

### Why does MCP matter here?

MCP (Model Context Protocol) is the layer that lets an agent talk to tools in a standard way. The problem at Uber scale is obvious once you say it out loud: thousands of internal services, many teams, many schemas, and no reason to want every new agent team hand-wiring integrations from scratch. Uber’s answer is a centralized MCP gateway and registry that acts as a control plane for those interactions. (youtube.com)

### What does the gateway actually do?

This is the most revealing part. Uber said the gateway can crawl internal proto and Thrift files, use LLMs to generate MCP tool descriptions, and expose the resulting tools through one unified service. In plain English, it turns a giant mess of internal APIs into something agents can discover and use without every team building a custom wrapper. That is a huge platform bet, more like building roads and traffic lights than building a smarter car. (youtube.com)

### Why isn’t the model the main story?
Because once an agent can call tools, bad tool use becomes the failure mode. Uber spent much of the talk on central authorization, PII redaction, blocking mutable endpoints, code scanning, metrics, and tracing. That tells you where the pain is. A frontier model can still do dumb things if it sees the wrong tool, gets too much context, or writes to a system it should only read from. The platform has to narrow choices and watch everything. (youtube.com)

### How do they keep agents from hallucinating actions?

Part of the answer is scope control. Uber said it improves reliability by tightening tool selection and using parameter overrides, which reduces bad tool calls. That sounds small, but it is the difference between an agent vaguely “having access” and an agent boxed into the exact few actions it should take. Enterprise agent design turns out to be less like giving an intern freedom and more like giving a pilot a checklist. (youtube.com)

### Where is the clearest payoff?

Coding is the cleanest example. Uber highlighted an internal background agent called Minions that produces about 1,800 code changes per week. Separately, earlier talks about Uber’s developer platform described large time savings for engineering work. But even here, the lesson is not “the agent writes code.” The lesson is that generated code only becomes useful when it plugs into review, testing, and deployment guardrails. (youtube.com)

### What comes next?

Uber’s roadmap points at quality and discovery: evaluation metrics, SLA tiers, better tool search, and reusable “skills” with A/B testing. That is what a mature platform team worries about after the demo phase. Not “can the model do a trick?” but “can thousands of people trust this system every day?”

### Bottom line

Uber’s 60,000-task number is impressive, but the deeper signal is architectural. (youtube.com) Production agents are becoming an infrastructure problem. The winners may not be the companies with the flashiest model demos.
They may be the ones that can make tool access boring, safe, searchable, and reliable.
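
The scope controls from the reliability section are the most concrete version of making tool access “boring, safe, searchable.” They combine a narrowed tool list with parameter overrides. The Python sketch below is hypothetical. It assumes a per-agent `AgentScope` policy and uses a naive keyword search in place of whatever tool search Uber actually uses. Tools outside an agent’s allowlist are never offered to the model. Pinned parameters are hidden from the model and filled in by the platform. `Tool`, `AgentScope`, `select_tools`, `visible_params`, `finalize_args`, and the `orders_*`/`menus_*` tools are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    description: str
    params: list[str]


@dataclass
class AgentScope:
    """Hypothetical per-agent policy: the tools it may see and arguments pinned for it."""
    allowlist: set[str]
    overrides: dict[str, dict[str, object]] = field(default_factory=dict)  # tool -> {param: value}


def visible_params(tool: Tool, scope: AgentScope) -> list[str]:
    """Parameters the model is shown; pinned ones are hidden so it cannot pick them."""
    pinned = scope.overrides.get(tool.name, {})
    return [p for p in tool.params if p not in pinned]


def select_tools(tools: list[Tool], scope: AgentScope, query: str, k: int = 3) -> list[Tool]:
    """Naive keyword-overlap search, restricted to the agent's allowlist."""
    words = set(query.lower().split())
    scored = []
    for tool in tools:
        if tool.name not in scope.allowlist:
            continue  # tools outside the scope are never offered to the model
        overlap = len(words & set(tool.description.lower().split()))
        if overlap:
            scored.append((-overlap, tool.name, tool))
    scored.sort(key=lambda s: (s[0], s[1]))  # most overlap first, then by name
    return [tool for _, _, tool in scored[:k]]


def finalize_args(tool: Tool, scope: AgentScope, model_args: dict) -> dict:
    """Drop arguments the model was not shown, then apply the pinned overrides."""
    allowed = set(visible_params(tool, scope))
    args = {key: value for key, value in model_args.items() if key in allowed}
    args.update(scope.overrides.get(tool.name, {}))
    return args


if __name__ == "__main__":
    catalog = [
        Tool("orders_lookup", "look up an order by id", ["order_id", "region"]),
        Tool("orders_refund", "refund an order", ["order_id", "amount"]),
        Tool("menus_search", "search restaurant menus", ["query"]),
    ]
    scope = AgentScope(
        allowlist={"orders_lookup", "menus_search"},
        overrides={"orders_lookup": {"region": "us"}},
    )
    tool = select_tools(catalog, scope, "look up my order")[0]
    print(tool.name, visible_params(tool, scope))  # orders_lookup ['order_id']
    print(finalize_args(tool, scope, {"order_id": "o-42", "region": "eu"}))
    # {'order_id': 'o-42', 'region': 'us'}
```

Hiding pinned parameters, rather than validating them after the fact, leaves the model nothing to get wrong for those fields. In this example the region is always `us`, whatever the model asks for.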