AgentOps as MLOps
- Giuliano Liguori argued that 'AgentOps' is the next phase of MLOps, covering planning, memory, monitoring, and governance for scaled agents.
- The view treats agents as full‑stack services requiring lifecycle management, observability, and policy controls.
- Managing agents like services changes CI/CD, monitoring, and authority boundaries across ML, infra, and product teams (x.com).
Artificial intelligence agents are being treated less like chatbots and more like software services that need their own operations stack. (developers.openai.com)
Giuliano Liguori made that case in a post on X, arguing that “AgentOps” extends machine learning operations from model deployment into planning, memory, monitoring, and governance for agents that run multi-step tasks. (x.com)
Machine learning operations, or MLOps, already covers testing, release, deployment, and infrastructure management for models in production. Google Cloud and Amazon Web Services both describe it as the discipline that applies DevOps-style automation and monitoring across the machine learning lifecycle. (cloud.google.com) (aws.amazon.com)
Agents add a new layer on top of that model stack. OpenAI’s documentation says agents can plan, call tools, collaborate across specialists, and keep state across steps, which means failures can happen in orchestration, tool use, approvals, or memory, not just in model output. (developers.openai.com)
That has pushed vendors to package observability for agents as a separate product category. LangChain says its LangSmith platform traces agent behavior, evaluates outputs, and monitors deployments, while Microsoft says Azure AI Foundry now includes built-in AgentOps tools for tracing, evaluation, latency, token use, and safety checks. (langchain.com) (techcommunity.microsoft.com)
The shift changes what “shipping” means for an artificial intelligence system. Microsoft’s January 12, 2026 guide on AgentOps describes production agents as systems that must be developed, deployed, monitored, and maintained as long-running services rather than one-off prompts or model endpoints. (techcommunity.microsoft.com)
It also changes what teams have to watch in production. LangSmith says traces are often the only record of what an agent did at runtime, and its dashboards track token usage, latency, error rates, cost, and feedback scores across agent runs. (langchain.com 1) (langchain.com 2)
Governance is moving closer to the application layer as well. OpenAI’s governance guide says policies can be versioned and deployed alongside applications, and Amazon Web Services defines machine learning governance to include auditability, traceability, and explainability across the end-to-end lifecycle. (developers.openai.com) (docs.aws.amazon.com)
That pushes authority across more teams than classic model operations usually did. If an agent can choose tools, retain memory, and trigger actions, product teams set task boundaries, infrastructure teams own runtime reliability, and machine learning teams still own model behavior and evaluation. This division of labor is an inference from how current agent platforms separate orchestration, observability, and governance features. (developers.openai.com) (techcommunity.microsoft.com) (langchain.com)
The argument behind “AgentOps” is not that MLOps disappears. It is that once agents start planning, remembering, and acting across systems, the operational job expands from managing models to managing behavior in production. (learn.microsoft.com) (x.com)
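For readers who want a concrete picture of the production signals the article names (token usage, latency, error rates, cost, and feedback scores), the following is a minimal Python sketch of how per-run trace records might be rolled up into dashboard-style numbers. It is an illustration only: the `AgentRun` record, its field names, and the `summarize` helper are hypothetical and do not reflect the schema or API of LangSmith, Azure AI Foundry, or any other product mentioned here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class AgentRun:
    """One traced agent run (hypothetical record, not a vendor schema)."""
    run_id: str
    tokens: int                       # total tokens consumed by the run
    latency_s: float                  # wall-clock duration in seconds
    cost_usd: float                   # estimated spend for the run
    failed: bool                      # True if any step errored or timed out
    feedback: Optional[float] = None  # optional reviewer score in [0, 1]


def summarize(runs: list[AgentRun]) -> dict[str, float]:
    """Roll up per-run traces into dashboard-style metrics."""
    if not runs:
        raise ValueError("need at least one run to summarize")
    scored = [r.feedback for r in runs if r.feedback is not None]
    return {
        "total_tokens": sum(r.tokens for r in runs),
        "mean_latency_s": mean(r.latency_s for r in runs),
        "error_rate": sum(r.failed for r in runs) / len(runs),
        "total_cost_usd": sum(r.cost_usd for r in runs),
        # NaN signals "no feedback collected" rather than a score of zero
        "mean_feedback": mean(scored) if scored else float("nan"),
    }


# Illustrative data only; these numbers are made up for the example.
runs = [
    AgentRun("run-1", tokens=1200, latency_s=2.0, cost_usd=0.25, failed=False, feedback=1.0),
    AgentRun("run-2", tokens=800, latency_s=4.0, cost_usd=0.25, failed=True),
    AgentRun("run-3", tokens=2000, latency_s=3.0, cost_usd=0.5, failed=False, feedback=0.5),
    AgentRun("run-4", tokens=1000, latency_s=3.0, cost_usd=0.5, failed=False),
]
summary = summarize(runs)
print(summary)
```

Keeping feedback optional reflects an assumption that not every run is reviewed; averaging only the scored runs avoids treating unreviewed runs as failures.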