AgentOps Emerges as MLOps Stack for AI Agents
A new MLOps framework called AgentOps is gaining traction for developing and scaling autonomous AI agents. The stack covers the full agent lifecycle, including planning, memory, execution, monitoring, and governance, and aims to provide the infrastructure needed to move agentic AI into production environments.
AgentOps is the flagship product of Agency, a company founded by Alex Reibman, Adam Silverman, and Shawn Qiu. The founders started by building AI agents to automate financial workflows but found existing tooling too unreliable, so they first built AgentOps as an internal debugging tool. The company has since raised $2.6 million in pre-seed funding led by 645 Ventures and Afore Capital.
The platform is designed to address the operational complexity of managing autonomous AI systems, a discipline evolving from MLOps and LLMOps. While MLOps focuses on traditional machine learning models and LLMOps handles large language models, AgentOps targets agents that perform multi-step tasks, use tools, and make decisions. It offers tools for observability, compliance, cost tracking, and performance monitoring so that agents operate reliably and securely in production.
AgentOps integrates with popular AI agent frameworks and model providers, including Microsoft's AutoGen, CrewAI, LlamaIndex, and Cohere. Developers using these tools can add monitoring and analytics to their agentic systems with just a few lines of code. The goal is to provide an audit trail and accountability for AI agents, treating them like new members of the workforce who require management and oversight.
The emergence of tools like AgentOps is timely, as enterprises are increasingly moving AI agents from pilot programs to production for use cases like claims processing and underwriting in the insurance sector. However, scaling these systems presents significant challenges, including coordinating multiple agents, ensuring data security, and controlling operational costs. A recent survey showed that while over 60% of organizations are experimenting with agentic AI, only about 15-20% have deployed agents in production, highlighting the need for robust operational tools.
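To make the "few lines of code" idea concrete, here is a minimal sketch of how an audit trail with cost tracking can be attached to an agent's tool calls. It does not use the AgentOps SDK or reproduce its API: the `AuditTrail` class, the `record` decorator, the `cost_usd` argument, and the `lookup_policy` tool are all hypothetical names invented for illustration, and a real platform would send events to a backend instead of holding them in memory.

```python
import functools
import time
from dataclasses import dataclass, field


@dataclass
class AuditTrail:
    """Hypothetical in-memory audit log (illustrative only, not the AgentOps API)."""
    events: list = field(default_factory=list)

    def record(self, tool_name):
        """Return a decorator that logs every call to an agent tool."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, cost_usd=0.0, **kwargs):
                # cost_usd is an assumed, caller-supplied estimate for this call.
                start = time.perf_counter()
                status = "error"
                try:
                    result = func(*args, **kwargs)
                    status = "ok"
                    return result
                finally:
                    # Runs on success and on failure, so every call leaves a trace.
                    self.events.append({
                        "tool": tool_name,
                        "status": status,
                        "seconds": time.perf_counter() - start,
                        "cost_usd": cost_usd,
                    })
            return wrapper
        return decorator

    def total_cost(self):
        """Sum the recorded cost estimates across all logged calls."""
        return sum(event["cost_usd"] for event in self.events)


trail = AuditTrail()


@trail.record("lookup_policy")
def lookup_policy(policy_id):
    # Stand-in for a real tool an insurance agent might call.
    return {"policy_id": policy_id, "status": "active"}


lookup_policy("P-100", cost_usd=0.002)
```

After the call, `trail.events` holds one entry per tool invocation, including failed ones, which is the kind of record an operations team could review when auditing what an agent did and what it cost.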