AT&T Slashes AI Costs by 90% With New Orchestration

AT&T's chief data and AI officer revealed the company cut its AI operational costs by 90% after its daily workload reached 8 billion tokens. The savings were achieved by restructuring agentic workflows, implementing modular orchestration layers, and optimizing API calls. The case study highlights the critical need for scalable, cost-aware AI architecture in large-scale enterprise deployments.

AT&T's move was a direct response to inference costs spiraling as its workload reached 8 billion tokens a day. That scale forced a strategic shift away from reliance on a single, large reasoning model toward a more complex multi-agent system. The company's Chief Data and AI Officer, Andy Markus, has emphasized that this agentic approach is key to moving AI from the "information economy into the action economy."

The new architecture is a multi-agent stack built on the LangChain framework. It functions as a modular orchestration layer where "super agents" can direct smaller, specialized "worker" agents for specific tasks. This allows small, fine-tuned language models (SLMs) to handle dedicated jobs, providing the accuracy and efficiency that mature agentic solutions need.

Owning the orchestration layer in this way is becoming a critical component of enterprise AI architecture. It prevents vendor lock-in and allows for an "interchangeable and selectable" approach to AI models. By controlling how agents are sequenced, what data they access, and how they are governed, companies can encode their own operational DNA into their AI systems.

To scale the new architecture internally, AT&T developed a platform called "Ask AT&T" and a graphical, drag-and-drop agent builder known as "Ask AT&T Workflows." The platform has been deployed to more than 100,000 employees, over half of whom report using it daily. The toolset democratizes development, enabling even non-technical teams to create software prototypes in minutes that might previously have taken weeks.

Governance is managed through a formal "Generative AI Transformation Office," which evaluates all use cases and requires a formal business case tied to the CFO's office. The system design incorporates human-in-the-loop checkpoints, with every agent action logged and subject to data isolation, retention policies, and role-based access controls.
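
The "super agent" and "worker" pattern described above can be sketched in a few lines of plain Python. This is a minimal illustration only: it is not AT&T's implementation and does not use LangChain's API. Every name in it (Worker, SuperAgent, the keyword routing rule, the model labels) is hypothetical, and the lambdas stand in for real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Worker:
    """A specialized agent backed by a (hypothetical) model."""
    name: str
    model: str
    handle: Callable[[str], str]  # stand-in for a real model call


@dataclass
class SuperAgent:
    """Routes each task to a worker and logs every action for audit."""
    workers: Dict[str, Worker]  # routing keyword -> worker
    fallback: Worker            # used when no keyword matches
    audit_log: List[dict] = field(default_factory=list)

    def route(self, task: str) -> Worker:
        # Toy routing rule: pick the first worker whose keyword appears in the task.
        lowered = task.lower()
        for keyword, worker in self.workers.items():
            if keyword in lowered:
                return worker
        return self.fallback

    def run(self, task: str) -> str:
        worker = self.route(task)
        result = worker.handle(task)
        # Record which worker and model handled the task.
        self.audit_log.append(
            {"task": task, "worker": worker.name, "model": worker.model}
        )
        return result


# Hypothetical workers; the model names are placeholders, not real products.
billing = Worker("billing", "small-billing-model", lambda t: f"[billing] {t}")
network = Worker("network", "small-network-model", lambda t: f"[network] {t}")
general = Worker("general", "large-reasoning-model", lambda t: f"[general] {t}")

agent = SuperAgent(workers={"invoice": billing, "outage": network}, fallback=general)
print(agent.run("Explain this invoice charge"))
print(agent.run("Summarize the quarterly plan"))
```

Because each worker hides its model behind the same interface, swapping one model for another changes a single line, which is the "interchangeable and selectable" property in miniature.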

This shift is part of a broader enterprise trend: model training is a significant one-time investment, but continuous inference at scale is often the larger, ongoing operational expense. AT&T reported that for every dollar invested in generative AI, it returned a 2X ROI in the same year through free cash flow. The agentic workflow has already improved customer experiences by reducing wait times and allowing employees to focus on higher-priority actions.
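
A back-of-the-envelope calculation shows how routing most traffic to smaller models could produce savings of this order. Only the 8 billion tokens per day comes from the article. The per-token prices and the 95% routing share below are invented for illustration and do not reflect AT&T's actual pricing or traffic mix.

```python
DAILY_TOKENS = 8_000_000_000  # daily workload cited in the article

# Hypothetical prices in USD per million tokens (assumptions, not real quotes).
LARGE_PRICE = 10.00
SMALL_PRICE = 0.50


def daily_cost(tokens: int, small_share: float) -> float:
    """Blended daily cost when `small_share` of tokens go to the small model."""
    if not 0.0 <= small_share <= 1.0:
        raise ValueError("small_share must be between 0 and 1")
    millions = tokens / 1_000_000
    blended_price = small_share * SMALL_PRICE + (1 - small_share) * LARGE_PRICE
    return millions * blended_price


baseline = daily_cost(DAILY_TOKENS, small_share=0.0)   # everything on the large model
routed = daily_cost(DAILY_TOKENS, small_share=0.95)    # assumed 95% on small models
savings = 1 - routed / baseline

print(f"baseline: ${baseline:,.0f}/day")
print(f"routed:   ${routed:,.0f}/day")
print(f"savings:  {savings:.1%}")
```

Under these assumed numbers the blended cost falls by roughly 90%. The result is sensitive to both the price gap and the routing share, so real savings would depend on how many tasks small models can actually handle.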

Shared from Scout - Be the smartest in the room.