OpenAI's Data Agent: Built by 2, Serves Thousands
A team of just two engineers at OpenAI has built a company-wide AI data agent that thousands of employees now use. The company says the key was platform thinking and outcome-driven design, and suggests the high-leverage approach is replicable.
The internal agent, known as Kepler, was created to navigate OpenAI's complex data environment of over 600 petabytes spread across 70,000 datasets. Before the agent existed, employees spent hours simply trying to locate the correct data table for an analysis.
Powered by OpenAI's own GPT-5.2 and Codex models, the agent is accessible via Slack, a web interface, and directly in IDEs. It handles the complete analytics workflow, from discovering data and writing complex SQL to publishing notebooks and reports, cutting tasks that once took days to minutes.
The agent's reliability stems from a six-layer context architecture that combines schema metadata, human expert annotations, code definitions, institutional knowledge from documents, a memory system for corrections, and live runtime data. This structured approach grounds the AI in the company's specific operational reality.
The team also used AI to build AI: over 70% of the agent's own code was generated by AI models. This approach allowed the two-person team to deliver a production-grade system in approximately three months.
For security and governance, the tool operates on a strict "pass-through" permission model. The agent inherits and enforces the existing data access rights of each individual user, so employees can only query data they already have permission to see.
The data agent is one piece of a broader internal strategy called "Building OpenAI with OpenAI." The company also deploys other bespoke tools, such as a "GTM Assistant" for sales and "DocuGPT" for contract processing, to embed its own AI into core business functions.
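To make the "pass-through" permission model described above more concrete, here is a minimal Python sketch. It is not OpenAI's implementation, and nothing in it comes from the company: the names User, PermissionDenied, and run_query_as, along with the dataset names, are hypothetical and chosen only for illustration. The idea it shows is that the agent holds no privileges of its own and checks every request against the rights the requesting user already has.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    """A hypothetical employee record (illustrative only)."""
    name: str
    # Datasets this user can already read in the underlying data platform.
    readable_datasets: frozenset[str] = field(default_factory=frozenset)


class PermissionDenied(Exception):
    """Raised when a request touches data the user cannot read."""


def run_query_as(user: User, datasets: list[str], sql: str) -> str:
    """Run `sql` only if `user` may read every dataset it references.

    Access is decided entirely by the caller's existing rights; the
    agent adds no permissions of its own.
    """
    denied = sorted(set(datasets) - user.readable_datasets)
    if denied:
        raise PermissionDenied(f"{user.name} cannot read: {', '.join(denied)}")
    # A real system would forward the query under the user's credentials.
    # This sketch only reports what would run.
    return f"executing for {user.name}: {sql}"


if __name__ == "__main__":
    analyst = User("analyst", frozenset({"sales.orders"}))
    print(run_query_as(analyst, ["sales.orders"], "SELECT COUNT(*) FROM sales.orders"))
    try:
        run_query_as(analyst, ["hr.salaries"], "SELECT * FROM hr.salaries")
    except PermissionDenied as err:
        print(f"denied: {err}")
```

In this sketch the check happens before any query is issued, so a request touching even one unreadable dataset is rejected as a whole rather than partially answered. Whether the real agent behaves this way is not stated in the source.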