Two Engineers Power OpenAI's Data Agent

Published by The Daily Scout

What happened

OpenAI's internal AI data agent, which serves thousands of employees, was reportedly built by just two engineers. The company says the pattern is replicable, demonstrating the massive impact small, focused teams can have by building internal AI tools for data queries and code review.

Why it matters

The internal data agent is powered by GPT-5.2 and serves over 3,500 employees, reasoning over 600 petabytes of data across 70,000 datasets. It is designed to answer complex, open-ended questions in minutes, work that previously took days of manual exploration.

Its accuracy stems from a six-layer context architecture that grounds the model in organizational reality. The agent uses Codex to analyze data pipeline code, extracting the meaning and business logic behind each dataset directly from the source. This is supplemented by historical query patterns and by human annotations that capture specific business semantics. To pick up institutional knowledge, the system ingests and embeds information from Slack, Google Docs, and Notion. That lets the agent grasp the context of launches, internal codenames, and the canonical definitions of key metrics, information that never lives in a database schema.

Rather than acting as a simple query tool, the agent is designed to work like a conversational teammate. It can self-correct when it makes mistakes, such as fixing a bad SQL join without the user ever seeing the error, and it carries context across turns, allowing iterative analysis.

OpenAI built the agent with the same tools it makes available to all developers, including its GPT-5.2 and Codex models, the Embeddings API, and the Evals API. The company says the blueprint for such high-impact internal tools is therefore replicable outside OpenAI.

The achievement reflects a broader industry shift in which AI tools enable smaller, more versatile engineering teams to deliver outsized results. The traditional model of large teams for complex software is being upended, allowing small groups of AI-augmented engineers to outmaneuver larger, slower-moving competitors.
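The self-correction behavior described above, where a bad SQL join is fixed before the user sees an error, can be illustrated with a small retry loop. This is a minimal sketch under stated assumptions, not OpenAI's implementation: `run_with_self_correction` and `fake_generate_sql` are hypothetical names, the model call is replaced by a hard-coded stand-in, and SQLite is used only so the example runs locally.

```python
import sqlite3
from typing import Callable, Optional


def run_with_self_correction(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str, Optional[str], Optional[str]], str],
    max_attempts: int = 3,
) -> list:
    """Run generated SQL, feeding any database error back to the generator."""
    last_sql: Optional[str] = None
    last_error: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate_sql(question, last_sql, last_error)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Keep the failure internal; the next attempt sees the error text.
            last_sql, last_error = sql, str(exc)
    raise RuntimeError(f"no valid query after {max_attempts} attempts: {last_error}")


def fake_generate_sql(question, previous_sql, error):
    """Hypothetical stand-in for a model call (assumption, not a real API).

    The first draft joins on a column that does not exist; once it is shown
    the database error, it returns a corrected join.
    """
    join_column = "o.userid" if error is None else "o.user_id"
    return (
        "SELECT u.name, SUM(o.amount) FROM users u "
        f"JOIN orders o ON u.id = {join_column} "
        "GROUP BY u.name ORDER BY u.name"
    )


# Toy schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'ada'), (2, 'lin');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
    """
)
rows = run_with_self_correction(conn, "Total spend per user?", fake_generate_sql)
print(rows)
```

The caller only receives the rows from the corrected query. The failed first attempt stays inside the loop, which is the behavior the article attributes to the agent; a real system would presumably also cap cost and log each attempt.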

Key numbers

  • 2 engineers reportedly built the agent.
  • More than 3,500 employees use it.
  • It reasons over 600 petabytes of data across 70,000 datasets.
  • Its context architecture has six layers.

What happens next

  • OpenAI says the blueprint is replicable outside the company, because the agent was built with tools available to all developers.
