OpenAI shifts observability to ClickHouse

- OpenAI’s own engineers and ClickHouse have now publicly described a petabyte-scale observability stack built on ClickHouse for ChatGPT, research, and API systems.
- The telling detail is volume: OpenAI says it ingests petabytes of logs per day, with log growth running above 20% monthly.
- That matters because agent products need full traces, not sampled crumbs, and SaaS observability pricing breaks fast at that scale.
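
To give a rough sense of what growth "above 20% monthly" compounds to, here is a minimal sketch. The 1 PB/day starting point is a hypothetical placeholder, not a reported figure, and 20% is treated as a floor on the reported rate; the function name is illustrative only.

```python
def projected_daily_volume(start_pb_per_day: float, monthly_growth: float, months: int) -> float:
    """Compound a daily ingest volume forward at a fixed monthly growth rate."""
    return start_pb_per_day * (1.0 + monthly_growth) ** months


# Assumption: 1 PB/day today (hypothetical) and exactly 20% growth per month.
for months in (6, 12, 24):
    volume = projected_daily_volume(1.0, 0.20, months)
    print(f"after {months} months: {volume:.1f} PB/day")
```

At a steady 20% per month, volume roughly triples in six months and grows to about 8.9 times its starting level in a year, which is why pricing that scales with ingested volume becomes hard to sustain.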

Observability is the plumbing that tells an AI company what just happened: which request failed, which tool call went weird, which model run got expensive, and why an agent made the choice it made. That gets much harder when the product is ChatGPT, the API, and a research stack all at once. The news here is that OpenAI and ClickHouse have now put real detail behind something that had mostly circulated as channel chatter: OpenAI’s observability stack is running at petabyte scale on ClickHouse, and the company’s own engineers have talked publicly about how they scaled it. (clickhouse.com)

### What is “observability” here?

In plain English, it means logs, metrics, and traces: the records engineers use to reconstruct what a system did. For AI agents, traces matter most. A trace can show the full run: model calls, tool calls, handoffs, guardrails, outputs, and custom spans around the workflow. OpenAI’s Agents SDK has […] more; it is part of the product surface. (developers.openai.com)

### What did OpenAI actually say?

OpenAI engineering manager Akshay Nanavati and MTS engineer Poom Chiarawongse appeared in a ClickHouse session called “Scaling ClickHouse to petabytes of logs at OpenAI.” ClickHouse also published a June 30, 2025 write-up saying OpenAI ingests petabytes of log data every day and that volume is growing […] API teams with fast search and high-cardinality tracing. (clickhouse.com)

### Why move this kind of workload to ClickHouse?

Because AI observability is a nasty workload. You want to keep everything, query it fast, and slice by weird dimensions like model, tool, tenant, prompt path, or failure mode. That is exactly where columnar systems like ClickHouse tend to shine. ClickHouse’s own observability stack […] and scaling from small deployments to multi-petabyte workloads. (clickhouse.com)

### So was this a move off Datadog?

That part is the squishiest. OpenAI has not published a clean “we replaced Datadog” statement that I could verify. But ClickHouse’s own cost materials use OpenAI as the canonical example of observability spend pressure, even citing a reported $170 million annual Datadog bill. Separately, ClickHouse has been openly positioning itself […]. So the safe read is not “Datadog is gone.” It is “OpenAI has clearly built major observability capability on ClickHouse because SaaS economics and performance get ugly at this scale.” (clickhouse.com)

### Why do agents make this more important?

Because agents create longer, messier execution chains. You do not just need a latency chart. You need an audit trail. OpenAI’s recent engineering posts on Codex and harness systems keep coming back to logs, metrics, traces, and inspectable run records. One post even describes exposing logs, metrics […] if agents are going to do multi-step work, companies need replayable evidence of what happened. (openai.com)

### Why not just sample more aggressively?

Because sampling saves money by throwing away the exact weird edge cases you later need to debug. ClickHouse’s observability material makes this tradeoff explicit: legacy pricing pushes teams to keep less data, while columnar storage and object storage let them keep more without the same penalty. For agent systems, that tradeoff gets […] bad tool decision or a runaway cost loop. (clickhouse.com)

### What is the real takeaway?

This is less a vendor-switching gossip item and more a sign of where AI infrastructure is going. As models turn into agents, observability stops being dashboard garnish and becomes core runtime infrastructure for debugging, cost control, reliability, and auditability. OpenAI’s stack is just the loudest example because the scale is absurd. (clickhouse.com)
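
To put rough numbers on the sampling tradeoff discussed above, here is a minimal sketch of how likely a rare failure is to survive uniform sampling. It assumes each trace is kept independently with a fixed probability; the 1% rate and the five affected traces are hypothetical values chosen for illustration, not figures from OpenAI or ClickHouse.

```python
def survival_probability(sample_rate: float, occurrences: int) -> float:
    """Chance that at least one of `occurrences` matching traces is kept
    when each trace is retained independently with probability `sample_rate`."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    # P(at least one kept) = 1 - P(all dropped)
    return 1.0 - (1.0 - sample_rate) ** occurrences


# Hypothetical case: a failure mode that appears in only 5 traces.
print(f"1% sampling: {survival_probability(0.01, 5):.3f}")   # 0.049
print(f"keep it all: {survival_probability(1.0, 5):.3f}")    # 1.000
```

Under these assumptions, a bug that touches five traces has about a 5% chance of leaving any evidence at a 1% sample rate, which is the practical argument for retaining full traces when storage costs allow it.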
