GPT‑5 throughput numbers

New model specs show GPT‑5 (high) at ~65.5K tokens/sec and ChatGPT‑5 at ~120K tokens/sec, positioning these builds as enterprise-grade, low‑latency options. The published numbers frame GPT‑5 as a go‑to choice for high‑throughput chat and automation services.
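Throughput figures like these are usually derived from streamed responses. Time‑to‑first‑token (TTFT) is the delay before the first output token arrives, and output tokens/sec is the number of tokens generated after that point divided by the time spent generating them. The Python sketch below shows one common way to compute both from a single stream. It is an assumption about methodology, not the procedure behind the published specs. `measure_stream`, `FakeClock`, and `fake_stream` are hypothetical names, and the fake stream stands in for a real API stream so the example runs offline.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class StreamStats:
    ttft_s: float        # seconds from request start to the first token
    output_tokens: int   # total tokens received
    tokens_per_s: float  # decode rate after the first token


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> StreamStats:
    """Consume a token stream and report TTFT and decode throughput.

    Assumes each item yielded is exactly one output token; real APIs often
    yield multi-token chunks, which would need a tokenizer to count.
    """
    start = clock()
    first = None
    count = 0
    for _ in tokens:
        count += 1
        if first is None:
            first = clock()
    end = clock()
    if first is None:  # empty stream: no tokens, so no rate to report
        return StreamStats(ttft_s=math.nan, output_tokens=0, tokens_per_s=0.0)
    decode_time = end - first
    # The first token's delay is already counted in TTFT, so exclude it here.
    rate = (count - 1) / decode_time if decode_time > 0 else math.inf
    return StreamStats(ttft_s=first - start, output_tokens=count, tokens_per_s=rate)


# Hypothetical offline harness: a manually advanced clock and a simulated stream.
class FakeClock:
    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


def fake_stream(clock: FakeClock, n: int, ttft: float, per_token: float) -> Iterator[str]:
    clock.t += ttft           # simulated wait before the first token
    yield "tok"
    for _ in range(n - 1):
        clock.t += per_token  # simulated gap between later tokens
        yield "tok"


clock = FakeClock()
stats = measure_stream(fake_stream(clock, n=101, ttft=0.5, per_token=0.01), clock)
print(stats)  # ttft_s=0.5, output_tokens=101, tokens_per_s close to 100
```

A per-stream rate measured this way is not directly comparable to aggregate server throughput summed over many concurrent requests, so it is worth checking which of the two a published figure reports before comparing models.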

Dataconomy’s AI Model Leaderboard publishes per-model throughput and latency metrics on its model pages and ranks GPT‑5 family variants alongside price and intelligence scores. OpenAI’s public materials describe a GPT‑5 family with multiple routed snapshots (gpt‑5‑main, gpt‑5‑thinking, and lightweight nano/mini builds) and expose a reasoning.effort parameter that changes inference behaviour across those variants.

Independent outlets that ran API-level benchmarks report substantial throughput differences between GPT‑5 variants and ChatGPT builds. They place GPT‑5 family models in the top tier for output tokens/sec while noting a spread that depends on variant and configuration. (artificialanalysis.ai; datastudios.org/post/speed-comparison-how-fast-is-chatgpt-with-gpt-5-versus-other-leading-ai-models-in-2025) OpenAI’s developer documentation lists GPT‑5 Chat as a ChatGPT snapshot with separate pricing and rate-limit regimes, and the published API model pages call out distinct speed/price tradeoffs within the family that developers must account for in production.

Part of the benchmark variance reported across testers is explained by inference-stack choices: speculative decoding, dynamic batching in vLLM-style servers, and specialized silicon. An academic side‑channel analysis specifically ties speculative decoding techniques to measurable per‑iteration throughput behaviour. (computertech.co on Cerebras deployment; developers.openai.com/guides/latency-optimization; arxiv.org/abs/2411.01076)

Public leaderboards and independent test suites emphasise different operational metrics (tokens/sec, time‑to‑first‑token, context capacity, and cost per token), and Dataconomy’s ranking explicitly mixes throughput with cost and benchmarked “intelligence” scores when framing model suitability for enterprise workloads.
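Speculative decoding, one of the inference-stack choices listed above, raises throughput by letting a cheap draft model propose several tokens that the large target model then verifies in a single pass. The Python sketch below is a toy greedy variant; the sampling variant replaces the exact-match check with a rejection-sampling acceptance rule. `draft` and `target` are hypothetical stand-in models over integer tokens, and nothing here reflects how OpenAI or any hardware vendor actually serves GPT‑5.

```python
from typing import Callable, List, Sequence, Tuple

# A toy "model": maps a token prefix to its greedy (argmax) next token.
NextToken = Callable[[Sequence[int]], int]


def greedy_decode(prompt: List[int], next_token: NextToken, max_new: int) -> List[int]:
    """Baseline: one target-model step per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(next_token(out))
    return out


def speculative_decode(prompt: List[int], draft_next: NextToken,
                       target_next: NextToken, k: int,
                       max_new: int) -> Tuple[List[int], List[int]]:
    """Greedy speculative decoding (toy version).

    Each round the draft proposes k tokens; the target keeps the longest prefix
    it agrees with, then adds one token of its own (a correction, or a bonus
    token if all k were accepted). The output matches greedy_decode with the
    target. Returns (tokens, tokens_emitted_per_round).
    """
    out = list(prompt)
    per_round: List[int] = []
    while len(out) - len(prompt) < max_new:
        # 1. Cheap draft model proposes k tokens autoregressively.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            tok = draft_next(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. Target verifies. A real server scores all k+1 positions in one
        #    batched forward pass; this toy calls target_next per prefix.
        accepted: List[int] = []
        for tok in proposal:
            expected = target_next(out + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: take target's token, stop
                break
            accepted.append(tok)
        else:
            accepted.append(target_next(out + accepted))  # all k accepted: bonus token
        out.extend(accepted)
        per_round.append(len(accepted))
    return out[: len(prompt) + max_new], per_round


# Hypothetical toy models over digits: the target counts up mod 10; the draft
# agrees except that it guesses 0 after a 5.
def target(ctx: Sequence[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft(ctx: Sequence[int]) -> int:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


tokens, per_round = speculative_decode([0], draft, target, k=4, max_new=20)
print(tokens == greedy_decode([0], target, 20))  # True
print(per_round)  # [5, 1, 5, 5, 5]: 5 verification rounds instead of 20 steps
```

The uneven per-round counts show why throughput under speculative decoding depends on how often the draft agrees with the target. That data-dependent, per-iteration variation is the kind of behaviour the cited side-channel analysis examines.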
