GPT‑5 throughput numbers
New model specs show GPT‑5 (high) achieving ~65.5K tokens/sec and ChatGPT‑5 ~120K tokens/sec, figures that position these builds as enterprise-grade, low‑latency options. The published numbers frame GPT‑5 as a go‑to choice for high‑throughput chat and automation services.
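A throughput figure only means something once you know what was timed: a single stream's decode rate, an aggregate across concurrent requests, or an end‑to‑end average that folds in time‑to‑first‑token. The summary above does not say which. The following is a minimal sketch that separates the two most commonly quoted metrics. It assumes you have the request send time and one arrival timestamp per output token from a streamed response (real APIs often deliver several tokens per chunk). The function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StreamMetrics:
    ttft_s: float                         # request sent -> first token arrived
    output_tokens_per_s: Optional[float]  # decode rate after the first token
    total_s: float                        # request sent -> last token arrived


def stream_metrics(request_sent_at: float, token_times: Sequence[float]) -> StreamMetrics:
    """Derive TTFT and output throughput from per-token arrival times (seconds).

    Hypothetical helper: assumes exactly one timestamp per output token,
    in arrival order, on the same clock as request_sent_at.
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")
    first, last = token_times[0], token_times[-1]
    decode_window = last - first
    # Count only tokens after the first, so queueing and prompt processing
    # (which show up in TTFT) do not dilute the throughput figure.
    rate = (len(token_times) - 1) / decode_window if decode_window > 0 else None
    return StreamMetrics(
        ttft_s=first - request_sent_at,
        output_tokens_per_s=rate,
        total_s=last - request_sent_at,
    )
```

For example, a request sent at t = 0 s whose first token arrives at 0.5 s, followed by 100 more tokens evenly spaced over the next second, yields a TTFT of 0.5 s, an output rate of 100 tokens/sec, and 1.5 s end to end. Excluding the first token from the rate is one common convention. A benchmark that instead divides total tokens by total time would report about 67 tokens/sec for the same run.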
Dataconomy’s AI Model Leaderboard publishes per-model throughput and latency metrics on its model pages and ranks GPT‑5 family variants alongside price and intelligence scores. OpenAI’s public materials describe a GPT‑5 family with multiple routed snapshots (gpt‑5‑main, gpt‑5‑thinking, and lightweight nano/mini builds) and expose a reasoning.effort setting that changes inference behavior across those variants.

Independent benchmarking outlets that ran API-level tests report substantial throughput differences between GPT‑5 variants and ChatGPT builds. They place GPT‑5 family models in the top tier for output tokens/sec while noting a spread that depends on variant and configuration. (artificialanalysis.ai; datastudios.org/post/speed-comparison-how-fast-is-chatgpt-with-gpt-5-versus-other-leading-ai-models-in-2025) OpenAI’s developer documentation lists GPT‑5 Chat as a ChatGPT snapshot with its own pricing and rate limits. The published API model pages also call out distinct speed/price trade-offs within the family, which developers must account for in production.

Part of the variance across testers comes from inference-stack choices: speculative decoding, dynamic batching in vLLM-style servers, and specialized silicon. An academic side‑channel analysis specifically ties speculative decoding techniques to measurable per‑iteration throughput behavior. (computertech.co on Cerebras deployment; developers.openai.com/guides/latency-optimization; arxiv.org/abs/2411.01076)

Public leaderboards and independent test suites emphasize different operational metrics (tokens/sec, time‑to‑first‑token, context capacity, and cost per token). Dataconomy’s ranking explicitly mixes throughput with cost and benchmarked “intelligence” scores when framing model suitability for enterprise workloads.
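Consider the idealized speculative-decoding analysis of Leviathan et al. (2023) to see how the serving stack alone can move tokens/sec for the same model. In that analysis, a small draft model proposes γ tokens per pass, and the target model accepts each one independently with probability α. The expected number of tokens emitted per target-model pass is then (1 − α^(γ+1)) / (1 − α). The sketch below evaluates that expression and the corresponding idealized wall-clock speedup. The function names are hypothetical, and α, γ, and the draft-cost ratio are illustrative inputs, not measurements of any GPT‑5 deployment.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    alpha: probability the target accepts a given draft token; assumed
           independent and identical across positions (a simplification).
    gamma: number of draft tokens proposed per pass.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # All drafts accepted, plus the token the target model adds itself.
        return float(gamma + 1)
    # Geometric series 1 + alpha + ... + alpha**gamma in closed form.
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, draft_cost_ratio: float) -> float:
    """Idealized speedup over plain one-token-per-pass decoding.

    draft_cost_ratio: cost of one draft-model step relative to one target step.
    Ignores batching effects, memory-bandwidth limits and verification overhead.
    """
    if draft_cost_ratio < 0:
        raise ValueError("draft_cost_ratio must be non-negative")
    return expected_tokens_per_step(alpha, gamma) / (1.0 + gamma * draft_cost_ratio)


if __name__ == "__main__":
    for a in (0.5, 0.8, 0.9):
        print(f"alpha={a}: {expected_tokens_per_step(a, 4):.2f} tokens/pass, "
              f"speedup ~{expected_speedup(a, 4, 0.05):.2f}x")
```

With γ = 4, an acceptance rate of 0.8 gives about 3.36 tokens per target pass versus 1 without drafting. At 0.5 the figure falls to about 1.94. A shift in acceptance rate between prompts, draft models, or providers can therefore show up directly as a different tokens/sec figure for the same underlying weights.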