OpenAI Deploys First Production Model on Non-Nvidia Chips

OpenAI has launched its GPT-5.3-Codex-Spark model on Cerebras hardware, its first major production AI workload to run on something other than Nvidia's GPGPU architecture. The move signals a potential shift in the hardware economics and performance characteristics of large-scale AI inference.

- The deployment leverages Cerebras's third-generation Wafer-Scale Engine (WSE-3), a single dinner-plate-sized chip with 900,000 AI-optimized cores and 44GB of on-chip SRAM. This architecture contrasts with GPU clusters by keeping the entire model on one processor, drastically reducing the latency caused by data movement between multiple chips.
- GPT-5.3-Codex-Spark is a smaller, specialized version of OpenAI's Codex model, optimized for high-speed, interactive tasks like real-time code editing. On the Cerebras hardware, it can generate over 1,000 tokens per second with a 128k context window.
- This move is part of a multi-year, $10 billion agreement for OpenAI to deploy 750 megawatts of Cerebras compute capacity for low-latency inference, which will be brought online in phases through 2028.
- The Cerebras architecture is designed to excel at inference workloads where responsiveness is critical. For a 120 billion-parameter model, the Cerebras CS-3 has been benchmarked at over 2,700 tokens per second, compared to 900 tokens per second on Nvidia's Blackwell B200 GPU.
- From a hardware economics perspective, Cerebras claims its systems can offer significant price-performance and power efficiency advantages. One analysis indicated a Cerebras CS-3 system was 32% lower cost and used one-third the power of a comparable Nvidia DGX system while delivering results 21 times faster.
- OpenAI has stated that GPUs remain fundamental to its operations for large-scale training and cost-effective, general-purpose inference. The addition of Cerebras creates a complementary, specialized tier for workloads that demand extremely low latency, as part of a broader strategy to build a more resilient and diverse hardware portfolio.
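
To put the quoted throughput figures in interactive terms, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers cited above; the 500-token response length, the function names, and the steady-rate assumption (no prompt processing, queuing, or network time) are illustrative assumptions, not anything published by OpenAI or Cerebras. Because the briefing gives the Cerebras rates as "over" a threshold, the results are rough bounds rather than measurements.

```python
# Throughput figures quoted in the briefing (tokens per second).
CS3_TOKENS_PER_SEC = 2_700          # Cerebras CS-3, 120B-parameter model ("over 2,700")
B200_TOKENS_PER_SEC = 900           # Nvidia Blackwell B200, same model size
CODEX_SPARK_TOKENS_PER_SEC = 1_000  # GPT-5.3-Codex-Spark on Cerebras ("over 1,000")

# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 500


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Seconds to stream num_tokens at a steady decode rate.

    Assumes a constant rate and ignores prompt processing and network latency.
    """
    return num_tokens / tokens_per_sec


def speedup(fast_rate: float, slow_rate: float) -> float:
    """Ratio of two throughput figures."""
    return fast_rate / slow_rate


if __name__ == "__main__":
    print(f"CS-3 vs B200 throughput ratio: {speedup(CS3_TOKENS_PER_SEC, B200_TOKENS_PER_SEC):.1f}x")
    for label, rate in [
        ("Codex-Spark on Cerebras", CODEX_SPARK_TOKENS_PER_SEC),
        ("120B model on CS-3", CS3_TOKENS_PER_SEC),
        ("120B model on B200", B200_TOKENS_PER_SEC),
    ]:
        secs = generation_seconds(RESPONSE_TOKENS, rate)
        print(f"{label}: about {secs:.2f} s for {RESPONSE_TOKENS} tokens")
```

Under these assumptions, the cited benchmark works out to roughly a threefold throughput gap, and a 500-token completion at the Codex-Spark rate would stream in about half a second.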
