OpenAI Shifts to Cerebras for New Codex Model

OpenAI has launched GPT-5.3-Codex-Spark, a faster variant of its code model, and is running inference for it on Cerebras WSE-3 chips instead of Nvidia GPUs. The company cited significant improvements in inference speed, lower latency, and reduced operational costs as key reasons for the hardware shift. This marks the first time OpenAI has publicly opted for an alternative to Nvidia at a large scale.

- The multi-year deal between OpenAI and Cerebras is valued at over $10 billion and involves Cerebras providing up to 750 megawatts of power for AI inference, which will be rolled out in stages through 2028. This partnership gives OpenAI a dedicated infrastructure for low-latency tasks, diversifying beyond its reliance on Nvidia and traditional cloud providers. - The Cerebras WSE-3 is a single wafer-sized chip containing 900,000 AI-optimized cores, 4 trillion transistors, and 44GB of on-chip SRAM. This design provides 21 PB/s of memory bandwidth, aiming to reduce latency by minimizing data movement between separate chips, a common bottleneck in multi-GPU setups. - GPT-5.3-Codex-Spark is a text-only model with a 128k context window, specifically tuned for interactive coding. OpenAI claims it can deliver over 1,000 tokens per second for certain configurations, targeting real-time collaboration and tasks where responsiveness is critical. - In a third-party benchmark, the Cerebras CS-3 system demonstrated up to 21 times faster inference performance on a 70B parameter Llama 3 model compared to Nvidia's Blackwell B200 GPU, based on end-to-end latency calculations. For an open-source OpenAI model (gpt-oss-120B), the CS-3 achieved over 2,700 tokens per second versus 900 for the B200. - The new model extends beyond code generation to developer tasks like debugging, deployment, monitoring, and writing product requirement documents (PRDs). It is designed to be interruptible, allowing developers to steer the model mid-task for more interactive workflows. - OpenAI's relationship with Cerebras dates back to meetings that started in 2017, and OpenAI's CEO, Sam Altman, is a personal investor in the chipmaker. The collaboration has previously included tuning earlier open-source GPT models on Cerebras hardware. - The GPT-5.3-Codex-Spark model is initially available as a research preview for ChatGPT Pro subscribers, accessible via the Codex app, command-line interface (CLI), and IDE extensions. During the preview period, usage will not apply to standard rate limits.

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.