OpenAI adopts Cerebras chips for new model

OpenAI is using Cerebras's WSE-3 wafer-scale chips for inference with its new GPT-5.3-Codex-Spark model. The move signals a potential shift in hardware strategy, as OpenAI explores alternatives to traditional GPUs for certain large-scale AI workloads.

- The Cerebras WSE-3 chip is a single piece of silicon containing 4 trillion transistors, 900,000 AI-optimized cores, and 44GB of on-chip SRAM. This design avoids the communication bottlenecks found in multi-GPU clusters by keeping an entire model on one chip.
- OpenAI's partnership with Cerebras is a multi-year agreement to deploy 750 megawatts of compute capacity, rolled out in phases through 2028. The deal is reportedly for cloud services, meaning OpenAI rents the capacity rather than buying the hardware outright.
- GPT-5.3-Codex-Spark is a smaller, specialized version of GPT-5.3-Codex designed for low-latency, interactive coding tasks where responsiveness is critical. It has a 128k-token context window and can generate more than 1,000 tokens per second.
- The speed comes at a cost: GPT-5.3-Codex-Spark underperforms the full GPT-5.3-Codex model on some capability benchmarks, such as SWE-Bench Pro, trading capability for latency in specialized applications.
- The collaboration is OpenAI's first major production deployment on non-Nvidia hardware, signaling a strategy of diversifying its hardware stack across workloads. Nvidia remains at the core of training, while this "latency-first" tier on Cerebras is purpose-built for high-speed inference.
- To further reduce latency, OpenAI has optimized its full inference stack, introducing persistent WebSocket connections and streamlining its Responses API. These changes have reportedly cut client/server roundtrip overhead by 80% and time-to-first-token by 50%.
- On the hardware side, a single Cerebras CS-3 system, built around the WSE-3, delivers 125 petaflops of peak AI performance. Independent benchmarks suggest the CS-3 can be significantly faster and more power-efficient than Nvidia's Blackwell B200 GPUs on certain inference workloads.
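To see how the throughput and latency figures combine for a single interactive request, here is a minimal Python sketch of a simple additive latency model: roundtrip overhead, plus time-to-first-token (TTFT), plus the time to stream the output at the quoted token rate. The function name, the baseline TTFT and roundtrip values, and the 500-token response length are hypothetical illustrations, not OpenAI figures; only the 1,000 tokens/s rate and the 50% and 80% reductions come from the briefing.

```python
def estimate_latency_s(output_tokens, tokens_per_s, ttft_s, roundtrip_s):
    """Rough end-to-end latency (seconds) for one streamed response.

    Assumed additive model: connection/roundtrip overhead, plus
    time-to-first-token, plus time to stream the output tokens.
    """
    return roundtrip_s + ttft_s + output_tokens / tokens_per_s


# Hypothetical baseline figures, for illustration only (not from OpenAI).
BASELINE_TTFT_S = 0.4
BASELINE_ROUNDTRIP_S = 0.1

# Reported reductions: about 50% lower TTFT, about 80% lower roundtrip overhead.
improved_ttft_s = BASELINE_TTFT_S * (1 - 0.50)
improved_roundtrip_s = BASELINE_ROUNDTRIP_S * (1 - 0.80)

TOKENS = 500           # hypothetical length of a generated code edit
TOKENS_PER_S = 1000    # "over 1,000 tokens per second", so streaming time is an upper bound

before = estimate_latency_s(TOKENS, TOKENS_PER_S, BASELINE_TTFT_S, BASELINE_ROUNDTRIP_S)
after = estimate_latency_s(TOKENS, TOKENS_PER_S, improved_ttft_s, improved_roundtrip_s)
print(f"before: {before:.2f} s, after: {after:.2f} s")
```

With these assumed inputs the total drops from about 1.00 s to about 0.72 s. Because the streaming term is unchanged, the relative gain from the TTFT and roundtrip cuts is largest for short, interactive responses, which is the workload Codex-Spark targets.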

Shared from Scout - Be the smartest in the room.