OpenAI Retires GPT-4o, Launches GPT-5.3 on Cerebras Chips

OpenAI has officially retired GPT-4o from ChatGPT, making the GPT-5.2 model line the new default for enterprise and API use. Concurrently, the company launched GPT-5.3-Codex-Spark, a model optimized for coding that runs on Cerebras' WSE-3 chips instead of Nvidia GPUs. This hardware shift signals OpenAI's strategy to diversify its silicon supply chain and is expected to impact pricing and latency for enterprise services.

- The Cerebras WSE-3 chip is built on a "wafer-scale" architecture: a single, massive 46,225 mm² piece of silicon, compared with the 814 mm² die of an Nvidia H100. This design integrates 900,000 AI-optimized cores and 44GB of on-chip SRAM, providing 21 PB/s of memory bandwidth to reduce the latency caused by fetching data from external memory.
- OpenAI's reliance on Nvidia GPUs has come at significant cost; training GPT-4 was estimated to cost over $100 million, with individual H100 GPUs priced between $25,000 and $40,000. Running a production model on Cerebras hardware signals a strategy to mitigate the high costs and supply chain risks of depending on a single hardware provider.
- The architectural shift targets the "memory wall" bottleneck common in GPU clusters. GPU compute units can spend significant time idle while waiting for model parameters to be loaded from off-chip High Bandwidth Memory (HBM). The WSE-3's on-chip SRAM keeps compute and memory physically close, a key factor for low-latency inference tasks like real-time code generation.
- The WSE-3 offers a peak performance of 125 petaflops for AI workloads from a single chip. Approaching that figure with H100s requires a cluster of interconnected GPUs, which adds network latency that Cerebras' on-wafer fabric is designed to avoid.
- This partnership is part of a broader trend of major AI labs seeking hardware diversity. There have been reports of OpenAI exploring custom silicon and engaging with other chipmakers like Groq to secure the massive amounts of compute needed for future models and to reduce its dependency on Nvidia, which holds an estimated 95% share of the AI hardware market.
- A single Cerebras CS-3 system draws around 23,000 watts, but its architecture can simplify cluster-level infrastructure. By reducing the need for complex interconnects like InfiniBand and NVLink that link thousands of GPUs, it can lower the overall power consumption and cost of networking at scale.
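
The memory-wall argument comes down to simple arithmetic. In single-stream decoding, every generated token requires reading all of the model's weights, so memory bandwidth sets a floor on per-token latency. The Python sketch below plugs in the WSE-3 figures quoted above. The H100's 3.35 TB/s HBM3 bandwidth (SXM variant) and the hypothetical 10-billion-parameter, 16-bit model are assumptions added for illustration. It is a rough lower bound, not a benchmark: it ignores compute time, batching, KV-cache traffic, and models too large to fit in a single wafer's 44GB of SRAM.

```python
# Back-of-the-envelope "memory wall" estimate: a lower bound on the time needed
# to stream a model's weights through the compute units once, which happens for
# every generated token in batch-size-1 decoding.

WSE3_AREA_MM2 = 46_225   # from the article
H100_AREA_MM2 = 814      # from the article
WSE3_SRAM_BW = 21e15     # bytes/s: 21 PB/s on-chip SRAM bandwidth (from the article)
H100_HBM_BW = 3.35e12    # bytes/s: ASSUMPTION, H100 SXM HBM3 spec-sheet figure


def weight_stream_time_s(param_count: float, bytes_per_param: float,
                         bandwidth_bps: float) -> float:
    """Seconds to read every weight once at the given bandwidth.

    Ignores compute, caching, batching and KV-cache reads, so real
    per-token latency is always higher than this value.
    """
    return param_count * bytes_per_param / bandwidth_bps


if __name__ == "__main__":
    print(f"Die area ratio: {WSE3_AREA_MM2 / H100_AREA_MM2:.1f}x")

    # HYPOTHETICAL model: 10B parameters stored as 16-bit (2-byte) weights,
    # i.e. 20 GB, small enough to fit in the WSE-3's 44 GB of SRAM.
    params, bytes_per_param = 10e9, 2
    for name, bw in [("H100 HBM3", H100_HBM_BW), ("WSE-3 SRAM", WSE3_SRAM_BW)]:
        t = weight_stream_time_s(params, bytes_per_param, bw)
        print(f"{name}: {t * 1e6:.3f} microseconds per full weight pass")
```

Run as a script, it prints a die-area ratio of 56.8x and weight-pass floors of about 5970 microseconds for the H100 versus under 1 microsecond for the WSE-3. Real-world speedups are likely to be much smaller than that ratio suggests, because actual inference is not purely bandwidth-bound.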

Shared from Scout - Be the smartest in the room.