OpenAI Deploys Model on Cerebras

OpenAI has launched GPT-5.3-Codex-Spark, its first production AI model served on chips from Cerebras Systems, marking its first deployment away from Nvidia. Some see the move as the biggest crack in Nvidia's market dominance. Social media posts indicate the model can exceed 1,000 tokens per second, though some question whether its full potential is used by default.

- Cerebras's advantage lies in its Wafer-Scale Engine (WSE), a single chip the size of a silicon wafer. The latest WSE-3, built on TSMC's 5nm process, integrates 4 trillion transistors, 900,000 AI-optimized cores, and 44GB of on-chip SRAM. This design provides massive memory bandwidth (21 PB/s) and avoids the communication bottlenecks of multi-GPU clusters, making it highly efficient for large model inference.
- The deployment is a strategic move for OpenAI, which recently launched a 10-year plan to localize its hardware supply chain for data centers, robotics, and consumer hardware. This initiative aims to increase supply chain resilience and control over the physical infrastructure required for large-scale AI. It follows a broader trend of hyperscalers like Google, AWS, and Microsoft developing custom silicon (TPUs, Trainium) to reduce dependency on Nvidia.
- While Nvidia dominates the AI accelerator market with an estimated 80-92% market share, its strength is heavily tied to its CUDA software ecosystem, which has been in development for nearly two decades. Competitors like Cerebras offer specialized hardware advantages but must overcome the significant switching costs associated with CUDA's deep integration into AI workflows.
- The AI chip landscape is experiencing a surge in venture capital investment, with over $1 billion flowing into the sector in Q4 2025 alone. Global funding for AI startups reached $270.2 billion in 2025, accounting for over half of all VC investments. This influx of capital is fueling new architectures and competition from startups.
- This collaboration highlights a critical industry shift from focusing on the economics of model training to the ongoing, operational costs of inference. While training is a significant one-time expense, inference costs accumulate with every user query and can surpass training costs over a model's lifetime. Hardware optimized for inference, like the Cerebras WSE, aims to lower this recurring cost.
- The Cerebras CS-3 system, powered by the WSE-3, is designed for cluster-level compute in a smaller footprint, occupying one-third of a standard datacenter rack. For models too large to fit into the 44GB of on-chip SRAM, Cerebras offers a "MemoryX" unit that can store trillions of weights in DRAM and stream them to the processor.
- Hyperscalers are aggressively developing their own custom silicon to optimize for their specific workloads and reduce costs. Google's TPU v7 "Ironwood" delivers 4.6 PFLOPS, Amazon's Trainium 3 offers a 50% better price-performance ratio than equivalent Nvidia GPUs for some workloads, and Microsoft has its Maia AI accelerator. This "build vs. buy" trend creates opportunities for specialized hardware providers like Cerebras to partner with AI leaders seeking alternatives.
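
Two of the points above come down to simple arithmetic: whether a model's weights fit in the 44GB of on-chip SRAM, and how many queries it takes for cumulative inference cost to match a one-time training cost. The following is a rough back-of-envelope sketch, not a description of how Cerebras or OpenAI size their deployments. The function names, the decimal reading of "44GB", and every model size and dollar figure in the example calls are illustrative assumptions; only the 44GB figure comes from the briefing.

```python
# Back-of-envelope helpers. All names and example inputs are hypothetical.

# 44 GB of on-chip SRAM on the WSE-3, as stated in the briefing.
# Assumption: decimal gigabytes (10**9 bytes).
SRAM_BYTES = 44 * 10**9


def weight_footprint_bytes(num_params: int, bytes_per_param: float) -> float:
    """Memory needed for the weights alone.

    Ignores activations, KV cache, and any runtime overhead, so the
    result is a lower bound on real memory use.
    """
    return num_params * bytes_per_param


def fits_on_chip(num_params: int, bytes_per_param: float,
                 sram_bytes: int = SRAM_BYTES) -> bool:
    """True if the weights alone fit within the given SRAM capacity."""
    return weight_footprint_bytes(num_params, bytes_per_param) <= sram_bytes


def breakeven_queries(training_cost: float, cost_per_query: float) -> float:
    """Number of queries at which cumulative inference cost equals training cost."""
    if cost_per_query <= 0:
        raise ValueError("cost_per_query must be positive")
    return training_cost / cost_per_query


if __name__ == "__main__":
    # Hypothetical 20B-parameter model stored at 2 bytes per parameter.
    print(fits_on_chip(20_000_000_000, 2))
    # Hypothetical 70B-parameter model at 2 bytes per parameter.
    print(fits_on_chip(70_000_000_000, 2))
    # Hypothetical costs: $1,000,000 to train, $0.50 per query.
    print(breakeven_queries(1_000_000, 0.5))
```

Under these assumptions, a model whose weights exceed the SRAM capacity is the case the briefing describes as requiring weights to be streamed from an external unit, and any query volume beyond the break-even point is where recurring inference cost outweighs the one-time training cost.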
