OpenAI enhances coding model, diversifies hardware
OpenAI has released GPT-5.3 Codex, a new model purpose-built for code generation that scores 77.3% on the Terminal-Bench benchmark and runs 25% faster than its predecessor. Separately, the company is diversifying its hardware stack through a partnership with Cerebras and a collaboration with NVIDIA that nearly doubled the output of one of its models.
- Beyond the Terminal-Bench score, GPT-5.3 Codex showed significant gains on agentic benchmarks, jumping 26.5 percentage points over its predecessor on OSWorld-Verified, a test that measures the ability to complete tasks using a mouse and GUI apps.
- The partnership with Cerebras is a multi-year, $10 billion+ cloud services deal for 750MW of compute, not a hardware purchase, with deployment beginning in 2026. The goal is to provide a dedicated low-latency inference solution by leveraging Cerebras's Wafer-Scale Engine (WSE) architecture, which uses massive on-chip SRAM to mitigate the memory bottlenecks common in GPU and HBM setups.
- The collaboration with NVIDIA focused on accelerating the open-weight `gpt-oss-120b` model, which uses a Mixture-of-Experts (MoE) architecture. By using TensorRT-LLM and a technique called "disaggregated serving," the teams achieved up to 1.5 million tokens per second on a single NVIDIA GB200 NVL72 system.
- A key feature of GPT-5.3 Codex is "real-time steering," which allows developers to provide feedback and guidance while the model is in the middle of executing a task, without losing context.
- OpenAI engineers used early versions of GPT-5.3 Codex to debug its own training runs and analyze evaluation results, a sign of the model's maturity for internal MLOps workflows.
- The NVIDIA optimization effort also demonstrated rapid performance gains: work with Artificial Analysis showed a 35% acceleration in the output of the `gpt-oss-120b` model in just one week on a DGX B200 system.
- GPT-5.3 Codex is the first model OpenAI has classified as "high capability" under its Preparedness Framework, indicating it was specifically trained to identify and help fix software