AI benchmark blow‑up

The ARC‑AGI‑3 benchmark just crushed top LLMs — top models scored under 1% of human expert performance, exposing big gaps in reasoning ability right now. Microsoft says it's also ramping infra — first to deploy NVIDIA Vera Rubin GPUs on Azure — while OpenAI is moving to a multi‑cloud inference strategy (Google + talks with AWS), signaling a fast reshuffle in where large models actually run. (x.com) (x.com) (x.com)

The ARC‑AGI‑3 technical paper was submitted to arXiv on March 24, 2026 and frames the test as an interactive, turn‑based benchmark that demands online exploration, goal inference and multi‑step planning from agents. (arxiv.org) The public ARC Prize leaderboard shows human test‑takers calibrated to a 100% completion rate on the published game set while leading AI entries cluster below 1% — for example Opus 4.6 appears at roughly 0.2% on early scoreboards. (arcprize.org) (toolmesh.ai) Authors and independent analysts trace the shortfall to execution fragility in long interactive episodes — small mistakes in multi‑step actions cascade into failure rather than recoverable reasoning, a failure mode highlighted in the ARC‑AGI‑3 paper and follow‑up commentary. (arxiv.org) (ai-buzz.com) Microsoft CEO Satya Nadella posted on X on March 13–14, 2026 that Azure had “brought up” an NVIDIA Vera Rubin NVL72 system for validation inside a Microsoft lab and Microsoft says those lab validations will precede rollouts into its Fairwater AI superfactory sites. (datacenterdynamics.com) (azure.microsoft.com) NVIDIA’s platform documentation lists the NVL72 rack as 72 Rubin GPUs plus 36 Vera CPUs delivering up to ~3.6 exaflops of NVFP4 inference throughput per rack and using NVLink‑6 fabric with roughly 260 TB/s aggregate interconnect bandwidth. (nvidianews.nvidia.com) (videocardz.com) OpenAI’s infrastructure footprint has shifted to an explicit multi‑cloud posture: Reuters reported a Google Cloud capacity deal finalized in mid‑2025, and OpenAI publicly announced a multi‑year, ~$38 billion infrastructure partnership with AWS on November 3, 2025. (reuters.com via CNBC) (openai.com) Hyperscalers are publicly mapping Rubin deployments into 2026 — Google has said it will be among the first clouds to offer Vera Rubin systems in the second half of 2026 while Azure and other providers are announcing staged rollouts and large GPU fleet expansions that will create multiple competing runtime venues for frontier inference. (datacenterdynamics.com) (nvidianews.nvidia.com)

AI benchmark blow‑up

Get your own daily briefing