Gigascale AI Webinar

- NVIDIA presented a webinar on architecting 'gigascale' AI factories to connect very large GPU pools.
- The session discussed scaling up, out, and across networks to support deployments of 100k+ GPUs for efficiency gains.
- The framing reinforces why hyperscalers can optimise video inference at scale and why premium acceleration concentrates there (x.com).

NVIDIA is pitching a new way to wire giant AI data centers: treat 100,000 or more graphics processors as one factory, not separate clusters. (nvidia.com) A graphics processing unit, or GPU, is the chip that does most of the math for training and serving AI models. NVIDIA’s current webinar series says modern “AI factories” need coordinated power, cooling, storage, and networking to keep large multi-GPU systems busy during training and inference. (nvidia.com)

In the networking session, NVIDIA says the design problem now has three layers: “scale up” inside a rack, “scale out” across many racks, and “scale across” between multiple data centers. The company’s September 9, 2025 networking post says that last layer is meant to let separated facilities run a single training job or disaggregated inference workload together. (developer.nvidia.com)

NVIDIA argues that distance breaks ordinary Ethernet because long links add delay and jitter, the variation in delivery time that slows tightly synchronized AI jobs. Its Spectrum-XGS Ethernet pitch is distance-aware congestion control and routing that keep performance predictable over longer spans. (developer.nvidia.com) The company put a number on that claim in the same post: up to 1.9 times higher NVIDIA Collective Communications Library all-reduce bandwidth than off-the-shelf Ethernet. All-reduce is the step where many GPUs repeatedly combine partial results, so network stalls can idle expensive chips. (developer.nvidia.com)

That framing lines up with where the biggest AI buyers already operate. NVIDIA said on October 28, 2024 that xAI’s Colossus system in Memphis reached 100,000 Hopper GPUs on Spectrum-X Ethernet, and said in October 2025 that Meta and Oracle were standardizing on Spectrum-X Ethernet switches for AI data center networks. (nvidianews.nvidia.com 1) (nvidianews.nvidia.com 2)

NVIDIA is also tying that network story to power use. In a January 6, 2026 technical post, it said Spectrum-X Ethernet Photonics on the Rubin platform cuts power per 1.6-terabit-per-second port fivefold versus off-the-shelf Ethernet and delivers five times longer link-flap-free uptime. (developer.nvidia.com) The same Rubin launch materials push the idea further, saying the next Spectrum-X generation is designed for “massive-scale AI factories” and future million-GPU environments. NVIDIA’s March 2026 photonics announcement said its new Spectrum-X and Quantum-X silicon photonics switches are built to connect millions of GPUs across sites while cutting energy use and operating costs. (nvidianews.nvidia.com 1) (nvidianews.nvidia.com 2)

The business implication is concentration. If the best economics come from keeping giant pools of GPUs full and synchronized, cloud operators with multiple data centers, custom networking teams, and enough demand to smooth utilization have an advantage over smaller operators buying the same chips. (developer.nvidia.com) (nvidia.com)

That helps explain why premium AI inference, including video workloads that need heavy parallel compute and steady throughput, is clustering at hyperscale. NVIDIA’s webinar is less a product launch than a map of where the company expects the next fight to happen: in the network, across ever larger GPU pools. (nvidia.com) (developer.nvidia.com)
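
For readers who want to see why the all-reduce step mentioned above is so sensitive to the network, here is a minimal sketch in Python. It assumes a ring arrangement, which is one common way to implement all-reduce; the source does not say which algorithm was used in NVIDIA's measurement. The function name `ring_all_reduce` and the simulated "workers" are illustrative only and are not part of any NVIDIA library.

```python
def ring_all_reduce(buffers):
    """Simulate a ring all-reduce (sum) across len(buffers) workers.

    Illustrative assumption: each worker holds an equal-length list of
    numbers whose length is a multiple of the worker count.
    """
    n = len(buffers)
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all buffers must have the same length")
    if length % n != 0:
        raise ValueError("buffer length must be a multiple of the worker count")

    size = length // n
    data = [list(b) for b in buffers]  # work on copies, leave inputs untouched

    def span(chunk):
        # Slice covering one chunk; chunk indices wrap around the ring.
        chunk %= n
        return slice(chunk * size, (chunk + 1) * size)

    # Phase 1, reduce-scatter: after n - 1 steps, worker i holds the full
    # sum for chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot every message first so all sends in a step are simultaneous.
        messages = [(i, span(i - step), data[i][span(i - step)]) for i in range(n)]
        for sender, sl, payload in messages:
            receiver = data[(sender + 1) % n]
            receiver[sl] = [a + b for a, b in zip(receiver[sl], payload)]

    # Phase 2, all-gather: pass the finished chunks around the ring so every
    # worker ends up with every summed chunk.
    for step in range(n - 1):
        messages = [(i, span(i + 1 - step), data[i][span(i + 1 - step)]) for i in range(n)]
        for sender, sl, payload in messages:
            data[(sender + 1) % n][sl] = payload

    return data


if __name__ == "__main__":
    partials = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    print(ring_all_reduce(partials))
    # [[111, 222, 333], [111, 222, 333], [111, 222, 333]]
```

In this sketch every one of the 2 × (n − 1) steps has to finish on all links before the next one starts, so a single slow or jittery link holds up every worker in the ring. That is the mechanism behind the article's point that network stalls can idle expensive chips.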
