Startups standardizing on HGX/GB200
Industry briefings report that many startups and neoclouds are increasingly standardizing on NVIDIA HGX and GB200 systems for production inference, citing CoreWeave's adoption as validation (hpcwire.com). The tilt toward HGX/GB200 reflects a tradeoff: NVIDIA platforms offer workload portability and faster time‑to‑market, while hyperscalers are betting on their own custom silicon.
CoreWeave announced availability of NVIDIA GB200 NVL72 rack‑scale systems on April 15, 2025, naming Cohere, IBM and Mistral AI as initial customers (prnewswire.com). CoreWeave said its Blackwell‑accelerated instances can scale to "up to 110,000 Blackwell GPUs" over NVIDIA Quantum‑2 InfiniBand networking (prnewswire.com).
In its MLPerf Inference v5.0 submission, CoreWeave reported 800 tokens per second on Llama 3.1 405B using GB200 instances and claimed a roughly 2.86× per‑chip throughput increase over H200 on that workload (coreweave.com). For MLPerf Training v5.0, CoreWeave, NVIDIA and IBM jointly submitted the largest run on GB200, benchmarking a 2,496‑GPU GB200 cluster and reporting roughly 2× faster training than comparable Hopper clusters of the same size (coreweave.com).
The GB200 NVL72 is a liquid‑cooled, rack‑scale system built from 36 Grace Blackwell Superchips, each pairing one Grace CPU with two Blackwell GPUs, for 72 Blackwell GPUs per rack. All 72 GPUs share a single NVLink domain, so the rack is designed to operate as one unified GPU fabric (opencompute.org).
CoreWeave says its cloud stack, comprising CoreWeave Kubernetes Service, Slurm on Kubernetes (SUNK) and Mission Control, has been optimized for GB200 NVL72 to help customers port workloads quickly and run production inference and training ahead of many hyperscaler rollouts (prnewswire.com).
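The rack arithmetic behind these figures can be checked with a few lines of Python. This is a back‑of‑envelope sketch, not a description of CoreWeave's actual topology: the helper names are hypothetical, and it assumes every GPU sits in a fully populated NVL72 rack, so the rack counts it produces are only lower bounds.

```python
# Figures quoted in the text above.
SUPERCHIPS_PER_RACK = 36  # GB200 Superchips in one NVL72 rack
GPUS_PER_SUPERCHIP = 2    # each GB200 Superchip pairs one Grace CPU with two Blackwell GPUs


def gpus_per_rack() -> int:
    """Size of one NVL72 NVLink domain, in GPUs."""
    return SUPERCHIPS_PER_RACK * GPUS_PER_SUPERCHIP


def min_racks(total_gpus: int) -> int:
    """Lower bound on NVL72 racks needed to hold total_gpus GPUs.

    Hypothetical helper. Assumption: every GPU lives in a fully populated
    NVL72 rack; real deployments may use partial racks, spares, or other layouts.
    """
    per_rack = gpus_per_rack()
    # Integer ceiling division: any leftover GPUs need one more rack.
    return (total_gpus + per_rack - 1) // per_rack


if __name__ == "__main__":
    print(gpus_per_rack())     # 72
    print(min_racks(110_000))  # 1528 (CoreWeave's stated scale ceiling)
    print(min_racks(2_496))    # 35 (the MLPerf Training v5.0 cluster size)
```

Because 2,496 is not a multiple of 72, the rounding matters here; the section does not say how that cluster was physically laid out across racks.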