Rack‑scale AI Factory Notes

NVIDIA also described rack‑scale 'AI factories' built on GB300 NVL72 systems, with NVLink partitioning and topology‑aware scheduling for Slurm and Kubernetes, highlighting the orchestration complexity of high‑density deployments. (x.com)

NVIDIA is sketching out a new problem in artificial intelligence infrastructure: one rack can now behave like a small supercomputer, and the hard part is scheduling it. (developer.nvidia.com)

The company’s April 7, 2026 technical post describes Grace Blackwell systems called GB200 NVL72 and GB300 NVL72, each built as a rack-scale unit with 18 compute trays, dense graphics processing unit fabrics, and high-bandwidth networking. NVIDIA says the GB300 NVL72 links 72 Blackwell Ultra graphics processing units and 36 Grace central processing units in one rack-scale design. (developer.nvidia.com; nvidianews.nvidia.com)

A rack like that is not just a cabinet full of chips. Supermicro’s GB300 NVL72 datasheet says one rack carries 72 NVIDIA B300 graphics processing units, 288 gigabytes of high-bandwidth memory per graphics processing unit, and up to 800 gigabits per second networking, with each graphics processing unit connected by 1.8 terabytes per second of NVLink. (supermicro.com)

NVLink is NVIDIA’s short-range connection for moving data between graphics processing units faster than ordinary network links. In these racks, NVIDIA says the chips are grouped into NVLink domains and partitions, so software has to know which graphics processing units are physically close before it places a training or inference job. (developer.nvidia.com; hpcwire.com)

That is where the scheduling story starts. NVIDIA says a flat scheduler that sees only a generic pool of nodes can miss the rack’s hierarchy, while Mission Control, Slurm, Kubernetes tooling, and NVIDIA Run:ai can map jobs to the right NVLink neighborhood using identifiers such as cluster UUID and clique ID. (developer.nvidia.com; run-ai-docs.nvidia.com; slurm.schedmd.com)

NVIDIA’s post also introduces Topograph, which it says discovers the rack’s topology and exposes it to schedulers, including Slurm and Kubernetes through Dynamic Resource Allocation and ComputeDomains.
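
To make the difference between a flat scheduler and a topology-aware one concrete, here is a minimal Python sketch. It is an illustration only, not NVIDIA's, Slurm's, or Topograph's actual logic. The `Node` fields, the `place_job` helper, the best-fit rule, and the tray names are all hypothetical; only the idea of grouping by cluster UUID and clique ID comes from the post described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cluster_uuid: str  # hypothetical field: which NVLink domain the node is in
    clique_id: int     # hypothetical field: which NVLink partition inside it
    free_gpus: int


def group_by_nvlink_domain(nodes):
    """Bucket nodes by (cluster_uuid, clique_id)."""
    domains = defaultdict(list)
    for node in nodes:
        domains[(node.cluster_uuid, node.clique_id)].append(node)
    return dict(domains)


def place_job(nodes, gpus_needed):
    """Return {node name: GPUs taken} inside one partition, or None if none fits.

    Assumed policy: best fit, i.e. the partition with the least free
    capacity that can still hold the whole job.
    """
    candidates = []
    for key, members in group_by_nvlink_domain(nodes).items():
        capacity = sum(n.free_gpus for n in members)
        if capacity >= gpus_needed:
            candidates.append((capacity, key, members))
    if not candidates:
        return None
    _, _, members = min(candidates, key=lambda c: (c[0], c[1]))
    placement, remaining = {}, gpus_needed
    # Fill the emptiest nodes first so the job spans as few nodes as possible.
    for node in sorted(members, key=lambda n: (-n.free_gpus, n.name)):
        if remaining == 0:
            break
        take = min(node.free_gpus, remaining)
        if take:
            placement[node.name] = take
            remaining -= take
    return placement


# Made-up inventory: two racks, one of them split into two partitions.
nodes = [
    Node("tray-a1", "rack-A", 0, 4),
    Node("tray-a2", "rack-A", 0, 2),
    Node("tray-a3", "rack-A", 1, 4),
    Node("tray-b1", "rack-B", 0, 4),
    Node("tray-b2", "rack-B", 0, 4),
]
```

With this inventory, `place_job(nodes, 6)` keeps the job inside rack-A's partition 0, while `place_job(nodes, 10)` returns `None` even though 18 graphics processing units are free in total. A flat scheduler would see those 18 and spread the job across partitions, which is the failure mode the post describes.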
Rafay, which wrote about ComputeDomains on April 8, 2026, described the idea as a Kubernetes-native way to allocate and tear down multi-node NVLink communication groups as jobs start and finish. (developer.nvidia.com; docs.rafay.co)

This is arriving as NVIDIA pushes the language of “artificial intelligence factories” for data centers built around model training and inference. In March 2025, NVIDIA said the GB300 NVL72 was designed as a single massive graphics processing unit for pretraining, post-training, and reasoning inference workloads. (nvidianews.nvidia.com)

Cloud providers have already been turning that hardware pitch into operating procedures. CoreWeave said in August 2025 that its GB300 NVL72 instances used Kubernetes and Slurm on Kubernetes with a topology-aware scheduler that tries to keep workloads inside the same NVL72 domain when possible. (coreweave.com)

The practical constraint is density. A GB300 NVL72 rack can concentrate enough compute, memory, networking, power, and cooling in one unit that the bottleneck shifts from buying graphics processing units to orchestrating them without wasting bandwidth or isolating the wrong workloads. (supermicro.com; developer.nvidia.com)

So the latest NVIDIA message is less about one more chip launch than about operating a rack as a schedulable machine. The company is arguing that in dense Blackwell systems, the control plane now matters almost as much as the silicon. (developer.nvidia.com; hpcwire.com)
