NVIDIA Unveils Blackwell Ultra Chip
NVIDIA has introduced its Blackwell Ultra (GB300 NVL72) platform, which reportedly offers up to 50 times higher AI inference throughput and, the company claims, a 35-fold reduction in cost per token compared with the previous Hopper generation. The platform is aimed at accelerating the "agentic AI" systems that underpin advanced autonomous robots and real-time assistants.
- The Blackwell Ultra architecture introduces a second-generation Transformer Engine with new 4-bit floating point (FP4) AI inference capabilities, delivering up to 20 petaFLOPS of AI performance per GPU. This new precision format is a key enabler for handling the massive computational demands of trillion-parameter AI models.
- A single liquid-cooled GB300 NVL72 rack integrates 72 Blackwell Ultra GPUs and 36 Grace CPUs, functioning as a single, cohesive accelerator. The system has a total of 37 TB of high-speed memory with a combined bandwidth of 576 TB/s.
- The fifth-generation NVLink interconnect provides 1.8 TB/s of bandwidth per GPU, for a total of 130 TB/s across the 72-GPU system, enabling fast communication for large-scale, multi-GPU AI tasks. This is complemented by ConnectX-8 SuperNICs, which offer 800 Gb/s of network connectivity per GPU for internode communication.
- In recent MLPerf benchmarks, a Blackwell Ultra system demonstrated up to 5 times higher throughput per GPU than a Hopper-based system on the DeepSeek-R1 reasoning model. For fine-tuning, an eight-GPU Blackwell Ultra setup completed a Llama 2 70B task 5 times faster than an eight-GPU Hopper H100 system.
- NVIDIA's strategy for "agentic AI" involves providing the foundational hardware and software, such as the NeMo framework and NIM microservices, to build specialized "digital workers." These AI agents are designed to reason, plan, and act on data to perform complex, multi-step tasks.
- The concept of "physical AI" is central to NVIDIA's robotics ambitions: AI systems with physical forms that interact with and learn from the real world. The company is developing "world foundation models" (WFMs) to simulate real-world environments for training these embodied AI systems.
- In a move that underscores the convergence of AI and robotics, NVIDIA and Foxconn are reportedly planning a Texas plant where humanoid robots would help assemble the GB300 NVL72 rack-scale systems. The initiative would use NVIDIA's own Isaac platform for humanoid robots and the GR00T foundation model.
- Early adopters of the Blackwell platform for AI agent development include CrowdStrike for cybersecurity, which saw accuracy rise from 80% to 98.5%, and Synopsys for chip design, which reported productivity improvements of 72% in formal verification.
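
The rack-level figures quoted above can be sanity-checked with simple arithmetic. The following is a minimal sketch, not an NVIDIA tool: the function names are hypothetical, the even split of memory bandwidth across GPUs is an assumption, and the FP4 estimate counts only raw weights at 4 bits per parameter (no scaling factors, activations, or KV cache).

```python
# Back-of-the-envelope check of the GB300 NVL72 figures quoted in the article.
# All helper names are illustrative; inputs are the article's reported numbers.

GPUS_PER_RACK = 72
NVLINK_TB_S_PER_GPU = 1.8    # TB/s per GPU (fifth-generation NVLink)
RACK_MEMORY_BW_TB_S = 576    # TB/s, combined memory bandwidth for the rack
NIC_GBIT_S_PER_GPU = 800     # Gb/s per GPU (ConnectX-8 SuperNIC)


def nvlink_aggregate_tb_s(gpus=GPUS_PER_RACK, per_gpu=NVLINK_TB_S_PER_GPU):
    """Total NVLink bandwidth if every GPU contributes its full per-GPU rate."""
    return gpus * per_gpu


def memory_bw_per_gpu_tb_s(total=RACK_MEMORY_BW_TB_S, gpus=GPUS_PER_RACK):
    """Per-GPU share of memory bandwidth, assuming an even split across GPUs."""
    return total / gpus


def network_aggregate_tb_s(gpus=GPUS_PER_RACK, gbit_per_gpu=NIC_GBIT_S_PER_GPU):
    """Total internode bandwidth, converted from Gb/s to TB/s (8 bits per byte)."""
    return gpus * gbit_per_gpu / 8 / 1000


def fp4_weight_bytes(num_params):
    """Raw weight storage at 4 bits per parameter (assumes no format overhead)."""
    return num_params * 4 // 8


if __name__ == "__main__":
    print(f"NVLink aggregate:   {nvlink_aggregate_tb_s():.1f} TB/s")
    print(f"Memory BW per GPU:  {memory_bw_per_gpu_tb_s():.1f} TB/s")
    print(f"Network aggregate:  {network_aggregate_tb_s():.1f} TB/s")
    print(f"1T params at FP4:   {fp4_weight_bytes(10**12) / 1e12:.1f} TB")
```

Under these assumptions, 72 GPUs at 1.8 TB/s each give about 129.6 TB/s, which is consistent with the rounded 130 TB/s system figure, and the weights of a trillion-parameter model at FP4 occupy roughly half a terabyte, well under the rack's reported 37 TB of memory.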