NVIDIA Unveils Blackwell B300 'Ultra' GPU

NVIDIA has launched its next-generation Blackwell Ultra B300 GPU, purpose-built for hyperscale AI. The new chip boasts 288GB of HBM3e memory and 15 PFLOPS of FP4 performance, setting a new bar for LLM training and inference infrastructure.

The Blackwell architecture is physically composed of two reticle-limited dies fused into a single GPU. This dual-die design packs 208 billion transistors, and a 10 terabytes per second (TB/s) chip-to-chip link lets the two dies function as one unified chip. It is manufactured on a custom TSMC 4NP process, a refinement of TSMC's 5nm-class node.

CEO Jensen Huang introduced the Blackwell platform as the engine for a "new industrial revolution" driven by generative AI. Major cloud providers and AI companies, including Amazon Web Services, Google, Meta, Microsoft, and OpenAI, are expected to adopt the new architecture for their infrastructure. During the unveiling, Huang emphasized the need for "bigger GPUs" to handle trillion-parameter models, a direct driver of Blackwell's design.

Compared with its predecessor, the B300 represents a significant leap. NVIDIA claims up to 30 times the large language model inference performance of the H100 and up to 25 times better energy efficiency. Memory capacity has grown as well. A 70-billion-parameter model in FP16 needs about 140GB for its weights alone, at 2 bytes per parameter. That nearly exhausts the 141GB on an H200, leaving almost no room for the KV cache and activations, so such models are typically sharded across multiple GPUs. A single B300, with 288GB of HBM3e, holds the same weights with roughly half its memory to spare.

The individual B300 GPU is just one component of a larger platform. For maximum scale, NVIDIA offers the GB200 NVL72, a liquid-cooled rack system that connects 72 Blackwell GPUs and 36 Grace CPUs so that they act as a single, massive GPU. The system uses fifth-generation NVLink, which provides 1.8 TB/s of total bandwidth per GPU and is designed to keep communication fast enough for models with trillions of parameters.

This generation also introduces a second-generation Transformer Engine with enhanced support for the FP4 and FP8 data formats. NVIDIA says 4-bit floating point lets the chip double both its performance and the size of the models it can handle compared with 8-bit formats, an efficiency gain the company pitches for both training and inference workloads.

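The memory figures above come down to simple arithmetic: parameter count times bytes per parameter. The Python sketch below illustrates that calculation for a 70-billion-parameter model at FP16, FP8, and FP4. It is a rough, weights-only estimate; the function names are hypothetical, and it ignores the KV cache, activations, and the per-block scaling factors that low-precision formats carry in practice.

```python
# Weights-only memory estimate. Assumption: storage is exactly
# params * bytes-per-param, with no KV cache, activations, or
# quantization scaling-factor overhead included.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

# Memory capacities quoted in the article, in decimal gigabytes.
GPU_MEMORY_GB = {"H200": 141, "B300": 288}


def weight_footprint_gb(num_params: float, fmt: str) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


def headroom_gb(num_params: float, fmt: str, gpu: str) -> float:
    """Return memory left on one GPU after loading the weights.

    A negative value means the weights alone do not fit.
    """
    return GPU_MEMORY_GB[gpu] - weight_footprint_gb(num_params, fmt)


if __name__ == "__main__":
    params = 70e9  # the 70-billion-parameter model from the article
    for fmt in BYTES_PER_PARAM:
        size = weight_footprint_gb(params, fmt)
        for gpu in GPU_MEMORY_GB:
            left = headroom_gb(params, fmt, gpu)
            print(f"{fmt} on {gpu}: weights {size:.0f} GB, headroom {left:.0f} GB")
```

At FP16 the weights come to 140 GB, which leaves 1 GB on an H200 and 148 GB on a B300. Each halving of precision halves the footprint, which is the sense in which FP4 doubles the model size a given GPU can hold compared with FP8.
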
The estimated price for a single Blackwell chip is between $30,000 and $40,000, though Huang noted that the focus is on the total cost of data center integration rather than individual chip sales. Systems based on the B200 GPU, a slightly lower-spec model, are available now, with the more powerful Blackwell Ultra B300 expected to ship in the first half of 2026.
