Nvidia Launches Blackwell Ultra B300 GPU

Nvidia just launched its Blackwell Ultra B300 GPU, packing 288GB of HBM3e memory and hitting 15 PFLOPS of FP4 throughput. The B300 is aimed directly at hyperscale AI and inference workloads, with significant boosts in memory and energy efficiency designed for platforms handling massive video and AI traffic.

The Blackwell Ultra B300 is a mid-cycle refresh of the Blackwell GPU line, which also includes the B100 and B200. It raises HBM3e capacity to 288GB and the maximum Thermal Design Power (TDP) to 1,400W, up from 1,000W on the B200. The underlying architecture uses a dual-die chiplet design with a 10 TB/s die-to-die interconnect and packs 208 billion transistors.

This new GPU is part of a broader platform strategy. The B300 can be integrated into the GB200 Superchip, which pairs two Blackwell GPUs with a single Grace CPU. This "superchip" is the core component of larger rack-scale systems like the GB200 NVL72, which connects 72 Blackwell GPUs and 36 Grace CPUs into what functions as a single, massive GPU.

Compared to the previous Hopper generation, the Blackwell architecture offers significant performance gains. It introduces a second-generation Transformer Engine with new FP4 and FP6 precision capabilities for AI inference, delivering up to a 2.5x performance boost over Hopper. The fifth-generation NVLink provides 1.8 TB/s of GPU-to-GPU interconnect bandwidth, double that of the prior generation.

For large-scale AI, these architectural improvements translate to tangible gains. A system with 72 Blackwell GPUs can provide up to 30 times faster real-time inference for large language models than the equivalent Hopper-based system. Tasks that previously required 256 Hopper GPUs can now run on just 64 Blackwell GPUs without a loss in per-GPU throughput.

Nvidia's strategy extends beyond the chip to complete data center systems, shifting focus from component performance to deploying integrated, rack-scale power and cooling infrastructure. This approach is designed for the massive computational demands of training and running trillion-parameter AI models. Cloud providers like Microsoft Azure are already deploying these GB200-powered systems.

The architecture is also being deployed in distributed and edge computing environments. Akamai, for instance, is integrating thousands of Blackwell GPUs into its global network of over 4,000 edge locations. This strategy aims to reduce latency for inference tasks by processing AI workloads closer to the end-user, which is critical for applications like real-time fraud detection and autonomous systems.
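The headline figures quoted in this article can be cross-checked with some quick arithmetic. The short Python sketch below is only an illustration: every input is a vendor-quoted peak number taken from the text above, the variable and function names are hypothetical, and the FP4-per-watt ratio is a rough derived figure (peak throughput divided by maximum TDP), not a measured efficiency.

```python
# Figures as quoted in the article (vendor peak numbers, not measurements).
B300_TDP_W = 1400        # maximum TDP of the B300
B200_TDP_W = 1000        # maximum TDP of the B200
B300_FP4_PFLOPS = 15     # quoted FP4 throughput of the B300
NVLINK5_TBPS = 1.8       # fifth-generation NVLink bandwidth per GPU
HOPPER_GPUS = 256        # GPUs needed on Hopper for the quoted task
BLACKWELL_GPUS = 64      # GPUs needed on Blackwell for the same task
NVL72_GPUS = 72          # Blackwell GPUs in a GB200 NVL72 rack
NVL72_CPUS = 36          # Grace CPUs in a GB200 NVL72 rack


def percent_increase(new, old):
    """Relative increase from old to new, in percent."""
    return (new - old) * 100 / old


# TDP headroom added by the Ultra refresh.
tdp_increase_pct = percent_increase(B300_TDP_W, B200_TDP_W)

# How many Hopper GPUs one Blackwell GPU replaces in the quoted example.
consolidation_ratio = HOPPER_GPUS / BLACKWELL_GPUS

# "Double that of the prior generation" implies this prior NVLink bandwidth.
prior_nvlink_tbps = NVLINK5_TBPS / 2

# GPU-to-CPU ratio of the rack, which should match the superchip pairing.
gpus_per_cpu = NVL72_GPUS / NVL72_CPUS

# Rough derived ratio: peak FP4 TFLOPS per watt of maximum TDP.
fp4_tflops_per_watt = B300_FP4_PFLOPS * 1000 / B300_TDP_W

print(f"TDP increase over B200: {tdp_increase_pct:.0f}%")
print(f"Hopper GPUs replaced per Blackwell GPU: {consolidation_ratio:.0f}")
print(f"Implied prior-generation NVLink: {prior_nvlink_tbps:.1f} TB/s")
print(f"GPUs per Grace CPU in NVL72: {gpus_per_cpu:.0f}")
print(f"Peak FP4 per watt (derived): {fp4_tflops_per_watt:.1f} TFLOPS/W")
```

Read this way, the refresh raises the power envelope by 40 percent, the quoted consolidation example is a 4-to-1 reduction in GPU count, and the NVL72's 72 GPUs and 36 CPUs are consistent with 36 superchips of two GPUs and one CPU each.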
