NVIDIA Launches Blackwell B300 Inference Chip

NVIDIA has introduced the Blackwell B300, a new inference chip purpose-built for long-context models and agentic AI workloads. The B300 delivers a claimed 30% increase in token throughput compared to previous generations. The hardware is designed to power the next wave of AI agents that require high-throughput, low-latency performance for both cloud and edge-server hybrid deployments.

- The Blackwell architecture is built on a custom TSMC 4NP process and features a dual-die design that links two GPU dies with a 10 TB/s interconnect, allowing them to function as a single unified chip with 208 billion transistors.
- The B300 GPU is part of the "Blackwell Ultra" series and is specifically optimized for AI inference. It significantly boosts 4-bit floating point (FP4) compute performance while reducing capabilities in 64-bit precision (FP64), a format more common in traditional scientific high-performance computing (HPC).
- It is equipped with 288GB of HBM3e memory, which provides 50% more capacity than the B200 model and up to 8 TB/s of bandwidth. This is crucial for keeping the large KV caches of long-context models in high-speed memory.
- A core feature is the second-generation Transformer Engine, which adds support for new microscaling formats like FP4. This can double the performance and memory efficiency for inference workloads compared to the 8-bit floating point (FP8) precision used in the previous Hopper generation.
- The B300 is one component in a larger product family that includes the B100 and B200 GPUs as well as the GB200 "Superchip," a module that combines two Blackwell GPUs with an ARM-based
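
To make the memory points above concrete, here is a rough back-of-the-envelope sketch of how KV cache size scales with context length and numeric precision, checked against the 288GB capacity figure quoted above. The model shape (80 layers, 8 KV heads, head dimension 128), the 128,000-token context, and the batch size of 8 are hypothetical values chosen for illustration; they are not NVIDIA specifications. The estimate ignores model weights, activations, and the per-block scale factors that microscaling formats carry, and it treats the cache precision as a free parameter rather than claiming any particular format is used for KV caches in practice.

```python
HBM_CAPACITY_GB = 288  # B300 memory capacity as stated in the article


def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   context_len, batch_size, bits_per_value):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    Each token stores one key vector and one value vector per layer,
    hence the leading factor of 2.
    """
    values = (2 * num_layers * num_kv_heads * head_dim
              * context_len * batch_size)
    return values * bits_per_value // 8


# Hypothetical model and workload, for illustration only.
cfg = dict(num_layers=80, num_kv_heads=8, head_dim=128,
           context_len=128_000, batch_size=8)

for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    size_gb = kv_cache_bytes(bits_per_value=bits, **cfg) / 1e9
    verdict = "fits in" if size_gb <= HBM_CAPACITY_GB else "exceeds"
    print(f"{name}: {size_gb:.1f} GB ({verdict} {HBM_CAPACITY_GB} GB)")
```

Under these assumptions the 16-bit cache alone would overflow a single GPU's memory, while each halving of the bit width halves the footprint. That is the sense in which a 4-bit format can double memory efficiency relative to FP8.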
