NVIDIA targets agentic inference
- NVIDIA on March 18, 2025 introduced Blackwell Ultra, saying the GB300 NVL72 was built for AI reasoning, test-time scaling inference and agentic workloads. (nvidianews.nvidia.com)
- The clearest design signal is memory and interconnect: each GB300 NVL72 rack links 72 GPUs, 36 CPUs and 37 terabytes of memory. (nvidia.com)
- Microsoft Azure launched NDv6 GB300 clusters for OpenAI on October 9, 2025; NVIDIA says GB300 is available now. (blogs.nvidia.com)

NVIDIA on March 18, 2025 introduced Blackwell Ultra as its next AI factory platform, positioning the system around reasoning, test-time scaling and agentic AI rather than only conventional model training. The company said the platform includes the GB300 NVL72 rack-scale system and the HGX B300 NVL16 server, and described both as tools for “the age of AI reasoning.” (nvidia.com) (nvidianews.nvidia.com)

NVIDIA’s own product language points to the technical priority. The company says GB300 NVL72 is “purpose-built for test-time scaling inference and AI reasoning tasks,” with 2x higher attention performance than Blackwell and 1.5x larger HBM3E memory than its predecessor. (blogs.nvidia.com)

That framing matters because “agentic inference” is not a formal NVIDIA product category so much as a description of workloads that keep context alive, call tools, and step through tasks over longer sessions. NVIDIA’s technical blog tied Blackwell Ultra to “long thinking,” saying test-time scaling can require far more compute than a single inference pass and that post-training can require much more compute than pretraining for customized models. (nvidianews.nvidia.com)

### Why does Blackwell Ultra look tuned for long-running AI sessions?

The GB300 NVL72 is built as a rack-scale system with 72 Blackwell Ultra GPUs and 36 Grace CPUs connected as one platform. (nvidia.com) NVIDIA says the architecture is designed so models can explore multiple solutions, break requests into steps and sustain large context lengths, which are the mechanics behind reasoning and agent-style systems.

NVIDIA also emphasized memory and attention performance over simple peak-token marketing. The company said Blackwell Ultra offers 1.5x larger HBM3E memory and 2x higher attention performance than Blackwell, while Microsoft’s October 2025 launch post for Azure’s NDv6 GB300 cluster said each rack provides 37 terabytes of fast memory and 130 TB/s of all-to-all NVLink bandwidth.
(developer.nvidia.com)

### What does NVIDIA say the system is for?

Jensen Huang, NVIDIA’s chief executive, said on March 18, 2025 that “reasoning and agentic AI demand orders of magnitude more computing performance.” NVIDIA said Blackwell Ultra was designed to handle pretraining, post-training and reasoning inference on one platform, but the company’s examples leaned toward agentic coding, reasoning models and complex multimodal generation. (nvidianews.nvidia.com)

The product page gives more detail on that intended use. NVIDIA says GB300 NVL72 is built for test-time scaling, and says the platform can improve responsiveness and throughput on reasoning workloads compared with Hopper-based systems. (nvidia.com) Those are NVIDIA claims, and they are presented alongside projected-performance caveats on parts of the page.

### Should every buyer move straight to GB300?

H100 and B200 remain the more established comparison points for many buyers. NVIDIA’s H100 datasheet describes Hopper as a general-purpose data center GPU for AI, while NVIDIA’s DGX B200 materials position B200 as a unified platform for training, fine-tuning and inference in enterprise AI workloads. (nvidianews.nvidia.com)

The practical procurement question is whether a workload actually needs Blackwell Ultra’s rack-scale memory footprint, attention acceleration and system-level interconnect. NVIDIA’s published materials support that case for reasoning models, long-context inference and multimodal systems, but they do not show that every chatbot, fine-tune or batch inference pipeline needs GB300-class hardware. (nvidia.com) That is an inference from NVIDIA’s workload descriptions and product segmentation, not a company statement.

### Where is the first large deployment showing up?

Microsoft Azure on October 9, 2025 announced the NDv6 GB300 VM series, calling it the first production cluster of NVIDIA GB300 NVL72 systems at supercomputing scale for OpenAI inference workloads.
(resources.nvidia.com) Microsoft said the cluster includes more than 4,600 Blackwell Ultra GPUs and was engineered around memory and networking for reasoning models and agentic AI systems.

NVIDIA has also said Microsoft, CoreWeave and Oracle Cloud Infrastructure are deploying GB300 NVL72 systems for low-latency, long-context use cases. The company’s wording again centers on interactive reasoning and coding-style agents rather than commodity inference. (nvidianews.nvidia.com)

### What should readers watch next?

NVIDIA’s GB300 NVL72 page says the system is “available now,” while the March 2025 launch materials said Blackwell Ultra products would be available from partners starting in the second half of 2025. Microsoft’s NDv6 rollout for OpenAI is the clearest named deployment so far, and further cloud listings from CoreWeave, Oracle Cloud Infrastructure and other partners will show whether Blackwell Ultra remains a specialized reasoning tier or becomes a broader inference default. (blogs.nvidia.com) (nvidia.com) (blogs.nvidia.com)