Google Cloud & NVIDIA tie‑up
Google Cloud and NVIDIA expanded their partnership at GTC to power next-gen AI workloads, deepening hyperscaler ties and widening the fractional and hybrid GPU options available to customers. The move keeps cloud-native paths open for startups that want to avoid on-prem lock-in. (nationaltoday.com)
Google Cloud previewed flexible, fractional G4 VMs that use NVIDIA vGPU technology with the RTX Pro 6000 Server Edition, explicitly calling the fractional G4 preview “a first in the industry” for that server GPU. (cloud.google.com)
NVIDIA released Dynamo 1.0 as a production-ready distributed inference runtime that the company says can deliver up to roughly 7x throughput improvements on Blackwell-class GPUs in multi-node inference tests. (developer.nvidia.com)
Google Cloud announced direct integration of NVIDIA Dynamo with the GKE Inference Gateway to create a modular control plane for orchestrating inference across Kubernetes clusters. (cloud.google.com)
Google Cloud’s A4 and A4X VMs run on NVIDIA HGX B200 and GB200 NVL72 hardware, respectively, with Google claiming A4X delivers “over one exaflop of compute per rack” using GB200 NVL72 and its Jupiter network fabric. (blogs.nvidia.com) (datacenterdynamics.com)
Google Cloud said its G4 family (RTX Pro 6000 Server Edition) is optimized for fine-tuning and serving models in the ~30B–100B parameter range, using FP4 (4-bit floating-point) precision and Google’s peer-to-peer (P2P) communication to cut latency. (cloud.google.com)
NVIDIA and Google signaled on-prem and regulated-sector plans by bringing Blackwell platforms to Google Distributed Cloud, so Gemini and other models can run inside air-gapped or data-sovereignty environments. (blogs.nvidia.com) (datacentremagazine.com)
Google Cloud and NVIDIA also announced tighter software alignment, with enhanced NVIDIA support across Vertex AI Training and Model Garden, along with the launch of a dedicated public-sector AI startup accelerator program to onboard regulated-market founders. (cloud.google.com)
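
For a rough sense of why 4-bit precision matters for the ~30B–100B parameter range cited for G4, here is a back-of-the-envelope sketch of weight-only memory. It is an illustration under stated assumptions, not a sizing tool: the function names are hypothetical, the 96 GB per-GPU figure is an assumed budget that should be checked against the actual instance spec, and KV cache, activations, and runtime overhead are ignored.

```python
import math


def weight_memory_gb(params_billions: float, bits_per_param: int = 4) -> float:
    """Approximate weight-only footprint in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and framework overhead.
    """
    total_bits = params_billions * 1e9 * bits_per_param
    return total_bits / 8 / 1e9


def min_gpus_for_weights(
    params_billions: float,
    gpu_memory_gb: float,
    bits_per_param: int = 4,
) -> int:
    """Smallest GPU count whose combined memory holds the weights alone."""
    needed = weight_memory_gb(params_billions, bits_per_param)
    return math.ceil(needed / gpu_memory_gb)


if __name__ == "__main__":
    ASSUMED_GPU_MEMORY_GB = 96.0  # assumption, not taken from the announcement
    for size in (30, 70, 100):
        for bits in (4, 16):
            gb = weight_memory_gb(size, bits)
            gpus = min_gpus_for_weights(size, ASSUMED_GPU_MEMORY_GB, bits)
            print(f"{size}B params @ {bits}-bit: {gb:.1f} GB weights, >= {gpus} GPU(s)")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts the weight footprint by 4x. Real deployments need additional headroom beyond the weights, so treat the GPU counts as lower bounds.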