Google Cloud Previews NVIDIA Blackwell GPUs for Serverless AI
Google Cloud Run now offers a preview of NVIDIA's RTX PRO 6000 GPUs, based on the new Blackwell architecture, for serverless AI inference. Developers can access the latest GPU hardware without reserving instances. The integration is aimed at scaling AI deployments that may originate from, or interact with, embedded and edge devices.
- The NVIDIA Blackwell architecture is built on a custom TSMC 4NP process node and features up to 208 billion transistors, a significant increase from the 80 billion in the previous Hopper generation.
- This architecture introduces a second-generation Transformer Engine with support for new microscaling formats like 4-bit floating point (FP4), which can double the performance for AI inference tasks while maintaining high accuracy.
- The specific GPU offered, the RTX PRO 6000 Blackwell Server Edition, comes with 96 GB of GDDR7 memory, doubling the 48 GB of GDDR6 memory found in its predecessor, the RTX 6000 Ada Generation, allowing for larger and more complex models.
- Google Cloud Run mitigates the "cold start" problem typical of serverless platforms, enabling instances with pre-installed NVIDIA drivers to start in approximately 5 seconds.
- To utilize the RTX PRO 6000 Blackwell GPU on Cloud Run, a service must be configured with a minimum of 20 vCPUs and 80 GiB of memory.
- The pay-per-use model allows developers to leverage this high-end GPU for sporadic, compute-intensive tasks triggered by edge devices, scaling to zero automatically to avoid paying for idle resources.
- This serverless approach is well-suited for real-time applications like fraud detection, medical image analysis, or recommendation engines where inference requests from distributed devices can be unpredictable.
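
The memory and configuration figures above lend themselves to a quick back-of-the-envelope check. The following Python sketch is illustrative only: it estimates how much GPU memory a model's weights would need at different precisions and whether a proposed Cloud Run configuration meets the stated minimums. The function names, the 20% overhead allowance, and the 70-billion-parameter example are assumptions made for illustration and are not part of any Google Cloud or NVIDIA API. Only the 96 GB, 20 vCPU, and 80 GiB figures come from the article.

```python
# Rough sizing sketch. All helper names are hypothetical; only the three
# constants below are taken from the article.

GPU_MEMORY_GB = 96      # RTX PRO 6000 Blackwell Server Edition memory
MIN_VCPUS = 20          # Cloud Run minimum vCPUs for this GPU
MIN_MEMORY_GIB = 80     # Cloud Run minimum memory for this GPU

# Bits stored per weight. This ignores the per-block scale factors that
# microscaling formats also store, so real FP4 footprints are slightly larger.
BITS_PER_PARAM = {"fp16": 16, "fp8": 8, "fp4": 4}


def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    bits = BITS_PER_PARAM[precision]
    return num_params * bits / 8 / 1e9


def fits_on_gpu(num_params: float, precision: str,
                overhead_fraction: float = 0.2) -> bool:
    """Check whether weights plus an assumed overhead fit in GPU memory.

    overhead_fraction is a guess covering activations, KV cache, and
    runtime buffers; actual overhead depends on the workload.
    """
    needed = weight_footprint_gb(num_params, precision) * (1 + overhead_fraction)
    return needed <= GPU_MEMORY_GB


def meets_cloud_run_minimums(vcpus: int, memory_gib: int) -> bool:
    """Check a proposed service configuration against the stated minimums."""
    return vcpus >= MIN_VCPUS and memory_gib >= MIN_MEMORY_GIB


if __name__ == "__main__":
    params = 70e9  # hypothetical 70-billion-parameter model
    for precision in BITS_PER_PARAM:
        size = weight_footprint_gb(params, precision)
        verdict = "fits" if fits_on_gpu(params, precision) else "does not fit"
        print(f"{precision}: ~{size:.0f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
    print("20 vCPU / 80 GiB ok:", meets_cloud_run_minimums(20, 80))
    print("8 vCPU / 32 GiB ok:", meets_cloud_run_minimums(8, 32))
```

Under these assumptions, the weights of a 70-billion-parameter model take roughly 140 GB at FP16, 70 GB at FP8, and 35 GB at FP4, which shows why lower-precision formats and the larger 96 GB memory pool matter for fitting bigger models on a single GPU.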