Meta to Deploy Millions of Nvidia Processors

Meta plans to deploy "millions" of Nvidia processors, a move that signals hyperscalers' continued reliance on external silicon in the AI arms race. The scale of the purchase suggests that, for certain AI workloads, the performance of Nvidia's offerings outweighs the benefits of developing custom in-house ASICs.

- The deal with Meta includes millions of GPUs from Nvidia's current "Blackwell" and upcoming "Rubin" families, as well as its "Grace" and "Vera" CPUs, creating a unified architecture across Meta's on-premise data centers and cloud partner environments.
- Nvidia's Blackwell architecture, announced in 2024, features a dual-die chip design with 208 billion transistors and is manufactured using a custom TSMC 4NP process. The B200 GPU based on this architecture offers significant performance improvements over the previous "Hopper" generation.
- Hyperscalers like Microsoft, Amazon, and Google are developing their own custom ASICs (Application-Specific Integrated Circuits) to optimize for specific internal workloads, particularly for inference, which is estimated to be 80% of the long-term AI compute demand. This "build vs. buy" decision is a key strategic consideration for these companies, balancing the benefits of tailored hardware against the high performance of general-purpose GPUs from vendors like Nvidia.
- Despite the rise of custom silicon, Nvidia maintains a dominant market share, estimated to be between 70% and 95% of the AI chip market, largely due to its mature CUDA software ecosystem and the high performance of its GPUs for cutting-edge AI training.
- The cost of training large language models has escalated dramatically, with models like GPT-4 reportedly costing over $100 million in compute resources alone, driving the demand for more powerful and efficient processors.
- In addition to GPUs, Meta's partnership with Nvidia includes the large-scale deployment of Nvidia's Grace CPUs, making it one of the first hyperscalers to deploy them as standalone processors at scale. The collaboration also extends to Nvidia's Spectrum-X Ethernet networking technology to enhance data center efficiency.
- The new Blackwell architecture introduces features like a second-generation Transformer Engine with FP4 AI capabilities and fifth-generation NVLink, which can scale up to 576 GPUs, crucial for training trillion-parameter AI models.
- The broader trend in AI infrastructure is not a simple "build versus buy" choice but a hybrid approach where companies buy commoditized solutions and build custom capabilities for strategic differentiation. This allows them to leverage the performance of off-the-shelf hardware like Nvidia's while optimizing costs for specific, high-volume workloads with their own custom chips.
