Meta Paying Google for AI Infrastructure

In a major sign of industry consolidation, Meta is reportedly paying Google to run some of its newest generative AI products in Google's next-generation data centers. The collaboration highlights the immense cost and scale that cutting-edge AI requires, and it raises the premium on cloud skills such as Google Cloud Platform (GCP) and Kubernetes for engineers at both companies.

Under this multibillion-dollar, multiyear agreement, Meta rents Google's custom AI accelerators, known as Tensor Processing Units (TPUs), to train and run its new AI models. The deal is facilitated by a joint venture that Google formed with an investment firm specifically to lease its TPUs to external customers.

The partnership is one part of a diversified "multi-pronged silicon strategy" for Meta, which also includes a deal for up to $60 billion in AI chips from AMD and a long-term partnership with Nvidia for its next-generation GPUs. Relying on multiple suppliers reduces supply risk and lets Meta match hardware to workloads, for example using Nvidia for training and AMD for inference. Meta is also developing its own custom chips, the Meta Training and Inference Accelerator (MTIA) series, but that effort has faced setbacks. The company recently abandoned its most advanced design, codenamed "Olympus," and is now focusing on a simpler version that it aims to deploy in 2026 to power recommendation algorithms and generative AI.

The enormous cost of AI development drives these strategies: training a single frontier model such as Google's Gemini Ultra is estimated to cost $191 million in compute alone. Consequently, Meta has sharply raised its projected 2026 capital expenditures to between $115 billion and $135 billion, primarily for AI servers and data centers.

For Google, the deal is a significant validation of its custom hardware. It positions TPUs as a credible competitor to Nvidia's market-leading GPUs and makes TPU sales a crucial growth engine for Google Cloud revenue. The latest generation of these chips, Trillium, delivers a 4.7x improvement in peak compute performance per chip over its predecessor.

This industry-wide push into AI has created soaring demand for software engineers with specialized skills. Expertise in cloud infrastructure such as GCP, container orchestration with Kubernetes, and AI frameworks such as JAX and PyTorch is now a critical hiring criterion at major tech firms.

Shared from Scout - Be the smartest in the room.