Hyperscalers Building Custom Chips
- Hyperscalers are increasingly developing bespoke AI chips, reducing their reliance on a single GPU vendor.
- Reports say Google is in talks with Marvell to build two AI chips, while Arm is selling finished 3nm silicon co-developed with Meta.
- This trend shifts attention to packaging, memory topology and foundry capacity as first-order system design constraints.
The biggest cloud companies are designing more of their own artificial intelligence chips, shifting spending away from buying every accelerator off the shelf. (cloud.google.com; aws.amazon.com; newsroom.arm.com)
Google has been building Tensor Processing Units, or TPUs, for more than a decade; it said in May 2024 that its sixth-generation Trillium chip delivered 4.7 times the peak compute performance per chip of TPU v5e, while doubling High Bandwidth Memory capacity and bandwidth. (cloud.google.com) Amazon Web Services now sells three generations of Trainium chips, including Trainium3, its first 3-nanometer artificial intelligence chip, with up to 144 gigabytes of HBM3e memory and 4.9 terabytes per second of memory bandwidth. (aws.amazon.com) Arm moved further into finished silicon on March 24, 2026, when it said it would sell its first Arm-designed data-center central processing unit, the Arm AGI CPU, developed with Meta as lead partner and co-developer. (newsroom.arm.com; investors.arm.com)
That changes what “chip competition” means in artificial intelligence. The contest is no longer only about who has the fastest graphics processing unit; it is also about who can pair compute cores, memory, networking and software into a full rack that runs large models cheaply. (newsroom.arm.com; cloud.google.com; aws.amazon.com)
In plain terms, the accelerator is the engine, High Bandwidth Memory is the fuel tank beside it, and packaging is the plumbing that keeps data moving between them. Google said Trillium doubled both HBM capacity and interchip bandwidth over TPU v5e, while Amazon said Trainium3 raised memory capacity 1.5 times and memory bandwidth 1.7 times over Trainium2. (cloud.google.com; aws.amazon.com)
The central processing unit is getting a larger role too.
Arm said agentic artificial intelligence systems generate far more tokens and could require more than four times today’s CPU capacity per gigawatt, because CPUs handle coordination, data movement and isolation around the accelerators. (newsroom.arm.com)
The custom-chip push did not start this year. Google said it began work on TPU v1 in 2013, and Amazon has already put Trainium1, Trainium2 and Trainium3 into its lineup, with Trn2 and Trn3 server families built around proprietary interconnects. (cloud.google.com; aws.amazon.com)
Merchant chips still dominate much of the market, and Arm itself describes the direction of travel as a mix of approaches: NVIDIA platforms, hyperscaler-designed accelerators such as Trainium and TPUs, and hybrid systems that combine custom and outside silicon. (newsroom.arm.com)
The practical bottlenecks are now closer to the factory floor than the benchmark chart. If the largest buyers keep building their own silicon, foundry slots, advanced packaging and memory supply will decide how much artificial intelligence compute actually gets deployed. (newsroom.arm.com; cloud.google.com; aws.amazon.com)
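As a rough cross-check of the Trainium3 memory figures quoted above, the generation-over-generation multipliers Amazon gave can be run backwards to estimate what they imply for Trainium2. Dividing capacity by bandwidth also gives the time needed to read the whole memory once. This is a minimal Python sketch under stated assumptions: the constant and function names (`implied_previous`, `full_sweep_ms`) are hypothetical and for illustration only, the multipliers are treated as exact even though the source rounds them, units are decimal (1 TB = 1,000 GB), and the sweep time assumes sustained peak bandwidth, which real workloads rarely reach.

```python
# Trainium3 figures as quoted in this article (AWS).
TRN3_HBM_GB = 144.0   # HBM3e capacity, gigabytes
TRN3_BW_TB_S = 4.9    # peak memory bandwidth, terabytes per second

# Multipliers AWS gave for Trainium3 over Trainium2 (rounded in the source).
CAPACITY_GAIN = 1.5
BANDWIDTH_GAIN = 1.7


def implied_previous(current: float, gain: float) -> float:
    """Hypothetical helper: back out the prior generation's figure from a multiplier."""
    return current / gain


def full_sweep_ms(capacity_gb: float, bandwidth_tb_s: float) -> float:
    """Hypothetical helper: milliseconds to read every byte of memory once.

    Assumes decimal units (1 TB = 1000 GB) and ideal, sustained peak bandwidth,
    so the result is a lower bound rather than a measured figure.
    """
    seconds = capacity_gb / (bandwidth_tb_s * 1000.0)
    return seconds * 1000.0


trn2_hbm_gb = implied_previous(TRN3_HBM_GB, CAPACITY_GAIN)
trn2_bw_tb_s = implied_previous(TRN3_BW_TB_S, BANDWIDTH_GAIN)

print(f"Implied Trainium2 HBM: {trn2_hbm_gb:.0f} GB")
print(f"Implied Trainium2 bandwidth: {trn2_bw_tb_s:.2f} TB/s")
print(f"Trainium3 full-memory sweep: {full_sweep_ms(TRN3_HBM_GB, TRN3_BW_TB_S):.1f} ms")
print(f"Trainium2 full-memory sweep: {full_sweep_ms(trn2_hbm_gb, trn2_bw_tb_s):.1f} ms")
```

Running it prints an implied Trainium2 capacity of 96 GB and bandwidth of about 2.88 TB/s, with idealized full-memory sweep times of about 29.4 ms for Trainium3 and 33.3 ms for Trainium2. Because bandwidth grew faster than capacity (1.7 times versus 1.5 times), the newer chip can cycle through its larger memory slightly faster, which is one concrete way the memory side of the design, not just the compute cores, shapes what a chip can do.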