NVIDIA customers become competitors

- NVIDIA’s biggest cloud customers, including Amazon, Google, Microsoft and Meta, are increasingly building in-house AI chips for inference and some training workloads, according to company disclosures and product updates published through May 17, 2026.
- Google split its latest TPU line into TPU 8t for training and TPU 8i for inference, saying TPU 8i offers 80% better performance-per-dollar for low-latency inference. (cloud.google.com)
- NVIDIA’s next public milestone is its June 24, 2026 annual meeting, listed on the company’s investor events page. (investor.nvidia.com)

NVIDIA still dominates the market for AI training systems, but several of its largest customers are now designing more of their own silicon for the workloads that come after training. Amazon Web Services markets Inferentia for generative AI inference and Trainium for training. Google Cloud now separates its newest TPU family into one chip for training and another for inference. Microsoft introduced Maia 200 in January as an inference accelerator, and Meta has expanded its MTIA program for ranking, recommendation and generative AI workloads. (cloud.google.com)

NVIDIA’s filings show why that matters. The chipmaker said in its annual report for the year ended January 26, 2025 that an indirect customer represented 10% or more of total revenue, underscoring how concentrated its business has become among a small group of cloud and platform companies. (investor.nvidia.com) NVIDIA reported fiscal 2026 revenue of $215.9 billion on its investor site.

### Why are NVIDIA’s own customers building rival chips?

AWS says Inferentia is “designed by AWS” for deep learning and generative AI inference applications, while Trainium is used for training through the same Neuron software stack. (aws.amazon.com) Amazon said in December 2025 that Trn3 UltraServers can scale to 144 Trainium3 chips and are already serving some production workloads in Amazon Bedrock.

Google has taken the same route with clearer workload separation. Google Cloud says TPU 8t is built for large-scale pre-training and embedding-heavy workloads, while TPU 8i is optimized for post-training and inference. The company says TPU 8i is aimed at low-latency inference and offers an 80% performance-per-dollar improvement over prior generations for that use case. (sec.gov)

### What changed in the chip market?

Microsoft’s January 26, 2026 launch of Maia 200 put the emphasis on deployment economics rather than just raw training capacity. Scott Guthrie, Microsoft’s executive vice president for Cloud + AI, said Maia 200 was “built for inference” and would serve OpenAI models including GPT-5.2 inside Microsoft’s fleet. (aws.amazon.com) Microsoft said the chip delivers 30% better performance per dollar than the latest generation hardware in its fleet today.

Meta is making the same distinction in its own infrastructure. Meta said in March that it was developing and deploying four new generations of MTIA chips within two years to support ranking, recommendations and generative AI workloads. (cloud.google.com) In April, Meta said it was expanding a partnership with Broadcom to co-develop multiple generations of next-generation MTIA chips and described its approach as matching “the right accelerator to each workload.”

### Does this mean NVIDIA is losing the core training market?

NVIDIA’s public materials still point to a business centered on large-scale AI infrastructure. (blogs.microsoft.com) The company said on its investor site that fourth-quarter fiscal 2026 revenue reached $68.1 billion and full-year revenue rose to $215.9 billion. Jensen Huang said in the fourth-quarter release that customers were “racing to invest in AI compute.”

Google and Amazon are not presenting their custom chips as full replacements for every NVIDIA system. Google still describes TPUs as covering training, inference and reinforcement learning, while Amazon says the same Neuron stack supports both Inferentia and Trainium. (about.fb.com) The shift is narrower: hyperscalers are carving out the workloads where custom silicon can lower cost or improve utilization.

### Where does platform control move if customers own more silicon?

The software layer is one answer. AWS ties Inferentia and Trainium to Neuron, Google ties TPUs to its cloud stack and Google Kubernetes Engine, Microsoft is previewing a Maia SDK, and Meta says its MTIA roadmap is being co-designed with its internal software and infrastructure. (investor.nvidia.com) That means competition is no longer only about who sells the fastest chip; it is also about who controls the tools, model serving stack and data center deployment path around that chip.

NVIDIA’s next scheduled public event is its annual meeting on June 24, 2026, according to the company’s investor calendar. (aws.amazon.com) (investor.nvidia.com)
