NVIDIA customers build AI chips
- Nvidia’s biggest cloud customers, including Amazon, Google and Microsoft, are building inference-focused AI chips in 2026 as demand shifts from training models to serving them.
- Nvidia still forecast $78 billion in fiscal first-quarter revenue on February 25, even as AWS, Google Cloud and Microsoft promoted in-house alternatives.
- Google said TPU 8i and TPU 8t will become available later in 2026; Nvidia reports quarterly results again on May 28.

Amazon, Google and Microsoft have all advanced in-house AI chip programs in 2026 aimed at running models more cheaply once they are deployed at scale. The push is centered on inference (the work of generating answers, images or recommendations after a model has been trained), where cloud operators say cost and power use matter more than peak flexibility. Nvidia remains the dominant supplier of AI accelerators, but its largest customers are increasingly trying to shift part of that spending onto chips they design themselves.

Nvidia told investors on February 25 that it expected fiscal first-quarter revenue of $78 billion, plus or minus 2%, underscoring that the custom-chip push is unfolding alongside continued demand for its products.

### Why are cloud companies building different chips for inference now?

Amazon CEO Andy Jassy said in 2025 that “inference will represent the overwhelming majority of future AI cost,” framing the economics that hyperscalers are now trying to address. Inference workloads run continuously in consumer chatbots, coding assistants, search products and enterprise software, which makes electricity, memory movement and hardware utilization central to margins.

Google said on April 22 that its eighth-generation TPU line would split into two systems: TPU 8t for training and TPU 8i for inference. Google said the separation reflected diverging infrastructure needs between pre-training and real-time serving, rather than a single chip handling both jobs. Microsoft made a similar argument on January 26 when it introduced Maia 200 as an accelerator “built for inference” and said the chip was designed to improve the economics of token generation.

### Which companies have shown the clearest moves this year?

Microsoft on January 26 unveiled Maia 200, a 3-nanometer inference accelerator with 216 GB of HBM3e memory and 7 TB/s of bandwidth, according to the company.
Microsoft said the chip, which arrives two years after Maia 100, would be used inside its own services.

Google on April 22 presented TPU 8i as its dedicated inference processor and said both TPU 8i and TPU 8t would become available later in 2026. Reuters, citing The Information, also reported on April 19 that Google was in talks with Marvell Technology to develop two chips aimed at running AI models more efficiently.

Amazon has pushed its Trainium line as both a training and deployment option. On April 20, Amazon said Anthropic would secure up to 5 gigawatts of current and future generations of Trainium chips to train and power advanced AI models, with significant Trainium3 capacity expected to come online this year.

### Does this mean Nvidia is losing business?

Nvidia’s February 25 guidance suggests customers are still spending heavily on its hardware. The company reported quarterly revenue of $68.1 billion for the period ended January 25, including $62.3 billion from data center sales, and forecast another step up to $78 billion in the current quarter.

Custom chips and Nvidia spending coexist because cloud providers buy Nvidia systems for broad AI workloads while also funding internal silicon for narrower jobs. Microsoft has not offered Maia 200 as a general cloud rental product, and Google’s TPU program remains tied closely to its own cloud stack. Amazon has promoted Trainium as a lower-cost option for customers willing to optimize around AWS-specific infrastructure.

### What changes when chips are built for one job instead of many?

Inference chips are typically designed around repeated model-serving patterns, with memory, interconnect and low-precision compute tuned for steady token generation rather than large-scale training runs. That specialization can lower operating cost if the software stack is written to match the hardware.

The trade-off is complexity.
A lab or cloud customer that serves models across Nvidia GPUs, Google TPUs, Amazon Trainium and Microsoft Maia must support different kernels, compiler paths and memory profiles. Google’s decision to separate TPU 8i from TPU 8t, and Microsoft’s choice to market Maia 200 specifically for inference, both point to a market where one architecture no longer covers every stage of AI work.

### What should readers watch next?

Nvidia is scheduled to report quarterly results on May 28, when investors will look for any change in data center demand or customer concentration. Google said TPU 8i and TPU 8t will become available later in 2026, and Amazon said significant Trainium3 capacity tied to Anthropic is expected to come online this year.