Trainium getting traction
Coverage shows AWS Trainium and Inferentia landing big customers: the chips are gaining traction with names like Anthropic and OpenAI, which gives cloud vendors a real cost argument and sharpens the lock-in question. The trend is forcing vendor comparisons on performance, tooling, and portability.
Anthropic expanded its AWS relationship on November 22, 2024 with a new $4 billion agreement that names AWS its primary cloud and training partner and commits Anthropic to co-developing Trainium hardware and Neuron software. (anthropic.com) Amazon and OpenAI announced a multi-year strategic partnership on February 27, 2026 that includes a $50 billion Amazon investment and a commitment for OpenAI to consume roughly 2 gigawatts of Trainium capacity on AWS. (aboutamazon.com)
AWS says Project Rainier, an EC2 UltraCluster deployment, brought nearly 500,000 Trainium2 chips online and that Anthropic's Claude will run on more than 1 million Trainium2 chips by the end of 2025; TechCrunch reports 1.4 million Trainium chips deployed across all three generations. (aboutamazon.com)
AWS has moved toward native PyTorch support for Trainium through its TorchNeuron package and recent Neuron SDK updates (including the Neuron Kernel Interface), which enable eager-mode PyTorch, torch.compile, and user-written kernels for better portability and kernel-level tuning. (awsdocs-neuron.readthedocs-hosted.com)
AWS published Trainium2 performance targets of up to 20.8 PFLOPS dense FP8 (83.2 PFLOPS with sparsity) per 16-chip Trn2 instance, with the 64-chip Trn2 UltraServer rated at up to 83.2 PFLOPS dense FP8. Both AWS and some analyst writeups cite a roughly 30-40% price-performance edge over comparable H100-based P5e instances on many training workloads. (techcrunch.com) Multiple outlets covering AWS's lab tour and the OpenAI deal note that Anthropic and Bedrock were already consuming Trainium capacity at rates that strained production, which makes the 2-gigawatt OpenAI commitment a major capacity and supply-chain bet for AWS. (techcrunch.com)