AWS doubles down on Trainium/Inferentia
AWS continues to invest in custom Trainium and Inferentia silicon and AI-optimized data centers, positioning them as a price/performance alternative to GPUs in the cloud. Hyperscaler custom silicon is becoming a serious option for customers choosing cloud inference stacks. (techtimes.com) (markets.financialcontent.com)
AWS's Trn3 UltraServers are built on the 3nm Trainium3 chip and pack up to 144 Trainium3 chips for as much as 362 PFLOPS of peak FP8 compute, according to AWS. (aboutamazon.com) AWS touts generational gains for Trn3: up to 4.4x the compute of Trn2, roughly 4x better energy efficiency, and customer-reported training and inference cost reductions of as much as 50%. (aboutamazon.com)
Project Rainier is live: AWS says the cluster currently uses nearly 500,000 Trainium2 chips and is already running Anthropic's Claude workloads, and AWS plans to scale toward larger deployments. (aboutamazon.com) The Rainier build includes a purpose-built campus in Indiana, reported to cost roughly $11 billion, and is part of AWS's broader multi-billion-dollar data-center spending to support large AI customers. (cnbc.com)
On inference, each Inferentia2 chip carries 32 GB of HBM, and Inf2 instances scale to 12 chips for up to 384 GB of shared accelerator memory and 9.8 TB/s of total memory bandwidth, per AWS documentation. (aws.amazon.com) AWS documentation claims Inf2 delivers up to 4x higher throughput and up to 10x lower latency than first-generation Inferentia, and roughly 30–40% better price-performance than comparable GPU-based EC2 instances. (aws.amazon.com)
AWS has folded custom silicon into major vendor tie-ups. A multi-year cloud purchase deal between OpenAI and AWS, worth about $38 billion, was announced in November 2025, and in February 2026 AWS and OpenAI announced a broader strategic partnership with a separate multi-billion-dollar investment and co-development plans. (techcrunch.com) AWS is also pairing Trainium3 with specialist hardware for new architectures: on March 13, 2026, AWS and Cerebras announced a "disaggregated inference" integration that pairs Trainium3 prefill processing with decode stages on Cerebras WSE-3 wafer-scale hardware across AWS data centers. (financialcontent.com)
Market signals show mixed adoption. Industry trackers and AWS statements describe rapid Trainium demand and strong commercial uptake, while several startups and independent posts say AWS silicon still lags NVIDIA on raw single-chip speed, SDK maturity, or model coverage for some workloads. (trendforce.com)
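For readers who want to check the headline hardware figures, the short Python sketch below redoes the arithmetic behind the Trn3 and Inf2 numbers quoted above. It uses only the figures cited in this article. The per-chip values are rough derivations that assume the totals divide evenly across chips, which AWS's own per-chip specifications may round differently. The variable names are illustrative.

```python
# Back-of-the-envelope check of the AWS-published peak figures quoted above.
# Assumption: totals split evenly across chips; AWS's official per-chip
# specifications may be rounded differently.

TRN3_CHIPS_PER_ULTRASERVER = 144      # max Trainium3 chips per Trn3 UltraServer
TRN3_ULTRASERVER_FP8_PFLOPS = 362.0   # peak FP8 compute per UltraServer

INF2_HBM_PER_CHIP_GB = 32             # HBM on each Inferentia2 chip
INF2_MAX_CHIPS = 12                   # max Inferentia2 chips per Inf2 instance
INF2_TOTAL_BANDWIDTH_TBPS = 9.8       # total memory bandwidth, largest instance

# Implied peak FP8 compute per Trainium3 chip.
trn3_pflops_per_chip = TRN3_ULTRASERVER_FP8_PFLOPS / TRN3_CHIPS_PER_ULTRASERVER

# Total accelerator memory on the largest Inf2 instance.
inf2_total_hbm_gb = INF2_HBM_PER_CHIP_GB * INF2_MAX_CHIPS

# Implied memory bandwidth per Inferentia2 chip.
inf2_bandwidth_per_chip_tbps = INF2_TOTAL_BANDWIDTH_TBPS / INF2_MAX_CHIPS

print(f"Trainium3: ~{trn3_pflops_per_chip:.2f} FP8 PFLOPS per chip (implied)")
print(f"Inf2: {inf2_total_hbm_gb} GB total HBM across {INF2_MAX_CHIPS} chips")
print(f"Inf2: ~{inf2_bandwidth_per_chip_tbps:.2f} TB/s per chip (implied)")
```

The 384 GB memory total follows directly from 12 chips at 32 GB each. The per-chip compute and bandwidth values are derived estimates, not AWS-published specifications.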