AWS splits inference silicon

AWS has split inference workloads across multiple vendors, using Trainium for prefill and Cerebras for decode. The move signals that hyperscalers are engineering multi-vendor inference stacks and optimizing pipelines across chips rather than betting on a single-silicon path.

AWS and Cerebras announced on March 13, 2026 that Trainium-powered servers will be paired with Cerebras CS-3/WSE systems and deployed in AWS data centers, with the service exposed through Amazon Bedrock in the coming months. (press.aboutamazon.com) The companies describe the setup as “inference disaggregation”: Trainium handles the parallel, compute-heavy prefill stage, Cerebras’s wafer-scale engine handles the serial, memory-bandwidth-heavy decode stage, and Amazon’s Elastic Fabric Adapter (EFA) connects the two. (press.aboutamazon.com)

Cerebras points to prior wafer-scale results of up to ~3,000 tokens/sec on OpenAI’s gpt-oss-120B and says the Trainium+WSE pairing will yield roughly a 5× increase in high-speed token capacity for the same hardware footprint. (cerebras.ai) AWS’s announcement frames the joint system as delivering “an order of magnitude” faster inference for some generative-AI workloads, and it says the disaggregated offering will be available exclusively via Amazon Bedrock, with open-source LLMs and Amazon Nova running on Cerebras hardware later this year. (press.aboutamazon.com)

The move sits atop a large Trainium footprint inside AWS: Project Rainier runs nearly 500,000 Trainium2 chips, and AWS and industry reporting put major model-makers on Trainium (Anthropic as a primary Trainium partner and OpenAI planning roughly 2 GW of Trainium capacity through AWS). (aboutamazon.com) Cerebras already offers a Cerebras Fast Inference Cloud product on the AWS Marketplace that advertises throughput exceeding ~2,500 tokens/sec and claims “up to 70× faster than GPUs” for certain models, giving startups an immediate path to try the hardware stack while the Bedrock rollout continues. (aws.amazon.com)
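For readers unfamiliar with the prefill/decode split described above, the general idea can be sketched in a few lines of Python. This is a toy illustration only: the names `KVCache`, `prefill`, `decode`, and `generate` are hypothetical, the "model" is simple arithmetic, and nothing here reflects the actual Trainium, Cerebras, EFA, or Bedrock interfaces, which the announcement does not detail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KVCache:
    """Toy stand-in for the attention state handed from prefill to decode."""
    entries: List[int] = field(default_factory=list)


def prefill(prompt_tokens: List[int]) -> KVCache:
    """Prefill: every prompt token is processed independently, so the
    whole prompt can be handled in one parallel, compute-heavy pass."""
    return KVCache(entries=[t * 2 for t in prompt_tokens])


def decode(cache: KVCache, max_new_tokens: int) -> List[int]:
    """Decode: tokens are produced one at a time, and each step reads the
    entire cache, which is why this stage is serial and bandwidth-bound."""
    output: List[int] = []
    for _ in range(max_new_tokens):
        next_token = sum(cache.entries) % 1000  # toy "model" reads all state
        output.append(next_token)
        cache.entries.append(next_token * 2)    # new token extends the cache
    return output


def generate(prompt_tokens: List[int], max_new_tokens: int) -> List[int]:
    """Disaggregated flow: prefill on one pool, then hand the cache to decode."""
    cache = prefill(prompt_tokens)       # assumed to run on the prefill hardware
    # In a real deployment the cache would cross an interconnect at this point.
    return decode(cache, max_new_tokens)  # assumed to run on the decode hardware


if __name__ == "__main__":
    print(generate([1, 2, 3], 3))
```

Because the two stages exchange only the cache, each can in principle be placed on hardware suited to its bottleneck, which is the design choice the companies are describing.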
