AWS and Cerebras partner
AWS and Cerebras have teamed up to speed up inference on Amazon Bedrock using a disaggregated approach that handles prefill and decode as separate stages. The collaboration positions Cerebras for premium inference tiers and gives cloud customers a high‑throughput alternative to traditional GPU inference.
AWS will deploy Cerebras CS‑3 systems, powered by Cerebras wafer‑scale engines, inside its data centers and make them accessible through Amazon Bedrock. (businesswire.com) Cerebras’s WSE‑3 wafer‑scale chip is reported to have roughly 900,000 cores, about 44 GB of on‑chip SRAM, and approximately 27 petabytes/second of internal memory bandwidth. (siliconangle.com) The production stack pairs AWS Trainium‑powered servers with Cerebras CS‑3 appliances and uses Amazon’s Elastic Fabric Adapter (EFA) for low‑latency, high‑bandwidth connectivity between the two systems. (press.aboutamazon.com)
AWS’s announcement states that the integrated service will deliver “an order of magnitude” speed improvement for inference, while vendor materials and early reports claim roughly a 5× increase in high‑speed token capacity versus prior footprints. (press.aboutamazon.com) The offering is slated to become available in “the coming months,” and AWS says it will add leading open‑source LLMs and Amazon’s Nova family on Cerebras hardware later this year. (press.aboutamazon.com) The announcements describe AWS as the first cloud provider to offer Cerebras’s disaggregated inference solution, through a multiyear deployment of wafer‑scale engines in its regions. (businesswire.com)
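For readers unfamiliar with the term, the toy Python sketch below illustrates what "disaggregated" prefill and decode generally means: one stage processes the whole prompt and builds a key/value cache, the cache is handed to a second stage, and that stage generates tokens one at a time. It is not the AWS or Cerebras implementation. The names `KVCache`, `prefill`, `transfer`, `decode`, and `toy_next_token` are hypothetical, the "model" is a placeholder arithmetic function, and the announcement text quoted here does not say which system runs which stage.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KVCache:
    """Stand-in for the attention key/value state built during prefill."""
    tokens: List[int] = field(default_factory=list)


def prefill(prompt_tokens: List[int]) -> KVCache:
    """Stage 1 (hypothetical): process the whole prompt in a single pass."""
    return KVCache(tokens=list(prompt_tokens))


def transfer(cache: KVCache) -> KVCache:
    """Hypothetical hand-off between the two systems; here it is just a copy."""
    return KVCache(tokens=list(cache.tokens))


def toy_next_token(cache: KVCache) -> int:
    """Placeholder for a model forward pass; this is not a real LLM."""
    return (sum(cache.tokens) + len(cache.tokens)) % 100


def decode(cache: KVCache, max_new_tokens: int) -> List[int]:
    """Stage 2 (hypothetical): emit one token per step, extending the cache."""
    generated = []
    for _ in range(max_new_tokens):
        token = toy_next_token(cache)
        cache.tokens.append(token)
        generated.append(token)
    return generated


def generate(prompt_tokens: List[int], max_new_tokens: int) -> List[int]:
    """Run prefill, hand the cache over, then decode."""
    prefill_cache = prefill(prompt_tokens)
    decode_cache = transfer(prefill_cache)
    return decode(decode_cache, max_new_tokens)


if __name__ == "__main__":
    print(generate([1, 2, 3], max_new_tokens=3))
```

The point of the split is that the two stages stress hardware differently: prefill handles many prompt tokens at once, while decode repeatedly reads model state to produce a single token per step, which is where high memory bandwidth tends to matter most.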