AWS and Cerebras partner

Published March 17, 2026 by The Daily Scout

What happened

AWS and Cerebras teamed up to target faster Bedrock inference using a disaggregated prefill+decode approach. The collaboration positions Cerebras for premium inference tiers and gives cloud customers a high‑throughput alternative to traditional GPU inference.

Why it matters

AWS will deploy Cerebras CS‑3 systems, powered by Cerebras wafer‑scale engines, inside its data centers and make them accessible through Amazon Bedrock. (businesswire.com) Cerebras’s WSE‑3 wafer‑scale chip is reported to pack roughly 900,000 cores, about 44 GB of on‑chip SRAM, and a cited internal memory bandwidth of approximately 27 petabytes per second. (siliconangle.com)

The production stack pairs AWS Trainium‑powered servers with Cerebras CS‑3 appliances and uses Amazon’s Elastic Fabric Adapter (EFA) for low‑latency, high‑bandwidth connectivity between the two systems. (press.aboutamazon.com) AWS’s announcement states the integrated service will deliver “an order of magnitude” speed improvement for inference, while vendor materials and early reports claim roughly a 5× increase in high‑speed token capacity versus prior footprints. (press.aboutamazon.com)

The offering is slated to become available in “the coming months,” and AWS says it will add leading open‑source LLMs and Amazon’s Nova family on Cerebras hardware later this year. (press.aboutamazon.com) The announcements describe AWS as the first cloud provider to offer Cerebras’s disaggregated inference solution, through a multiyear deployment of wafer‑scale engines in its regions. (businesswire.com)
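For readers unfamiliar with the term, disaggregated inference separates the two phases of LLM generation: prefill, which processes the whole prompt in one pass, and decode, which emits output tokens one at a time. The Python sketch below is a toy illustration of that split only. It is not AWS's or Cerebras's implementation; every name in it (`KVCache`, `prefill`, `decode`, `toy_next_token`, `generate`) is hypothetical, the "model" is a placeholder arithmetic function, and the announcements summarized here do not say which system handles which phase, so the two pools are left unnamed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KVCache:
    """Hypothetical stand-in for the state handed from prefill to decode."""
    tokens: List[int]


def prefill(prompt_tokens: List[int]) -> KVCache:
    # Phase 1: process the entire prompt in a single pass and build the cache.
    return KVCache(tokens=list(prompt_tokens))


def toy_next_token(cache: KVCache) -> int:
    # Placeholder for a model forward pass; deterministic so the sketch is testable.
    return (sum(cache.tokens) + len(cache.tokens)) % 100


def decode(cache: KVCache, max_new_tokens: int) -> List[int]:
    # Phase 2: generate one token at a time, extending the cache on each step.
    generated: List[int] = []
    for _ in range(max_new_tokens):
        token = toy_next_token(cache)
        cache.tokens.append(token)
        generated.append(token)
    return generated


def generate(prompt_tokens: List[int], max_new_tokens: int) -> List[int]:
    cache = prefill(prompt_tokens)  # would run on the prefill pool
    # In a disaggregated deployment, the cache would cross the interconnect here.
    return decode(cache, max_new_tokens)  # would run on the decode pool


if __name__ == "__main__":
    print(generate([1, 2, 3], max_new_tokens=2))
```

The point of the split is that the two phases can be scaled and placed on different hardware independently, at the cost of moving the cache between them, which is why the interconnect matters.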

Key numbers

Cerebras’s WSE‑3 wafer‑scale chip is reported to pack roughly 900,000 cores, about 44 GB of on‑chip SRAM, and a cited internal memory bandwidth of approximately 27 petabytes per second. (siliconangle.com)
AWS’s announcement cites “an order of magnitude” speed improvement for inference, and vendor materials and early reports claim roughly a 5× increase in high‑speed token capacity versus prior footprints. (press.aboutamazon.com)
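To put the reported WSE‑3 figures in proportion, the short Python calculation below derives two back-of-the-envelope ratios from them: average SRAM per core, and how long one full read of the on-chip SRAM would take at the cited bandwidth. It assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and treats the approximate reported values as exact, so the results are rough orders of magnitude rather than vendor specifications.

```python
# Approximate figures as reported above; decimal units are an assumption.
CORES = 900_000
SRAM_BYTES = 44e9               # about 44 GB of on-chip SRAM
BANDWIDTH_BYTES_PER_S = 27e15   # about 27 petabytes per second

# Average on-chip SRAM available per core, in kilobytes.
sram_per_core_kb = SRAM_BYTES / CORES / 1e3

# Time to read the entire on-chip SRAM once at the cited bandwidth, in microseconds.
full_sweep_us = SRAM_BYTES / BANDWIDTH_BYTES_PER_S * 1e6

print(f"SRAM per core: ~{sram_per_core_kb:.1f} KB")
print(f"One full SRAM read: ~{full_sweep_us:.2f} microseconds")
```

With these inputs the script reports about 48.9 KB of SRAM per core and about 1.63 microseconds for one full pass over the on-chip memory.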

What happens next

AWS will deploy Cerebras CS‑3 systems inside its data centers as part of a multiyear rollout and make them accessible through Amazon Bedrock. (businesswire.com)
The offering is slated to become available in “the coming months,” and AWS says it will add leading open‑source LLMs and Amazon’s Nova family on Cerebras hardware later this year. (press.aboutamazon.com)

Sources

businesswire.com
siliconangle.com
press.aboutamazon.com

