5x throughput in Bedrock

AWS Bedrock users reported 5x token throughput from pairing AWS Trainium with the Cerebras CS‑3 in a prefill/decode split, with prefill on Trainium and decode on the WSE. It is a performance pattern enterprise inference teams should watch (GT Protocol AI Digest, Tech Fusionist).

AWS and Cerebras announced a multiyear collaboration on March 13, 2026 to deploy Cerebras CS‑3/WSE‑3 systems inside AWS data centers and offer the service through Amazon Bedrock later in 2026. (press.aboutamazon.com)

The architecture described by Cerebras routes prefill work to AWS Trainium servers and sends the resulting key‑value cache across Amazon’s Elastic Fabric Adapter to the Cerebras wafer‑scale engine for decode. (cerebras.ai) Cerebras’ public materials say the WSE‑3 keeps model weights in on‑chip SRAM, which gives it much higher memory bandwidth than commodity GPUs and enables decode throughput measured in the thousands of tokens per second versus hundreds on GPUs. (cerebras.ai)

AWS told customers the Bedrock offering will host leading open‑source LLMs and Amazon’s Nova models on Cerebras hardware and characterized availability as “in the next couple of months” from the March 13 announcement. (press.aboutamazon.com) Coverage from SiliconANGLE and DataCenterDynamics frames the deal as AWS building a disaggregated inference stack to compete with GPU‑centric providers; the companies have not disclosed financial terms. (siliconangle.com) (datacenterdynamics.com) After the announcement, AI newsletters and social posts amplified early test patterns and operator reports, prompting secondary outlets to summarize the performance pattern and vendor claims in follow‑up pieces. (smarchunks.com) (aihola.com)
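For readers new to the prefill/decode split described above, the idea can be sketched as a toy pipeline. This is a minimal illustration only, not AWS or Cerebras code: the names `KVCache`, `prefill`, `transfer`, `decode`, and `generate` are hypothetical, the "model" is a trivial arithmetic rule, and the network hop between worker pools is replaced by an in-memory copy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Toy stand-in for the per-token key/value state built during prefill."""
    entries: list[tuple[int, int]] = field(default_factory=list)


def prefill(prompt_tokens: list[int]) -> KVCache:
    """Prefill stage: process the whole prompt in one pass and build the cache."""
    cache = KVCache()
    for pos, tok in enumerate(prompt_tokens):
        cache.entries.append((pos, tok))  # one toy (position, token) entry per prompt token
    return cache


def transfer(cache: KVCache) -> KVCache:
    """Hypothetical hand-off between the prefill pool and the decode pool.

    In the design described in the article this hop crosses a network fabric;
    here it is just a copy, so the decode side owns its own cache.
    """
    return KVCache(entries=list(cache.entries))


def decode(cache: KVCache, max_new_tokens: int) -> list[int]:
    """Decode stage: emit one token at a time, reading the whole cache each step."""
    out: list[int] = []
    for _ in range(max_new_tokens):
        # Toy next-token rule (an assumption): sum of cached token ids mod 100.
        next_tok = sum(tok for _, tok in cache.entries) % 100
        out.append(next_tok)
        cache.entries.append((len(cache.entries), next_tok))  # cache grows per token
    return out


def generate(prompt_tokens: list[int], max_new_tokens: int) -> list[int]:
    """Run prefill on one 'pool', ship the cache, and decode on the other."""
    cache = prefill(prompt_tokens)
    remote_cache = transfer(cache)
    return decode(remote_cache, max_new_tokens)
```

The sketch shows why the two stages suit different hardware: prefill touches the prompt once in bulk, while decode rereads the growing cache for every generated token, which is the step that benefits from high memory bandwidth.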
