AWS teams with Cerebras on inference speed

AWS announced a collaboration with Cerebras to deliver accelerated inference performance in the cloud, according to CXO Digitalpulse coverage. The deal is a concrete hyperscale pairing that pushes non‑GPU accelerator options for inference, and it follows other cloud players experimenting with alternative accelerator suppliers.

The new offering pairs AWS Trainium‑powered servers with Cerebras CS‑3 systems and Amazon’s Elastic Fabric Adapter (EFA) for low‑latency, high‑bandwidth links, according to AWS’s announcement. (press.aboutamazon.com) Cerebras and AWS describe the design as “inference disaggregation”: workloads are split into a parallel “prefill” stage and a serial “decode” stage, with Trainium handling prefill and the CS‑3 focused on decode. (cerebras.ai)

AWS’s press release says the Trainium+CS‑3 integration will deliver “an order of magnitude” improvement in inference speed over current options, and several outlets report a roughly 5x increase in token capacity for decode when the WSE‑3‑powered CS‑3 handles that stage. (press.aboutamazon.com) The CS‑3 is powered by Cerebras’ WSE‑3 wafer‑scale engine, which Cerebras describes as a ~4‑trillion‑transistor device with about 900,000 AI‑optimized cores and ~44GB of on‑chip SRAM. (cerebras.ai)

AWS said the service will be accessible through Amazon Bedrock in the coming months, and that leading open‑source LLMs plus Amazon’s Nova models will run on Cerebras hardware later this year. (press.aboutamazon.com) The announcement frames the deal as a multiyear collaboration, with AWS as the first cloud provider to adopt Cerebras’ disaggregated inference solution. Published coverage notes that the companies did not disclose financial terms. (press.aboutamazon.com)
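For readers unfamiliar with the prefill/decode split, the toy sketch below illustrates why the two stages suit different hardware. It is not AWS or Cerebras code: every name (`KVCache`, `prefill`, `decode`, `generate`) is hypothetical, the arithmetic is a stand-in for a real model, and it assumes the state handed between stages is an attention key/value cache, as is common in disaggregated serving designs.

```python
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Hypothetical stand-in for the state handed from prefill to decode."""
    entries: list[int] = field(default_factory=list)


def prefill(prompt_tokens: list[int]) -> KVCache:
    # Every prompt token is known up front, so this stage has no
    # token-to-token dependency and can be processed as one parallel batch.
    # The comprehension stands in for that single batched pass.
    return KVCache(entries=[t * 2 for t in prompt_tokens])


def decode(cache: KVCache, max_new_tokens: int) -> list[int]:
    # Each new token depends on all previous ones, so this loop is
    # inherently serial: step n cannot start until step n-1 has finished.
    generated = []
    for _ in range(max_new_tokens):
        next_token = sum(cache.entries) % 100  # toy "model", not a real LLM
        generated.append(next_token)
        cache.entries.append(next_token * 2)   # cache grows by one entry per step
    return generated


def generate(prompt_tokens: list[int], max_new_tokens: int) -> list[int]:
    # In a disaggregated deployment these two calls would run on different
    # hardware pools, with the cache sent over the interconnect in between.
    cache = prefill(prompt_tokens)
    return decode(cache, max_new_tokens)


if __name__ == "__main__":
    print(generate([1, 2, 3], max_new_tokens=3))
```

In this sketch, prefill cost scales with prompt length but can be spread across many compute units at once, while decode latency is bounded by how fast a single step completes. That difference is the general motivation for sending each stage to hardware tuned for it.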
