Cerebras pairs CS‑3 with AWS Trainium3

- Amazon Web Services and Cerebras said on March 13 they will bring a joint AI inference service to Amazon Bedrock, pairing AWS Trainium for prompt processing with Cerebras CS-3 systems for token generation.
- The companies said the split system is aimed at faster interactive model responses, with Cerebras citing up to 3,000 tokens per second and about 5x more high-speed inference capacity.
- The deal puts Cerebras hardware inside AWS data centers and makes AWS the first cloud provider to offer Cerebras’s disaggregated inference design. (press.aboutamazon.com)

Amazon Web Services and Cerebras said on March 13 they will pair AWS Trainium chips with Cerebras CS-3 systems to run AI inference on Amazon Bedrock. (press.aboutamazon.com) (cerebras.ai)

The setup splits each model request into two jobs. AWS Trainium handles “prefill,” the step that reads the prompt, and Cerebras CS-3 handles “decode,” the step that generates each next token. (press.aboutamazon.com) (cerebras.ai) The companies said the two systems are linked with Amazon’s Elastic Fabric Adapter networking inside AWS data centers, and that the service is slated to arrive on Bedrock “in the coming months.” (press.aboutamazon.com) (businesswire.com)

Prefill and decode stress hardware differently. Prefill processes the whole prompt in one large upfront pass, which rewards dense compute and memory; decode repeats a very small step for every new token, so low per-step latency matters most. (cerebras.ai) (press.aboutamazon.com) Cerebras said that division lets each chip do the work it is built for. The company said its systems already power models from OpenAI, Cognition, and Meta at up to 3,000 tokens per second, and that the AWS pairing should deliver about 5x more capacity for high-speed inference. (cerebras.ai 1) (cerebras.ai 2)

AWS said it will later offer leading open-source models and Amazon Nova on Cerebras hardware. The companies also said AWS is the first cloud provider for Cerebras’s disaggregated inference service, with Bedrock as the launch venue. (press.aboutamazon.com) (cerebras.ai)

The announcement also gives AWS another use for its in-house AI silicon. AWS describes Trainium as a family of accelerators for both training and inference and calls Trainium3 its fastest generation so far. (aws.amazon.com 1) (aws.amazon.com 2)

For customers, the pitch is not a new model but a different way to serve one. Instead of running the full request on one class of processor, Bedrock would route the compute-heavy first pass to Trainium and the latency-sensitive token stream to CS-3. (cerebras.ai) (press.aboutamazon.com)

That makes this as much a cloud infrastructure deal as a chip deal. AWS gets a specialized inference option inside Bedrock, and Cerebras gets placement in one of the largest public-cloud AI platforms without asking customers to leave AWS. (press.aboutamazon.com) (cerebras.ai)

The companies have not given a public Bedrock launch date beyond “coming months,” but they have drawn a clear line around the product: Trainium for prefill, CS-3 for decode, sold through AWS’s managed AI stack. (press.aboutamazon.com) (cerebras.ai)
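For readers new to the prefill/decode split described above, here is a toy Python sketch of the idea. It is not the Bedrock, Trainium, or Cerebras API. The names `prefill`, `decode`, `serve`, `PrefillResult`, and `decode_seconds`, along with the arithmetic "next-token rule," are hypothetical stand-ins. The handoff object is a simplified placeholder for the attention state (typically a key/value cache) that real disaggregated servers pass from one phase to the other. In the AWS design that handoff would cross the Elastic Fabric Adapter link, but the sketch simply passes it in memory.

```python
from dataclasses import dataclass


@dataclass
class PrefillResult:
    """Hypothetical handoff from the prefill phase to the decode phase.

    Real servers typically pass attention state (a key/value cache); this toy
    version keeps the token ids plus a small integer summary instead.
    """
    cache: list[int]
    state: int


def prefill(prompt_tokens: list[int]) -> PrefillResult:
    """Phase 1: read the entire prompt in a single pass (compute-heavy)."""
    if not prompt_tokens:
        raise ValueError("prompt must contain at least one token")
    # All prompt tokens are known up front, so a real model can process them
    # in parallel; here we just fold them into a toy summary value.
    state = sum(prompt_tokens) % 1000
    return PrefillResult(cache=list(prompt_tokens), state=state)


def decode(handoff: PrefillResult, max_new_tokens: int, eos: int = 0) -> list[int]:
    """Phase 2: emit one token per step; each step depends on the previous one."""
    cache = list(handoff.cache)  # copy so the handoff object is not mutated
    state = handoff.state
    generated: list[int] = []
    for _ in range(max_new_tokens):
        # Toy stand-in for one model step; a real decoder runs the network here.
        next_token = (state * 31 + cache[-1]) % 97
        generated.append(next_token)
        if next_token == eos:  # stop early on the end-of-sequence token
            break
        cache.append(next_token)
        state = (state + next_token) % 1000
    return generated


def serve(prompt_tokens: list[int], max_new_tokens: int) -> list[int]:
    """Run each phase as a separate call, as a disaggregated setup would on separate hardware."""
    handoff = prefill(prompt_tokens)         # would run on the prefill pool
    return decode(handoff, max_new_tokens)   # would run on the decode pool


def decode_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time spent generating num_tokens at a steady decode rate (ignores prefill and network)."""
    return num_tokens / tokens_per_second
```

As a usage example, `serve([5, 7, 11], 8)` runs the one-shot prefill once and then up to eight sequential decode steps. `decode_seconds(1500, 3000)` returns `0.5`: at the cited peak of 3,000 tokens per second, a 1,500-token answer would need about half a second of decode time, before counting prefill or network overhead. The sequential loop in `decode` is the reason per-step latency, rather than raw batch throughput, dominates that phase.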

