Cohere unveils Command R+
Cohere announced Command R+ with 128K context and long‑document / enterprise RAG capabilities — a clear product push into large‑context inference and retrieval workflows. This positions Cohere to compete on longer‑context use cases that drive heavy inference-loads and persistent model state. (x.com)
Cohere stamped the release as command-r-plus-08-2024 and set a per‑run output cap of 4,000 tokens while listing per‑token pricing at $2.50 per 1M input tokens and $10 per 1M output tokens. (cohere.com) (docs.cohere.com) A Cohere Labs research release on Hugging Face publishes Command R+ weights as a 104‑billion‑parameter model under a CC‑BY‑NC license and includes a hosted demo space. (huggingface.co) (huggingface.co) Microsoft added Command R+ to the Azure AI model catalog as part of a Cohere collaboration and highlighted pairing the model with Cohere Embed and Rerank to produce citation‑backed outputs. (microsoft.com) (techcommunity.microsoft.com) AWS co‑authored a SageMaker JumpStart post showing Command R and R+ are deployable for inference across a long list of regions including us‑east‑1, us‑west‑2, eu‑central‑1 and ap‑northeast‑1 for VPC‑isolated production use. (aws.amazon.com) (aws.amazon.com) Cohere’s August 2024 model refresh claims roughly 50% higher throughput and about 25% lower latencies for Command R+ compared with the prior version, achieved without increasing the hardware footprint. (cohere.com) (docs.cohere.com) Community resources on Hugging Face surface 4‑bit quantized builds and bitsandbytes deployment instructions for Command R+, enabling local or cloud self‑hosting for research and private deployments. (huggingface.co) (huggingface.co) Amazon Bedrock and other platform docs show the models accept a structured documents field and can return outputs that reference those documents, supporting citation workflows in production inference. (aws.amazon.com) (docs.aws.amazon.com)