Cohere’s 128K Context Push
Cohere launched Command R+ with a 128K context window and highlighted OCI integration for long‑document RAG and enterprise retrieval use cases, a clear bet on large‑context enterprise workflows. A window that long raises memory pressure in inference stacks, since the key‑value cache grows with the number of tokens in context, and forces tradeoffs against token throughput. It also nudges customers to evaluate long‑context optimizations. (x.com)
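To make the memory side of that tradeoff concrete, here is a back‑of‑the‑envelope sketch of how a key‑value cache scales with context length in a standard transformer decoder. The layer count, number of key‑value heads, head dimension, and 16‑bit cache precision below are hypothetical placeholders chosen for illustration, not Command R+'s published configuration, and "128K" is assumed to mean 131,072 tokens.

```python
def kv_cache_bytes(seq_len: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Approximate KV-cache size for ONE sequence.

    Each layer stores one key vector and one value vector per token,
    hence the leading factor of 2.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical architecture values (NOT Command R+'s actual configuration).
HYPOTHETICAL_LAYERS = 64
HYPOTHETICAL_KV_HEADS = 8
HYPOTHETICAL_HEAD_DIM = 128

# Assumption: "128K" is taken to mean 128 * 1024 tokens.
CONTEXT_TOKENS = 128 * 1024

size = kv_cache_bytes(CONTEXT_TOKENS, HYPOTHETICAL_LAYERS,
                      HYPOTHETICAL_KV_HEADS, HYPOTHETICAL_HEAD_DIM)
print(f"{size / 2**30:.1f} GiB per full-length sequence")
```

Under these assumed values a single full‑length sequence needs about 32 GiB of cache on top of the model weights, which is why batch size and context length compete for the same memory.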
The 08‑2024 refresh of Command R+ advertises roughly 50% higher throughput and about 25% lower latency than the prior release on the same hardware footprint. (docs.cohere.com) Public model listings show a roughly 104‑billion‑parameter Command R+ variant distributed by Cohere Labs. (huggingface.co)
Cohere has pushed Command R+ into multiple cloud channels: it’s listed in the Azure AI model catalog, deployable via Amazon SageMaker JumpStart, and supported in OCI Generative AI. (ai.azure.com) Cohere’s pricing page and third‑party model trackers list Command R+ (08‑2024) at about $2.50 per million input tokens and $10.00 per million output tokens for API access. (cohere.com)
OCI documentation states that on‑demand inferencing responses for hosted Command R+ are capped at 4,000 output tokens, and that OCI offers a TP4 “Large Cohere unit” dedicated AI cluster variant with published benchmark metrics. (docs.oracle.com) Azure’s model catalog entry documents structured‑output schemas, grounded‑generation citations, and a JSON tool‑use interface through which Command R+ can drive programmatic RAG and tool workflows. (ai.azure.com)
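As a rough illustration of what the listed API prices imply for long‑context requests, the sketch below estimates per‑request cost from token counts. The rates are the approximate figures quoted above; the function name and the example token counts are hypothetical, and actual billing may differ.

```python
from decimal import Decimal

# Approximate listed rates for Command R+ (08-2024), in USD per million tokens.
INPUT_USD_PER_MILLION = Decimal("2.50")
OUTPUT_USD_PER_MILLION = Decimal("10.00")


def estimate_request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * INPUT_USD_PER_MILLION
             + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION)
    return total / Decimal(1_000_000)


# Hypothetical long-document RAG call: 100,000 prompt tokens, 1,000 generated.
cost = estimate_request_cost(100_000, 1_000)
print(f"Estimated cost: ${cost}")
```

At these rates the prompt accounts for $0.25 of the roughly $0.26 total, so for long‑document workloads input volume, not generation, tends to dominate spend.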