Cohere’s 128K Context Push

Cohere launched Command R+ with a 128K context window and highlighted OCI integration for long‑doc RAG and enterprise retrieval use cases — a clear bet on large‑context enterprise workflows. That puts pressure on memory and token‑throughput tradeoffs for inference stacks and nudges customers to evaluate long‑context optimizations. (x.com)

The 08‑2024 refresh of Command R+ advertises roughly 50% higher throughput and about 25% lower latencies versus the prior release while keeping the same hardware footprint. (docs.cohere.com) Public model listings show a roughly 104‑billion‑parameter Command R+ variant being distributed by Cohere Labs. (huggingface.co) Cohere has pushed Command R+ into multiple cloud channels — it’s listed in the Azure AI model catalog, deployable via Amazon SageMaker JumpStart, and supported in OCI Generative AI. (ai.azure.com) Cohere’s pricing page and third‑party model trackers list Command R+ (08‑2024) at about $2.50 per million input tokens and $10.00 per million output tokens for API access. (cohere.com) OCI documentation states on‑demand inferencing responses for hosted Command R+ are capped at 4,000 output tokens, and OCI provides a TP4 “Large Cohere unit” dedicated AI cluster variant with published benchmark metrics. (docs.oracle.com) Azure’s model catalog entry documents structured‑output schemas, grounded‑generation citations, and a JSON tool‑use interface that Command R+ can emit to drive programmatic RAG and tool workflows. (ai.azure.com)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.