Cursor releases Composer 2.5 benchmarks

- Cursor said on May 18 that Composer 2.5 is now available in Cursor, with higher reliability, better performance on long-running tasks and lower pricing.
- Cursor and outside reports said Composer 2.5 scored 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1 after training on 25 times more synthetic tasks than its predecessor.
- CNBC published its 2026 Disruptor 50 list on May 19, ranking Cursor No. 37.

Cursor said on May 18 that Composer 2.5 is now available in its coding platform, describing it as a substantial improvement over Composer 2 in intelligence and behavior. The company said the model is better at sustained work on long-running tasks, follows complex instructions more reliably and is “more pleasant to collaborate with.” Cursor priced the standard version at $0.50 per million input tokens and $2.50 per million output tokens, with a faster default variant at $3.00 and $15.00 per million input and output tokens, respectively. (cursor.com)

### What exactly did Cursor release?

Cursor’s changelog said Composer 2.5 is a live product release inside Cursor, not a research preview. The company paired the launch with a one-week promotion offering double usage and pointed users to the model documentation and an announcement for more details. (cursor.com)

TestingCatalog reported the release as an upgrade aimed at developers and other users who need advanced coding assistance. The report said Composer 2.5 is publicly available to all Cursor users and framed the update around reliability, intelligence and handling of longer tasks. (testingcatalog.com)

### Where do the benchmark claims come from?

The Decoder reported on May 18 that Composer 2.5 scored 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1. The publication said those results put the model alongside Anthropic’s Opus 4.7 and OpenAI’s GPT-5.5 on the cited benchmarks. (the-decoder.com)

Cursor’s earlier technical report on Composer 2 said the company evaluates models with both public benchmarks and its own CursorBench, which it built from real coding sessions by its engineering team. Cursor said CursorBench includes terse and ambiguous prompts that require changes across many files, and that public benchmarks often do not reflect the work developers actually do. (cursor.com)

### What changed under the hood?
Cursor’s March technical report said Composer 2 was built from the open base model Kimi K2.5 and then improved with continued pretraining and large-scale reinforcement learning in realistic Cursor sessions. The company said that approach was designed to improve end-to-end agent performance on software engineering tasks. (cursor.com)

The Decoder and TestingCatalog both said Composer 2.5 keeps the Kimi K2.5 foundation but was trained on 25 times more synthetic tasks than its predecessor. TestingCatalog also said the update added targeted reinforcement learning with localized textual feedback, plus behavioral calibration intended to improve long rollouts, coding consistency and instruction-following. (the-decoder.com)

### How strong is the cost argument?

Cursor’s own pricing puts Composer 2.5 well below the rates outside outlets cited for competing frontier coding models. The Decoder said the model costs $0.50 per million input tokens and $2.50 per million output tokens, which it said translated to less than a dollar per task in its comparison, versus as much as $11 for rivals. (cursor.com) (the-decoder.com)

TestingCatalog described Composer 2.5 as “up to 10x more efficient than similarly capable models.” That claim and the benchmark comparisons rest on Cursor’s own numbers and outside summaries of them, not on independently published head-to-head testing against Anthropic’s or OpenAI’s models. (testingcatalog.com)

### How does this fit into Cursor’s broader moment?

CNBC published its 2026 Disruptor 50 list on May 19 and ranked Cursor No. 37. CNBC described Cursor as a startup that popularized “vibe coding,” listed Michael Truell as chief executive and said the company was founded in 2022 and is based in San Francisco. (cnbc.com)

Cursor’s next step is likely to be further model iteration rather than a pause. The Decoder reported that Cursor is already training a larger successor model from scratch; in the meantime, Composer 2.5 is live in the product and available at current model pricing. (the-decoder.com)
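Per-million-token prices only become a per-task cost once you assume how many tokens a task consumes. The minimal Python sketch here shows that arithmetic using the Composer 2.5 rates quoted above. The `Pricing` class, the `task_cost` function and the token counts in the example are hypothetical illustrations, not figures or tools from Cursor, The Decoder or TestingCatalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in US dollars (hypothetical helper type)."""
    input_per_m: float
    output_per_m: float


# Rates as quoted for Composer 2.5: standard tier and the faster default variant.
STANDARD = Pricing(input_per_m=0.50, output_per_m=2.50)
FAST = Pricing(input_per_m=3.00, output_per_m=15.00)

TOKENS_PER_MILLION = 1_000_000


def task_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task: token counts times per-million rates, scaled down."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * pricing.input_per_m
            + output_tokens * pricing.output_per_m) / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Hypothetical workload: these token counts are illustrative assumptions,
    # not numbers reported by Cursor or any outlet.
    hypothetical_input = 1_000_000
    hypothetical_output = 50_000
    for name, pricing in (("standard", STANDARD), ("fast", FAST)):
        cost = task_cost(pricing, hypothetical_input, hypothetical_output)
        print(f"{name}: ${cost:.3f} per task")
    # standard: $0.625 per task
    # fast: $3.750 per task
```

Because both fast-variant rates are exactly six times the standard ones, the fast variant costs six times as much for any mix of input and output tokens. How the standard tier compares with rival models per task depends on token counts that this briefing does not report.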
