Opus beats Haiku in sales

- Anthropic said its week-long “Project Deal” test found Claude Opus 4.5 negotiated better marketplace outcomes than Claude Haiku 4.5 for employees buying and selling goods. - Across 161 items sold in multiple runs, Opus sellers made $2.68 more per sale on average, while Opus buyers paid $2.45 less per purchase. - Anthropic said weaker-model users rated deals similarly despite worse outcomes, pointing to hidden AI quality gaps. (anthropic.com)

Anthropic said its internal “Project Deal” experiment found Claude Opus 4.5 consistently beat Claude Haiku 4.5 at negotiating sales and purchases for employees. (anthropic.com) The company ran a week-long Slack marketplace where 69 employees let Claude agents buy and sell real goods, producing 186 deals worth more than $4,000 across about 500 listings. (anthropic.com) Anthropic also ran a hidden A/B test across parallel versions of the marketplace, assigning some users to Opus 4.5 and others to Haiku 4.5 without telling them until after the survey. (anthropic.com) Across 161 items that sold in at least two of the four runs, an Opus seller earned $2.68 more per sale on average, and an Opus buyer paid $2.45 less per purchase. (anthropic.com) For the same items sold once by each model, Anthropic said Opus sellers got $3.64 more on average than Haiku sellers. In one example, a lab-grown ruby sold for $65 with Opus and $35 with Haiku. (anthropic.com) Anthropic said Opus users also completed 2.07 more transactions on average than Haiku users, with a p-value of 0.001 in its analysis. (anthropic.com) The company said the gap was not obvious to participants. In post-experiment surveys, users represented by Haiku rated the fairness of their deals about as highly as users represented by Opus. (anthropic.com) Anthropic framed the result as a warning about AI agents handling real-world commerce: stronger models can quietly shift prices and outcomes even when people do not notice the difference. (anthropic.com) Project Deal followed Anthropic’s earlier “Project Vend” office-store test, another internal marketplace experiment used to study how Claude agents behave in economic settings. (anthropic.com) The result was simple: in the same marketplace, the more capable Claude model usually made or saved more money, and the people using the weaker one often could not tell. (anthropic.com)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.