Anthropic fairness concerns gain traction

- Anthropic said April 24 that its “Project Deal” test let 69 Claude agents negotiate real office trades for employees, completing 186 deals worth just over $4,000. - In a hidden parallel test, Claude Opus 4.5 got better prices and closed more deals than Claude Haiku 4.5, while weaker-agent users rated fairness almost identically. - Anthropic published the results as it pushes “trustworthy agents” rules for AI systems that now make purchases, bookings, and work decisions. (anthropic.com)

Anthropic said on April 24 that its “Project Deal” experiment let Claude agents buy and sell real goods for 69 employees in its San Francisco office. (anthropic.com) The company ran the one-week marketplace in December 2025 through Slack after each participant told Claude what they wanted to buy or sell and got a $100 budget. (anthropic.com) (the-decoder.com) Anthropic said the agents completed 186 deals across more than 500 listings, moving just over $4,000 in goods that included a snowboard and a bag of ping-pong balls. (anthropic.com) (the-decoder.com) The main result came from a second test that participants did not know about. Anthropic compared Claude Opus 4.5, its frontier model at the time, with Claude Haiku 4.5, its smallest model. (anthropic.com) (the-decoder.com) Anthropic said the stronger agents got “objectively better outcomes,” and The Decoder reported Opus agents secured better prices and closed more deals on average than Haiku agents. (anthropic.com) (the-decoder.com) The fairness problem was that users with weaker agents did not detect the gap. Anthropic said post-experiment surveys showed weaker-model users did not notice their disadvantage. (anthropic.com) Outside coverage put numbers on that perception gap. Tech in Asia, citing Anthropic’s results, said fairness ratings were 4.05 for Opus users and 4.06 for Haiku users on a 1-to-7 scale. (techinasia.com) Anthropic also said participants were enthusiastic enough that they reported a willingness to pay for a similar service in the future. Several reports pegged that share at 46%. (anthropic.com) (gagadget.com) The company framed the test as a preview of agent-run commerce, where software acts on a person’s behalf instead of only answering questions. Anthropic’s April 9 policy paper said agents now write code, manage files, and complete tasks across apps with less human oversight. (anthropic.com) That policy paper also said lower oversight creates room for unintended actions and prompt-injection attacks that can trick agents into taking costly steps. Anthropic listed human control, transparency, security, and privacy among its five design principles. (anthropic.com) Project Deal lands as Anthropic is expanding fast beyond research demos. TechCrunch reported on April 7 that the company’s annualized revenue had reached $30 billion and that it had signed a larger compute agreement with Google and Broadcom. (techcrunch.com) The experiment’s closing fact is simple: the agents made real deals, the stronger ones won more often, and many of the people who lost could not tell. (anthropic.com) (the-decoder.com)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.