Microsoft Fara1.5 tops web benchmarks

- Microsoft Research said on May 21 that its open-weight Fara1.5 browser-agent models posted higher live-web benchmark scores than OpenAI and Google. - The headline figure was 72% on Online-Mind2Web for Fara1.5-27B, versus 58.3% for OpenAI Operator and 57.3% for Google Gemini 2.5. - Microsoft’s research page lists the 9B model on Microsoft Foundry, with 4B and 27B versions expected to follow.

Microsoft Research said on May 21 that its new Fara1.5 family of browser “computer use agent” models outperformed OpenAI and Google on a live-web benchmark that has become a closely watched test of AI agents. The company said its largest open-weight model, Fara1.5-27B, scored 72% on Online-Mind2Web, a benchmark built around 300 tasks across 136 live websites. Crypto Briefing, which reported the results on May 22, said the figures put Fara1.5 ahead of OpenAI’s Operator at 58.3% and Google’s Gemini 2.5 Computer Use at 57.3%. Microsoft said the models are being released in 4 billion, 9 billion and 27 billion parameter sizes. ### What exactly did Microsoft release? Microsoft Research said Fara1.5 is a family of browser agents designed to complete tasks such as comparing products, filling out forms and booking events. The release includes Fara1.5-4B, Fara1.5-9B and Fara1.5-27B, according to the company’s research post. The company described the models as “computer use agents,” meaning they operate through a browser interface rather than only generating text. (microsoft.com) Microsoft said the models are trained with MagenticLite, a sandboxed browser setup, and are built to ask for user approval before critical actions such as purchases or account changes. ### Why are these benchmark numbers getting attention? Online-Mind2Web is a live-web benchmark with 300 tasks across 136 websites, according to the benchmark’s GitHub page and Princeton-hosted leaderboard. That matters because the test is meant to measure whether an agent can complete tasks on changing websites rather than static snapshots. Microsoft said Fara1.5-27B scored 72% on that benchmark, while Fara1.5-9B reached 63% and Fara1.5-4B reached 57%. (microsoft.com) Crypto Briefing reported that those figures placed the 27B and 9B variants ahead of OpenAI’s Operator and Google’s Gemini 2.5 Computer Use on the same test. ### How big is the claimed jump from Microsoft’s earlier model? (github.com) Microsoft said Fara1.5 improves on its earlier Fara-7B model, which the company had previously benchmarked on the same task set. Crypto Briefing reported that Fara-7B scored 34.1% on Online-Mind2Web, compared with 72% for Fara1.5-27B and 63.4% for Fara1.5-9B. (microsoft.com) The company’s own post said the 9B model “nearly doubles” the performance of Fara-7B. That comparison is one reason the release drew notice beyond the narrower contest with OpenAI and Google. ### How much weight should readers put on benchmark claims? Crypto Briefing said benchmark claims in AI should be treated cautiously, a caveat that reflects a broader issue with web-agent testing rather than a dispute over the published numbers themselves. (microsoft.com) Online-Mind2Web’s maintainers say tasks are updated when websites change or become invalid, which means results depend in part on evaluation timing and setup. The Princeton-hosted Online-Mind2Web page describes the benchmark as an attempt to test agents in conditions closer to real-world web use. A 2025 paper introducing the benchmark said it was built to address what the authors called over-optimism in earlier web-agent evaluations. ### Where does this leave OpenAI and Google? OpenAI introduced Operator in January 2025 as a research preview of an agent that can use its own browser, and later folded those capabilities into ChatGPT agent mode, according to OpenAI. (cryptobriefing.com) Google’s Gemini 2.5 Computer Use is one of the systems Microsoft and outside reports used as a comparison point in browser-task testing. (hal.cs.princeton.edu) The Microsoft release does not settle the broader competition in browser agents, because vendors often compare systems across different scaffolds, prompts and evaluation conditions. But the published results add a new open-weight entrant to a field that has been led largely by proprietary products from OpenAI and Google. (openai.com) ### What happens next with Fara1.5? Microsoft said the 9B version is available on Microsoft Foundry, while the 4B and 27B models are expected to follow. The company’s research page also said the models were designed to run on relatively modest hardware, which could make them easier for outside developers to test and adapt. (microsoft.com) May 21 is the key date for the release, and the next concrete milestone is broader availability of the remaining model sizes through Microsoft’s distribution channels. OpenAI, Google and independent benchmark maintainers are also likely to keep updating comparative results as live-web tasks and agent systems change. (microsoft.com) (cryptobriefing.com)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.