Microsoft's Fara1.5 outperforms rivals
- Microsoft Research said on May 21 that its open-weight browser agent Fara1.5 posted stronger live-web benchmark results than OpenAI and Google rivals.
- Microsoft’s top Fara1.5 model scored 72% on Online-Mind2Web tasks, ahead of OpenAI Operator at 58.3% and Google Gemini 2.5 at 57.3%.
- Google’s next public marker is the continued Gemini rollout, after Sundar Pichai said on May 19 that monthly users had passed 900 million.

Microsoft Research said on May 21 that its Fara1.5 family of browser agents beat OpenAI and Google on a live-web benchmark built to test whether AI systems can actually complete tasks online, not just answer questions. The company said its largest model, Fara1.5-27B, reached a 72% task success rate on Online-Mind2Web, a benchmark covering 300 tasks across 136 websites. Microsoft’s own write-up said the models are designed for “computer use” in the browser, including comparing products, filling out forms and booking events. The release lands as Google says Gemini has crossed 900 million monthly users, showing that distribution and benchmark leadership are not the same contest.

### What exactly did Microsoft release?

Microsoft Research said Fara1.5 is a family of three open-weight browser agent models: 4B, 9B and 27B parameters. The company said the models were trained for browser-based “computer use agent” work, meaning they can navigate websites and execute multi-step tasks rather than only generate text. (microsoft.com)

The May 21 Microsoft post said the 9B model scored 63% on Online-Mind2Web, while the 4B model scored 57% and the 27B model scored 72%. Microsoft also said the models were built to run on more modest hardware than larger proprietary systems, and that they ask for approval or clarification when needed. (microsoft.com)

### How did Fara1.5 compare with OpenAI and Google?

Microsoft’s benchmark comparison, as reported by Decrypt and Crypto Briefing, put Fara1.5-27B at 72% on live-web tasks, ahead of OpenAI’s Operator at 58.3% and Google’s Gemini 2.5 Computer Use at 57.3%. Those figures matter because they refer to web actions on real sites, where agents have to handle changing page layouts, clicks and forms. (microsoft.com)

Online-Mind2Web is narrower than broad consumer use, but it is one of the cleaner ways to compare browser agents on the same task set. Microsoft’s own description of the benchmark says it spans 300 tasks across 136 popular sites.

### If Microsoft won the benchmark, why is Google talking about 900 million users?

Google CEO Sundar Pichai said on May 19 at Google I/O that the Gemini app had surpassed 900 million monthly active users, up from 400 million at the previous year’s event. (decrypt.co) Pichai said the increase came as Google embedded Gemini across Search and other core products.

Google’s figures point to reach, while Microsoft’s Fara1.5 results point to task performance in a specific class of agent systems. (microsoft.com) Those are different measures: one is user adoption inside a large product ecosystem, the other is benchmarked execution on browser tasks. That distinction is visible in Google’s own framing at I/O, where Pichai described the company as entering an “agentic Gemini era.” (blog.google)

### Why are browser agents getting more attention now?

Microsoft said Fara1.5 is meant for actions such as cross-site comparison shopping, form filling and booking tasks. That puts it in a category of systems designed to do work on the web, not just summarize or chat.

Google used similar language this week. Pichai’s I/O keynote described an “agentic Gemini era,” suggesting that the largest AI companies are now competing over systems that can carry out steps across products and websites. (blog.google)

### What are the reliability questions hanging over that shift?

Google Search users on May 22 found that searches for words such as “disregard,” “ignore” and “dismiss” could trigger odd AI-style responses in AI Overviews, according to reports aggregated by MSN from The Verge. (microsoft.com) The episode was brief, but it showed how brittle search-linked AI behavior can look when a system appears to treat a query word like an instruction. (blog.google)

Microsoft, for its part, said some important tasks cannot be safely trained on the live web because they require logins or irreversible actions such as sending an email.
The company said it supplemented live-web data with synthetic domains to simulate those environments.

May 19 and May 21 provided the two clearest markers in this race: Google’s I/O user update and Microsoft’s Fara1.5 release. (msn.com) The next public test will come as both companies push more agent features into products and publish additional benchmark or usage data. (blog.google) (microsoft.com)