Microsoft's Fara1.5 tops web benchmarks

- Microsoft's open-weight Fara1.5 model reportedly outperformed OpenAI's Operator and Google's Gemini 2.5 on live web-task benchmarks in head-to-head tests, the report said. (cryptobriefing.com) - The comparisons focused on live web tasks—research, retrieval and task completion—where Fara1.5 scored higher than Operator and Gemini 2.5 in those measures. (cryptobriefing.com) - Observers noted this underscores that model architecture and open weights still matter even as distribution and regulation are becoming primary battlegrounds in the AI race. (indianexpress.com)

Microsoft Research said on May 21 that its new Fara1.5 browser-agent models beat prior open models on live web benchmarks and, in one cited comparison, topped OpenAI and Google systems. (microsoft.com) The key number in the report is 72%: that was the score cited for Fara1.5-27B on Online-Mind2Web, ahead of OpenAI’s Operator at 58.3% and Google’s Gemini 2.5 at 57.3%, according to Crypto Briefing’s summary of the results. (microsoft.com) That matters because these are not chatbot trivia tests. They are browser-control tasks: clicking, typing, scrolling, retrieving information, filling forms and completing multi-step actions on live sites. Microsoft said Fara1.5 is built for “computer use agent” work in the browser, and described examples such as comparing products, filling out forms and booking events. OpenAI describes Operator in similar terms, while Google’s Computer Use documentation says its model is meant to automate browser tasks using screenshots and generated UI actions. (microsoft.com) A useful way to read this result is that Microsoft is competing on agent execution, not just on model size or chatbot polish. Fara1.5 comes in 4B, 9B and 27B versions, and Microsoft said the family is designed to remain practical to deploy on modest hardware while improving on its earlier Fara-7B line. On Microsoft’s own published benchmark page, the 9B model scored 63% on Online-Mind2Web, versus 57% for the 4B model and 72% for the 27B version. (microsoft.com) The open-weight part is also central. Microsoft published Fara through its research channels and GitHub repository, where it noted a coming “Fara1.5 agent harness” update and a refreshed WebTailBench benchmark suite. That makes this story different from a closed product launch: developers can inspect the work more directly, run evaluations and build around the models rather than only consume them through a hosted interface. (github.com) The comparison is still narrower than “best AI overall.” Microsoft’s page emphasizes benchmark performance within computer-use tasks, and Google and OpenAI describe their own systems as preview or research-stage tools with safety checks and user handoffs for sensitive actions. Google says its Computer Use capability may be prone to errors and security vulnerabilities in preview, and OpenAI says Operator was launched as a research preview before being folded into ChatGPT’s agent mode. (openai.com) So the clean takeaway is this: on the specific problem of using a browser like a human, Microsoft has evidence that its latest open-weight agent is highly competitive and, by the cited benchmark, ahead. The next thing to watch is whether independent developers reproduce those scores on WebTailBench and Online-Mind2Web, and whether OpenAI or Google publish newer head-to-head results for their current agent systems. (github.com)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.