MolmoWeb beats GPT-4o on multimodal benchmarks

MolmoWeb, an open multimodal web agent, claims state-of-the-art performance and reportedly outperformed GPT-4o on benchmarks, a sign that browser automation and multimodal agents are advancing fast. That opens room for student projects that experiment with web-driven agents and automation. (x.com)

Ai2 published MolmoWeb on March 24, 2026, and released model weights, the MolmoWebMix dataset, inference code, and evaluation tools alongside a tech report. (allenai.org) MolmoWeb ships as Molmo 2 family variants in 4B and 8B parameter sizes and operates by observing screenshots instead of parsing HTML or accessibility trees. (allenai.org) The MolmoWebMix training mixture combines more than 100K synthetic trajectories with 30K human demonstrations and accompanying GUI perception data used to train and evaluate the agents. (allenai.org)

On live web-navigation benchmarks, MolmoWeb-8B reportedly scored 78.2% on WebVoyager, 42.3% on DeepShop, and 49.5% on WebTailBench, placing it as the top open-weight web agent across the evaluated tests. (aihola.com) Test-time scaling (running multiple rollouts per task) pushed WebVoyager pass rates substantially higher: Byteiota reported pass@4 reaching 94.7% for the 8B model during evaluation. (byteiota.com)

Training used supervised fine-tuning only, with no reinforcement learning and no distillation from proprietary vision systems, on a cluster reported as 64 NVIDIA H100 GPUs. The stack pairs Molmo language models with SigLIP2 vision encoders under the Molmo 2 architecture. (the-decoder.com)

The public GitHub repo includes an Apache-2.0 LICENSE, scripts to download checkpoints (MolmoWeb-8B and MolmoWeb-4B are available on Hugging Face), and a reproducible inference server and evaluation harness. (github.com) The release notes and demos indicate that Playwright is used for browser control and that a hosted demo is available but limited to whitelisted sites. The repository also documents backends (FastAPI/Modal) and example deployment scripts for local or cloud self-hosting. (aihola.com)
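For readers unfamiliar with the pass@4 figure, the following is a minimal sketch of how such a metric is commonly computed, assuming that a task counts as solved when at least one of its k rollouts succeeds. The function name `pass_at_k`, the task names, and the outcomes are hypothetical illustrations and are not taken from the MolmoWeb evaluation harness, whose exact scoring procedure may differ.

```python
from typing import Dict, List


def pass_at_k(rollouts: Dict[str, List[bool]], k: int) -> float:
    """Fraction of tasks solved by at least one of the first k rollouts.

    Assumption: each task maps to a list of per-rollout success flags.
    This is an illustrative definition, not MolmoWeb's own scoring code.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not rollouts:
        raise ValueError("at least one task is required")

    solved = 0
    for task, outcomes in rollouts.items():
        if len(outcomes) < k:
            raise ValueError(f"task {task!r} has fewer than {k} rollouts")
        # A task is counted once if any of its first k attempts succeeded.
        if any(outcomes[:k]):
            solved += 1
    return solved / len(rollouts)


# Hypothetical outcomes for four tasks with four rollouts each.
results = {
    "task_a": [True, True, False, True],
    "task_b": [False, False, True, False],
    "task_c": [False, False, False, False],
    "task_d": [True, False, True, True],
}

print(f"pass@1 = {pass_at_k(results, 1):.2f}")
print(f"pass@4 = {pass_at_k(results, 4):.2f}")
```

In this toy data, task_b fails on its first attempt but succeeds on a later one, which is the kind of case that lets a multi-rollout score exceed the single-attempt score.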
