MolmoAct 2 speeds lab robotics

- Ai2 released MolmoAct 2 on May 5, pairing an open robot-control model with code, weights, and datasets aimed at real-world manipulation.
- The headline numbers are up to 37x faster inference and a new 720-hour Bimanual YAM dataset for two-arm tabletop training.
- It matters because open robot models have mostly lagged closed rivals or pricey hardware stacks; this one tries to narrow both gaps.

Robotics models are getting better at flashy demos. The harder part is getting a machine to do boring physical work over and over without falling apart. That is the gap Ai2 is trying to close with MolmoAct 2, released on May 5. The pitch is simple: an open robot-control model that reasons in 3D, runs much faster than the first MolmoAct, and ships with the code, weights, and datasets needed for other labs to actually use it. (allenai.org)

### What is MolmoAct 2, exactly?

MolmoAct 2 is a vision-language-action model: a system that looks at camera input, reads an instruction, and turns that into robot motions. Ai2 built it on top of Molmo2-ER, a version of its Molmo model tuned for spatial and embodied reasoning, then added an action expert for closed-loop manipulation. The important part is that it is meant to control real robot arms rather than just describe what is in view. (allenai.org) (arxiv.org)

### Why is speed such a big deal?

Because robot control is not like chatbot use. A robot cannot stop to think for seconds every time the scene changes. The paper’s big claim is up to 37x faster inference than the original MolmoAct, and that speedup comes from changing how the model handles 3D reasoning. Instead of recomputing everything every step, MolmoAct 2 focuses on the parts of the scene that changed. (arxiv.org) Same basic idea of grounded reasoning, with much less latency. (allenai.org)

### What changed under the hood?

Three things look load-bearing. First, the new Molmo2-ER backbone was trained on a 3.3-million-sample embodied-reasoning corpus. Second, Ai2 added an open action tokenizer called OpenFAST, trained across millions of trajectories on five robot embodiments. Third, the action stack now links the vision-language model to a flow-matching [...] design. (allenai.org) That sounds wonky, but the practical meaning is cleaner motion output without giving up the model’s spatial reasoning. (arxiv.org)

### Why does the bimanual dataset matter?

Because two-arm manipulation is where lab and factory work gets more realistic. Ai2 released MolmoAct2-Bimanual YAM alongside the model, with 720 hours of teleoperated bimanual trajectories, and describes it as the largest open bimanual tabletop dataset so far. That matters for tasks where one arm stabilizes while the other [...] and sample-prep work. (arxiv.org) (allenai.org)

### Is this really for labs?

Partly, yes, but not in the “drop it into any biology lab tomorrow” sense. Ai2 frames the target as real-world manipulation across places like kitchens, offices, labs, and factories. The model repository also makes clear these are foundation checkpoints and benchmark-specific fine-tunes, not universal deployment policies. So the rele[...] teams than a finished lab robot employee. (allenai.org)

### How strong are the results?

Ai2 says MolmoAct 2 beats strong baselines across seven simulation and real-world benchmarks, including outperforming Physical Intelligence’s π0.5 in its study, while the Molmo2-ER backbone beats GPT-5 and Gemini Robotics ER-1.5 on 13 embodied-reasoning benchmarks. Those are serious claims. But they are still benchmark claims, which [...] stack” from “ready for unattended production.” (allenai.org) (arxiv.org)

### Why does open source change the story?

Because open robotics has usually been open in a thin way: maybe weights, maybe a paper, but not enough to reproduce the system. Here Ai2 is releasing model weights, training code, and complete training data, plus benchmark- and robot-specific checkpoints on GitHub and Hugging Face. That lowers the barrier for universities [...] want to adapt a policy instead of starting from scratch. (arxiv.org)

### Bottom line?

MolmoAct 2 is not the moment robots quietly took over lab benches. But it does look like a real shift in the tooling. The interesting part is not just a faster model; it is a faster open model with enough released scaffolding that other people can test, tune, and deploy it on their own hardware. If that holds up, lab automation gets less bottlenecked [...]. (arxiv.org) (allenai.org)
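
For readers who want a feel for the speed argument made earlier, here is a toy sketch of the general idea of not recomputing everything at every step. It is not Ai2's implementation, and nothing in it comes from the MolmoAct 2 code: the `CachedSceneEncoder` class, the `changed_patches` helper, the patch representation, and the threshold value are all hypothetical names and choices made up for illustration. The sketch keeps a cached feature per image patch and re-runs the (pretend) expensive encoder only on patches that differ noticeably from the last version it encoded.

```python
from typing import Callable, List, Optional, Tuple

Patch = Tuple[int, ...]   # pixel intensities of one non-empty image patch, flattened
Frame = List[Patch]       # a camera frame split into patches


def changed_patches(reference: Frame, current: Frame, threshold: float = 10.0) -> List[int]:
    """Indices of patches whose mean absolute pixel change exceeds `threshold`."""
    changed = []
    for i, (old, new) in enumerate(zip(reference, current)):
        mean_diff = sum(abs(a - b) for a, b in zip(old, new)) / len(new)
        if mean_diff > threshold:
            changed.append(i)
    return changed


class CachedSceneEncoder:
    """Hypothetical wrapper: re-encode only the patches that changed."""

    def __init__(self, encode_patch: Callable[[Patch], float], threshold: float = 10.0):
        self.encode_patch = encode_patch          # stand-in for an expensive model call
        self.threshold = threshold
        self.reference: Optional[Frame] = None    # patches as they looked when last encoded
        self.features: List[float] = []           # cached per-patch features
        self.calls = 0                            # count of expensive encoder calls

    def update(self, frame: Frame) -> List[float]:
        if self.reference is None or len(self.reference) != len(frame):
            # First frame (or layout change): everything must be encoded.
            self.reference = list(frame)
            self.features = [0.0] * len(frame)
            stale = list(range(len(frame)))
        else:
            stale = changed_patches(self.reference, frame, self.threshold)
        for i in stale:
            self.features[i] = self.encode_patch(frame[i])
            self.reference[i] = frame[i]          # remember what was actually encoded
            self.calls += 1
        return self.features


# Toy usage: only the third patch changes between the two frames.
encoder = CachedSceneEncoder(lambda patch: sum(patch) / len(patch))
frame_a = [(0, 0, 0, 0), (50, 50, 50, 50), (200, 200, 200, 200)]
frame_b = [(0, 0, 0, 0), (50, 50, 50, 50), (90, 90, 90, 90)]
encoder.update(frame_a)
print(encoder.calls)  # 3: every patch encoded once
encoder.update(frame_b)
print(encoder.calls)  # 4: only the changed patch was re-encoded
```

The second frame costs one encoder call instead of three. In a real system the saving would depend on how much of the scene moves between steps and on what the model actually caches, which this sketch does not attempt to model.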
