Models guess instead of asking

A study named ProactiveBench found that 22 multimodal models rarely asked users for missing visual information and instead tended to guess or hallucinate. The researchers report that a simple reinforcement-learning approach can improve the models' tendency to request clarification. (the-decoder.com)

ProactiveBench found 22 multimodal models usually guessed answers when images lacked needed detail instead of asking users for clarification. (arxiv.org) Multimodal models are systems that take text and images together and answer image-based questions; ProactiveBench tests whether they will request extra visual input when a scene is ambiguous. (cloud.google.com) The benchmark repurposes seven existing datasets into unsolvable-at-first scenarios—occluded objects, noisy images, coarse sketches, temporal ambiguities, and requests for a new camera view. (arxiv.org) ProactiveBench’s public materials say the full suite contains more than 108,000 images arranged into about 18,000 test samples, and the authors removed tasks a model could already solve on first try to make the test stricter. (developmentstoday.com) When the authors ran 22 models, aggregate performance fell sharply on the proactive tasks; the paper reports that models rarely produced proactive suggestions and instead guessed, refused, or hallucinated. (arxiv.org) The team also found proactiveness did not track with model size, and that conversation histories and in‑context learning sometimes introduced negative biases that reduced asking behavior. (arxiv.org) As a proof of concept the researchers used a simple reinforcement‑learning fine‑tuning step to reward asking for needed information, and the tuned models showed higher rates of proactive requests and some generalization to unseen scenarios. (arxiv.org) OpenReview discussion and the project’s release notes show the authors framed ProactiveBench as a first step and included evaluation scripts and examples to help other teams test proactiveness. (openreview.net) ProactiveBench’s results matter for image‑centric assistants—tools used in medical triage, robotics, or remote inspection could be safer if models asked for clearer views instead of inventing details. (developmentstoday.com) The authors say they will publicly release ProactiveBench and accompanying code so developers can measure and train for proactiveness; the reinforcement‑learning approach offers a concrete path to reduce guessing. (arxiv.org)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.