New CV Benchmark for Multimodal Brain Tumor Diagnosis
A new paper on arXiv, "MM-NeuroOnco," introduces a multimodal benchmark for diagnosing brain tumors from MRI scans. The dataset aims to advance research in medical computer vision by providing a standardized way to evaluate models that integrate different types of imaging data. This reflects a growing trend in applying complex vision models to specialized medical domains.
The MM-NeuroOnco dataset comprises 24,726 MRI slices drawn from 20 different data sources. It is paired with nearly 200,000 multimodal instructions designed to train models for complex diagnostic reasoning, not just lesion detection. This moves beyond traditional datasets, which often lack rich semantic annotations for clinical interpretation.
A key challenge in creating such a dataset is the high cost and scarcity of expert annotations. The creators addressed this by developing a multi-model collaborative pipeline that automates the generation of diagnostic information and performs quality control, enabling annotation at scale.
To evaluate models, the researchers also built MM-NeuroOnco-Bench, a manually annotated test set. The benchmark incorporates a "rejection-aware" setting designed to mitigate the biases that can arise from simple multiple-choice or yes/no question formats.
The performance of existing models shows how difficult the benchmark is. Gemini 3 Flash, a strong baseline, achieved only 41.88% accuracy on the diagnosis-related questions, underscoring how hard genuine clinical reasoning remains for AI. The paper also introduces NeuroOnco-GPT, a model fine-tuned on the MM-NeuroOnco dataset. It saw a 27-percentage-point absolute improvement in accuracy on diagnostic questions, which the authors present as evidence of the dataset's effectiveness in advancing multimodal AI for medicine.
This benchmark is part of a broader shift in oncology towards multimodal AI, which integrates diverse data types such as imaging, clinical records, and genomics. In MRI specifically, sequences such as T1-weighted, T2-weighted, and FLAIR provide complementary information on tumor structure, edema, and other characteristics; fusing them gives a more complete picture.
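To make the "rejection-aware" idea mentioned above more concrete, here is a minimal sketch of how such scoring could work. The paper's actual protocol is not described here, so everything in it is an assumption: the `REJECT` label, the `Item` record, and the `rejection_aware_scores` function are hypothetical names, and the sketch assumes that some questions have no correct option among the listed choices, so the right response is to decline. Reporting accuracy separately for answerable and reject-type questions exposes a model that always picks one of the listed options.

```python
from dataclasses import dataclass

# Hypothetical label for "none of the listed options is correct".
REJECT = "none_of_the_above"


@dataclass
class Item:
    question_id: str
    gold: str        # correct label, or REJECT if no listed option is right
    predicted: str   # model output, already mapped to a label or REJECT


def _accuracy(items):
    """Fraction of items answered correctly, or None for an empty subset."""
    if not items:
        return None
    return sum(it.predicted == it.gold for it in items) / len(items)


def rejection_aware_scores(items):
    """Overall accuracy plus accuracy split by answerable vs. reject items."""
    answerable = [it for it in items if it.gold != REJECT]
    reject = [it for it in items if it.gold == REJECT]
    return {
        "overall": _accuracy(items),
        "answerable": _accuracy(answerable),
        "reject": _accuracy(reject),
    }


# Toy data, invented purely for illustration (not from the paper).
demo_items = [
    Item("q1", gold="glioma", predicted="glioma"),
    Item("q2", gold="meningioma", predicted="glioma"),
    Item("q3", gold=REJECT, predicted="glioma"),   # model forced a choice
    Item("q4", gold=REJECT, predicted=REJECT),     # model correctly declined
]

if __name__ == "__main__":
    print(rejection_aware_scores(demo_items))
```

In this toy example, a model that never declines would score zero on the reject subset no matter how well it does on answerable questions, which is the kind of format bias a rejection-aware setting is meant to surface.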