Databricks' $100K reasoning cup
Databricks launched the Grounded Reasoning Cup, offering a $100K prize to PhD and Master’s students who tackle enterprise reasoning challenges with frontier models, and explicitly connecting contestants with researchers and recruiters. The competition is positioned as a fast path for academic talent to get noticed by industry labs. (x.com) (x.com)
The OfficeQA Pro test corpus spans nearly a century of U.S. Treasury Bulletins, about 89,000 pages containing more than 26 million numerical values. OfficeQA Pro has 133 questions and OfficeQA Full has 246. (arxiv.org) Databricks’ published evaluations show frontier models scoring under 5% on OfficeQA Pro when relying on parametric knowledge alone, under 12% with web access, and about 34.1% on average when agents get direct access to the document corpus. (arxiv.org) The Databricks team reports that providing a structured parser (ai_parse_document) produced an average relative performance gain of roughly 16.1% across the evaluated agents, a measurable sign of how much preprocessing and retrieval tooling matter on grounded-reasoning tasks. (arxiv.org)

Databricks announced the Grounded Reasoning Cup as a timed competition running in Spring 2026, in which submitted AI agents will be evaluated on OfficeQA and compete alongside human teams in head-to-head benchmark rounds. (databricks.com) Databricks has run large community competitions before (the Generative AI World Cup drew about 1,500 data scientists and engineers from 18 countries), and it maintains an active university recruiting program with hundreds of open technical roles across research and engineering. (databricks.com)

The OfficeQA GitHub release includes 696 Treasury Bulletin PDFs (~20GB) plus parsed and transformed versions, with code under Apache 2.0 and dataset artifacts under CC-BY-SA 4.0. This suggests participants will need to handle large-scale document parsing, retrieval, and evaluation pipelines. (github.com) The OfficeQA Pro paper lists Databricks AI Research contributors, including Matei Zaharia and other senior researchers, which signals direct involvement from the company’s research leadership in designing and evaluating the benchmark. (arxiv.org)
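The parser result above is reported as an average *relative* gain, which is a ratio rather than a difference in percentage points. The minimal Python sketch below assumes the figure is the mean of per-agent relative gains. The paper may aggregate differently, so a second helper shows the alternative "gain of the means" for comparison. The agent names and scores are hypothetical placeholders, not numbers from the OfficeQA Pro paper.

```python
from statistics import mean


def relative_gain(baseline: float, treated: float) -> float:
    """Relative improvement of `treated` over `baseline`, as a fraction (0.2 == 20%)."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (treated - baseline) / baseline


def average_relative_gain(pairs: dict[str, tuple[float, float]]) -> float:
    """Assumed definition: mean of per-agent relative gains.

    `pairs` maps a (hypothetical) agent name to (score_without_parser, score_with_parser).
    """
    return mean(relative_gain(b, t) for b, t in pairs.values())


def gain_of_means(pairs: dict[str, tuple[float, float]]) -> float:
    """Alternative definition: relative gain between the mean scores of all agents."""
    baselines = [b for b, _ in pairs.values()]
    treated = [t for _, t in pairs.values()]
    return relative_gain(mean(baselines), mean(treated))


# Hypothetical accuracies (fractions); NOT results reported for OfficeQA Pro.
scores = {
    "agent_a": (0.30, 0.36),  # +6 points absolute, +20% relative
    "agent_b": (0.40, 0.44),  # +4 points absolute, +10% relative
}

if __name__ == "__main__":
    print(f"mean of per-agent gains: {average_relative_gain(scores):.1%}")
    print(f"gain of mean scores:     {gain_of_means(scores):.1%}")
```

With these placeholder scores, the mean of per-agent gains comes to 15.0%, while the gain of the mean scores (0.35 to 0.40) is about 14.3%. The choice of aggregation therefore shifts the headline number, and anyone comparing against the reported ~16.1% should check which definition the paper uses.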