Microsoft debuts 'Critique'
Microsoft CEO Satya Nadella announced 'Critique', a multi-model deep research system integrated into Microsoft 365 Copilot that combines multiple models to improve answers and reports. The move formalizes multi-model orchestration inside a major productivity product. (x.com)
Microsoft also introduced a feature called Council that surfaces side-by-side responses from different models so users can compare alternative syntheses and citations directly in the Researcher experience. (publicnow.com)
Critique’s architecture explicitly separates generation from evaluation: one model handles planning, retrieval, and initial drafting, while a second model acts as an expert reviewer that validates claims and refines presentation. Microsoft said the system can combine models from frontier labs including Anthropic and OpenAI. (publicnow.com)
In Microsoft’s internal DRACO evaluations, Researcher with Critique produced a +7.0 point improvement (SEM ±1.90) on the aggregated score, a +13.88% gain over the Perplexity Deep Research baseline (Claude Opus 4.6) reported in the benchmark. (microsoft.com) The DRACO benchmark (DRACO: a Cross-Domain Benchmark for Deep Research Accuracy, Completeness, and Objectivity) is documented on arXiv as arXiv:2602.11685 and contains 100 complex research tasks sampled from real Perplexity Deep Research queries across 10 domains and sources in 40 countries. (arxiv.org)
Microsoft is exposing Critique and the upgraded Researcher workflows through its Frontier early-access program for Microsoft 365 Copilot customers, so that testers can give feedback before general availability. (adoption.microsoft.com) Critique will be the default Researcher experience when users select “Auto” in the model picker, and every Researcher report now includes a cover letter that pinpoints agreement, divergence, and model-specific insights across the candidate responses. (publicnow.com)
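The split between a drafting model and a reviewing model that Critique is described as using can be illustrated with a minimal Python sketch. This is not Microsoft's implementation: the `Model` type, the `research` loop, the `APPROVE` convention, the stub models, and the choice to feed reviewer notes back to the generator are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical: a "model" is any function that maps a prompt to text.
Model = Callable[[str], str]


@dataclass
class Review:
    approved: bool
    notes: List[str] = field(default_factory=list)


def draft_report(generator: Model, question: str, feedback: List[str]) -> str:
    """Generation role: produce a draft, folding in any reviewer notes."""
    prompt = question
    if feedback:
        prompt += "\nAddress these reviewer notes:\n"
        prompt += "\n".join(f"- {note}" for note in feedback)
    return generator(prompt)


def review_report(reviewer: Model, question: str, draft: str) -> Review:
    """Evaluation role: a second model approves the draft or lists issues."""
    verdict = reviewer(
        f"Question: {question}\nDraft: {draft}\n"
        "Reply APPROVE, or list one issue per line."
    )
    lines = [ln.strip() for ln in verdict.splitlines() if ln.strip()]
    if lines == ["APPROVE"]:
        return Review(approved=True)
    return Review(approved=False, notes=lines)


def research(question: str, generator: Model, reviewer: Model,
             max_rounds: int = 2) -> str:
    """Alternate drafting and review until approval or the round limit."""
    feedback: List[str] = []
    draft = ""
    for _ in range(max_rounds):
        draft = draft_report(generator, question, feedback)
        review = review_report(reviewer, question, draft)
        if review.approved:
            break
        feedback = review.notes
    return draft


# Stub models standing in for real LLM calls (assumptions for the demo).
def stub_generator(prompt: str) -> str:
    # Adds a citation only once the reviewer has asked for one.
    if "citation" in prompt:
        return "Claim X holds [source 1]."
    return "Claim X holds."


def stub_reviewer(prompt: str) -> str:
    draft = prompt.split("Draft: ", 1)[1]
    return "APPROVE" if "[source" in draft else "Missing citation for Claim X."


if __name__ == "__main__":
    print(research("Does claim X hold?", stub_generator, stub_reviewer))
```

Keeping the two roles behind the same `Model` signature means the generator and reviewer can come from different providers, which mirrors the multi-model framing in the announcement. How Critique actually passes reviewer output back into the report is not stated in the source.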