Deep Research Max scores 93.3% on benchmarks, up from 66.1%

- Google launched Deep Research and Deep Research Max on April 21, adding new Gemini API agents built for autonomous, long-running research across web and private data. - The headline number is 93.3% on Google’s DeepSearchQA benchmark, up from 66.1% in December, alongside charts, MCP connectors, and collaborative planning. - This pushes Google from “AI summary tool” toward enterprise research agent — faster for some jobs, but much more expensive and compute-heavy.

Google is trying to turn Gemini from a chatbot that helps with research into a research worker that can go do the job. That’s the real news here. On April 21, Google launched two new agents — Deep Research and Deep Research Max — and the Max version is the one meant to sit in the background, search for a long time, pull from outside systems, and come back with a cited report. The flashy stat is a benchmark jump from 66.1% to 93.3%, but the bigger shift is that Google is packaging “more thinking time” as a product. ### What is Deep Research Max? It’s Google’s higher-end autonomous research agent in the Gemini API. The standard Deep Research model is tuned for lower latency and lower cost. Max is tuned for comprehensiveness — basically the version you use when you want the system to keep digging, refine its plan, and synthesize a bigger answer instead of replying quickly. Google says both are built on Gemini 3.1 Pro, but Max gets extended test-time compute for harder investigations. ### Why does the 93.3% number matter? Because Google is using it to show that this is not just a small polish pass. The 93.3% score is on DeepSearchQA, Google’s benchmark for web research. The previous Deep Research release in December scored 66.1%. Google also says the newer system is its best on BrowseComp and state of the art on Humanity’s Last Exam, which you can’t wave it away as noise. ### What actually changed in the product? Three things. First, collaborative planning — you can review and steer the research plan before the agent runs. Second, native visualizations — charts and infographics can show up directly in the report. Third, outside data access — the agent can connect to MCP servers and File Search, so it can combine public web material with” it’s “search the web plus our own docs.” ### Why is MCP a big deal? Because MCP is the plumbing that lets an AI agent reach into other systems without every integration being custom-built from scratch. In plain English, it means Deep Research Max can pull from more than the open internet. It can work across internal documents, databases, and third-party sources if those systems expose an MCP server. That turns the agent from a clever browser into something closer to an analyst with access credentials. ### Is this a consumer feature or an enterprise tool? Mostly an enterprise and developer tool right now. Google shipped it through the Gemini API, and its own positioning is about finance, life sciences, market research, and other professional workflows. There is also consumer-facing Deep Research access in Google’s subscription stack, but the Max launch is really about developers building agentic workflows and companies paying for higher-end research automation. ### What’s the catch? The catch is cost and time. Google’s own split between the two models tells the story: use standard Deep Research when you want speed and efficiency; use Max when you want the highest-quality synthesis and are willing to spend more compute to get it. That makes Max powerful, but it also means it’s not the default choice for every prompt. Think of it less like “better Gemini” and more like hiring the senior analyst for the messy assignment. ### So what changed in the market? Google is joining the broader race to sell not just models, but agentic labor. The important move isn’t merely that Gemini got smarter. It’s that Google now has a clearer product ladder: quick answers, deeper research, and then Max for long-horizon, accuracy-critical work. If the system holds up outside Google’s own benchmarks, that’s the kind of product shift that pushes AI spending away from chat and toward workflow automation. ### Bottom line? Deep Research Max looks like Google’s strongest argument yet that “AI research agent” can be a real software category, not just a demo. The benchmark jump grabs attention. But the more durable story is the bundle — longer-running reasoning, private-data access, planning controls, and built-in visuals. That’s what makes this feel like a work product, not just a better answer box.

Deep Research Max scores 93.3% on benchmarks, up from 66.1%

Get your own daily briefing