Google's Gemini 3 Passes Advanced AI Benchmarks

Google's Gemini 3 Deep Think AI model has reportedly passed a series of difficult benchmarks, including one described as "Humanity's Last Exam." While details of the exam are sparse, the achievement is being presented as a demonstration of the rapid progress of generalist AI models.

- "Humanity's Last Exam" is a benchmark created by the Center for AI Safety and Scale AI, consisting of graduate-level questions designed to be too difficult for current AI, after previous benchmarks became saturated with models scoring over 90%.
- Gemini 3's Deep Think mode set a new record on this exam with a score of 48.4% without external tools, significantly narrowing the gap with human experts who score around 90%.
- The model also achieved a score of 84.6% on the ARC-AGI-2 benchmark, which tests the ability to learn new skills from novel visual puzzles; for context, humans average about 60% on this test.
- In competitive programming, Gemini 3 attained an Elo of 3455 on the Codeforces benchmark, placing it in the "Legendary Grandmaster" tier, a level reached by a very small fraction of human programmers.
- Beyond benchmarks, the upgraded "Deep Think" mode is a specialized reasoning engine designed to solve problems with incomplete data by using a more deliberate, "System 2" thinking process.
- This advanced reasoning has been applied to scientific discovery, achieving gold-medal level results on benchmarks comparable to the 2025 International Olympiads in Math, Physics, and Chemistry.
- For developers, this level of AI is positioned to eliminate repetitive scaffolding and "grunt coding," shifting the core job function from writing code to higher-level synthesis, problem selection, and system orchestration.
- The new Deep Think mode is now accessible to Google AI Ultra subscribers within the Gemini app and is being offered to researchers and enterprise users through the Gemini API via an early access program.

