Google claims Gemini 3.5 Flash speed

- Google said on May 19 that Gemini 3.5 Flash is its strongest coding and agent model yet, launched at I/O 2026. - Google said Gemini 3.5 Flash runs four times faster than other frontier models and can complete some tasks at less than half cost. - Gemini 3.5 Flash is available now in Gemini apps and APIs, with broader internal rollout continuing next month.

Google said at its I/O conference on May 19 that Gemini 3.5 Flash is designed to close the gap between fast, lower-cost AI models and more expensive flagship systems used for coding and agent tasks. The company said the new model is its strongest agentic and coding model yet and is available through the Gemini app, Google Search’s AI Mode, Google AI Studio, Android Studio and the Gemini API. Google also said it is already using the model internally and plans a broader rollout next month. The company’s pitch rested on two claims. Google said Gemini 3.5 Flash outperforms Gemini 3.1 Pro on several coding and agent benchmarks, while running four times faster than other frontier models when measured by output tokens per second. It also said some long-horizon tasks that once took days or weeks can now be completed in a fraction of the time, often at less than half the cost of other frontier models. (blog.google) ### What exactly did Google claim at I/O? Google DeepMind said Gemini 3.5 Flash “delivers intelligence that rivals large flagship models” while keeping the lower-latency profile associated with its Flash line. In its launch post, Google cited benchmark results including 76.2% on Terminal-Bench 2.1, 83.6% on MCP Atlas, 1656 Elo on GDPval-AA and 84.2% on CharXiv Reasoning. (blog.google) Koray Kavukcuoglu, Google DeepMind’s CTO, and Jeff Dean, Google DeepMind’s chief scientist, said the model is built for “complex, agentic workflows” and positioned it as the first release in the Gemini 3.5 family. Google said Gemini 3.5 Pro is also in development. ### Where does the speed-and-cost claim come from? (blog.google) Google’s public comparison is partly about model behavior and partly about pricing. The company’s developer pricing page lists Gemini 3.5 Flash at $1.50 per 1 million input tokens and $9.00 per 1 million output tokens on the paid tier, with a free tier also available for smaller projects. Google also offers a Batch API option that it says cuts some costs by 50%. (blog.google) Google did not name specific rival models in the launch post line that says Gemini 3.5 Flash is four times faster than “other frontier models,” so the comparison depends on Google’s own benchmark framing. The company tied that speed claim to output tokens per second and said the model lands in the top-right quadrant of the Artificial Analysis index, which it described as combining frontier-level intelligence with high speed. (ai.google.dev) ### Why does this matter for developers using coding agents? Google’s developer post paired Gemini 3.5 Flash with Antigravity 2.0, a desktop app, CLI and SDK for orchestrating multiple agents and running parallel tasks. Google said the updated harness is co-optimized for Gemini models and supports persistent environments, scheduled tasks and dynamic subagents for multi-step workflows. (blog.google) That means Google is not presenting Gemini 3.5 Flash as only a chatbot upgrade. The company is pitching it as an engine for software work that spans planning, code generation, iteration and tool use across longer sessions. That is an inference from Google’s product packaging and launch language, not an independent benchmark result. (blog.google) ### If generation gets cheaper, where does the bottleneck move? A faster coding model does not remove the need for review. Google’s own materials emphasize supervised execution, multi-step workflows and persistent environments, which implies more generated work moving through normal engineering processes. (blog.google) In practice, that shifts pressure onto reproducibility, code review and ownership. If teams use AI systems to write migrations, tests or application logic, the harder question becomes who verified the output, what tool path produced it and how another team can reproduce or roll back the result. That is an inference based on the workflow Google described, rather than a claim Google explicitly made. (blog.google) ### Where can developers use it now? Google said Gemini 3.5 Flash is available now through the Gemini app, AI Mode in Google Search, Google AI Studio, Android Studio, Google Antigravity, the Gemini API and enterprise channels including Gemini Enterprise Agent Platform. The model card says it supports text, image, audio and video inputs, with a context window of up to 1 million tokens and outputs of up to 64,000 tokens. (blog.google) Google’s next step is a broader internal rollout next month, according to the company’s May 19 launch post, while work on Gemini 3.5 Pro continues. (blog.google)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.