Google's Gemini Flash winning customers
- Google’s Gemini Flash models overtook Anthropic in token demand on Vercel’s AI Gateway in early April 2026, according to Vercel CEO Guillermo Rauch. - Vercel’s data showed Anthropic led in March, but Gemini 3 Flash moved ahead in early April and stayed there, Business Insider reported. - Google I/O begins next week, where Google is expected to unveil additional AI models, tools and features.
Google’s Gemini Flash models have moved ahead of Anthropic on Vercel’s AI Gateway, according to Vercel CEO Guillermo Rauch, giving Google a visible gain in one slice of the market for application developers building AI products. Business Insider reported on May 15 that Rauch said Gemini 3 Flash overtook Anthropic in token traffic on Vercel’s gateway in early April after Anthropic led in March. Vercel’s AI Gateway is a single endpoint that lets customers route requests across multiple model providers, monitor usage and set budgets, according to Vercel’s documentation. The shift matters because Vercel sits close to the developers choosing which models power chatbots, coding tools, search features and copilots. Rauch told Business Insider he saw enough demand that he called a senior Google executive to ask for more Gemini tokens, the report said. Google did not need to win the broadest benchmark argument for that to happen; it needed customers to pick a model and ship it. (businessinsider.com) ### Why does Vercel’s gateway matter in this race? Vercel says AI Gateway gives developers access to hundreds of models through one API key, with built-in spending controls, monitoring, load balancing and fallbacks. That setup makes the service a useful vantage point on which providers developers are actually routing traffic to once products are in production. Vercel also says token pricing on the gateway carries no markup versus the underlying provider. (businessinsider.com) The gateway is tied closely to Vercel’s AI tooling. Vercel’s AI SDK documentation and GitHub repository say developers can access providers including OpenAI, Anthropic, Google, Meta and xAI through a shared interface, lowering the switching cost when teams want to compare or replace models. ### What is Gemini Flash selling to developers? (vercel.com) Google describes Gemini 2.5 Flash as its best model on price-performance and says it is aimed at large-scale processing, low-latency, high-volume tasks and agentic use cases. Google’s developer documentation lists a 1,048,576-token input limit and support for text, images, video and audio inputs, along with features such as function calling, structured outputs, caching and search grounding. (ai-sdk.dev) Google’s pricing pages show why that pitch may resonate with product teams watching inference costs. Gemini 3.1 Flash-Lite is priced at $0.25 per million input tokens and $1.50 per million output tokens, while Google’s Gemini 3.1 Pro Preview is priced at $2 for input and $12 for output up to 200,000 tokens. Google also advertises batch processing discounts and context caching on paid tiers. (ai.google.dev) ### How does that compare with Anthropic’s current offer? Anthropic said on February 17 that Claude Sonnet 4.6 kept the same API pricing as Sonnet 4.5, starting at $3 per million input tokens and $15 per million output tokens. Anthropic positioned Sonnet 4.6 as a stronger coding and computer-use model, with a 1 million-token context window in beta. (ai.google.dev) That means Google and Anthropic are not pitching exactly the same thing. Anthropic has emphasized capability gains in Sonnet, while Google has continued to market Flash around throughput, latency and cost efficiency. Vercel’s traffic data, as described by Rauch, suggests a meaningful share of developers on its platform are opting for the cheaper, faster class of model for production workloads. (anthropic.com) ### Does this show revenue moving from Anthropic to Google? Business Insider’s report described the Vercel chart in terms of token traffic, not revenue, and that distinction matters. A lower-priced model can win more tokens while generating less revenue than a pricier rival. BizTech Weekly, citing Rauch’s presentation, reported that Anthropic still led in revenue share even after Google moved ahead in raw token usage, though Reuters could not independently verify that figure from Vercel materials reviewed here. (anthropic.com) Vercel’s own documentation supports the idea that developers can route workloads by cost and reliability because the gateway includes budgets, monitoring and fallbacks. In practice, that means one provider can win high-volume traffic without locking in every workload. ### What should readers watch next? Google I/O is scheduled to begin next week, and Business Insider said Google is expected to introduce more AI models, tools and features at the conference. (biztechweekly.com) Any new Flash-tier releases or pricing changes would give developers another reason to revisit model choices already visible in Vercel’s gateway data. (businessinsider.com) (vercel.com)