Google Launches Faster, Cheaper Gemini Model

Google has launched a preview of Gemini 3.1 Flash-Lite, its 'fastest and most cost-efficient' AI model yet. Engineered for high-volume developer workloads and real-time apps, it boasts a 2.5x speed boost over previous models and a new pricing tier, making it a key tool for building low-latency coding assistants and agentic workflows.

Google's pricing for Gemini 3.1 Flash-Lite is set at $0.25 per 1 million input tokens and $1.50 per 1 million output tokens. This not only undercuts the previous Gemini 2.5 Flash but also positions it aggressively against competitors like Claude 4.5 Haiku, which can be more than three times as expensive for output tokens.

The model's architecture is derived from the more powerful Gemini 3 Pro, rather than being a simplified version of Gemini Flash. This foundational choice allows it to maintain strong performance on reasoning and multimodal tasks, surpassing older, larger Gemini models on certain benchmarks. It was trained using Google's custom Tensor Processing Units (TPUs).

A key innovation for developers is the introduction of adjustable "thinking levels". This feature allows for programmatic control over the model's reasoning depth, enabling users to balance performance and cost for specific tasks, from minimal for simple classifications to high for complex problem-solving.

On the Chatbot Arena leaderboard, 3.1 Flash-Lite achieved an Elo score of 1432, placing it competitively with other models in its tier. It demonstrates strong performance on academic benchmarks, scoring 86.9% on GPQA Diamond and 76.8% on MMMU Pro for multimodal understanding.

In terms of raw speed, the model outputs around 363 tokens per second, which is significantly faster than rivals like GPT-5 mini (71 tokens/s) and Claude 4.5 Haiku (108 tokens/s). This low latency is critical for its intended use cases, such as routing user queries, real-time translation, and content moderation pipelines. The model supports a 1 million token context window and can process text, image, audio, and video inputs.

Early adopters like the fashion tech company Whering have reported achieving 100% consistency in item tagging for complex categories by integrating Flash-Lite into their classification pipeline.

Currently in public preview, Gemini 3.1 Flash-Lite is accessible to developers through the Gemini API in Google AI Studio and to enterprise clients via Vertex AI.
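
For a sense of scale, the pricing and throughput figures quoted above can be turned into a back-of-the-envelope estimate. The Python sketch below is illustrative only: the prices and tokens-per-second values are the ones reported in this article, while the workload (requests per day, tokens per request) and the function names are assumptions made up for the example. It does not call any Google API.

```python
# Rough cost and generation-time estimate from the figures in the article.
# Prices and throughput come from the article; the workload is hypothetical.

INPUT_USD_PER_MILLION = 0.25   # Gemini 3.1 Flash-Lite input price (from article)
OUTPUT_USD_PER_MILLION = 1.50  # Gemini 3.1 Flash-Lite output price (from article)

# Output speed in tokens per second, as quoted in the article.
TOKENS_PER_SECOND = {
    "Gemini 3.1 Flash-Lite": 363,
    "Claude 4.5 Haiku": 108,
    "GPT-5 mini": 71,
}


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the token cost in USD at the quoted Flash-Lite prices."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Return the time to stream output_tokens at a steady output rate.

    Ignores network latency and time to first token.
    """
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical workload: a query router handling 100,000 requests a day.
    requests_per_day = 100_000
    input_tokens_per_request = 800
    output_tokens_per_request = 200

    daily_cost = estimate_cost_usd(
        requests_per_day * input_tokens_per_request,
        requests_per_day * output_tokens_per_request,
    )
    print(f"Estimated daily cost: ${daily_cost:.2f}")

    for model, rate in TOKENS_PER_SECOND.items():
        seconds = generation_seconds(output_tokens_per_request, rate)
        print(f"{model}: {seconds:.2f} s per {output_tokens_per_request}-token reply")
```

With this made-up workload, the daily bill works out to $50 ($20 for input and $30 for output). A 200-token reply takes roughly 0.55 seconds to generate at 363 tokens per second, compared with about 1.85 seconds at 108 tokens/s and 2.82 seconds at 71 tokens/s. Real-world latency would also include network overhead and time to first token, which this estimate leaves out.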
