Google’s Gemini 3.1 Flash Live
Google rolled out Gemini 3.1 Flash Live, a real-time multimodal audio model with lower latency, better tone understanding, improved function calling, and roughly 2× conversation memory. The model also powers the multilingual Search Live expansion to more than 200 countries and territories. The updates double down on agentic, always-on multimodal experiences. (x.com) (x.com)
Google published the Gemini 3.1 Flash Live announcement on March 26, 2026. The model is available in preview to developers via the Gemini Live API in Google AI Studio, offered to enterprises through Gemini Enterprise for Customer Experience, and surfaced to consumers through Gemini Live and the expanded Search Live experience. (blog.google)
Google reported scores of 90.8% on ComplexFuncBench Audio and 36.1% on Scale AI’s Audio MultiChallenge with “thinking” enabled, presenting them as evidence of the model’s multi-step function calling and its robustness to real-world audio. (blog.google)
Search Live, the voice-and-camera conversational mode in the Google app, was rolled out to more than 200 countries and territories and is available in all languages and locations where Google’s AI Mode is offered. The Live button is accessible from the Google app, and Google Lens has a new Live tab. (blog.google)
The Gemini API developer documentation lists concrete limits for the Flash Live preview: an input token limit of 131,072 and an output token limit of 65,536. It also specifies that function calling is supported synchronously, while async function calling is not yet supported. (ai.google.dev)
Google’s blog states that all audio generated by Gemini 3.1 Flash Live is watermarked to help identify AI outputs, and that DeepMind’s SynthID tooling can be used to detect those watermarks across media types. (blog.google)
The Live API documentation also notes two behavior changes. Gemini 3.1 replaces the prior thinkingBudget setting with a thinkingLevel configuration, which defaults to minimal for lower latency. In addition, a single BidiGenerateContentServerContent event can now carry multiple simultaneous content parts, such as audio chunks and transcripts. (ai.google.dev)
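For client code, the multi-part change means a handler should no longer assume one content part per server event. The Python sketch below illustrates that idea with stand-in types. `Part`, `ServerContent`, `split_event`, and `fits_input_limit` are hypothetical names invented for this example and are not the Gemini SDK's classes or field names; only the two token limits come from the documentation cited above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Limits quoted from the Flash Live preview documentation.
INPUT_TOKEN_LIMIT = 131_072
OUTPUT_TOKEN_LIMIT = 65_536


@dataclass
class Part:
    """Hypothetical stand-in for one content part: audio bytes or transcript text."""
    audio: Optional[bytes] = None
    text: Optional[str] = None


@dataclass
class ServerContent:
    """Hypothetical stand-in for one server event that may hold several parts."""
    parts: List[Part] = field(default_factory=list)


def split_event(event: ServerContent) -> Tuple[bytes, str]:
    """Collect every audio chunk and every transcript fragment in one event.

    Iterating over all parts, instead of reading only the first one,
    keeps the handler correct when audio and transcripts arrive together.
    """
    audio = b"".join(p.audio for p in event.parts if p.audio is not None)
    transcript = "".join(p.text for p in event.parts if p.text is not None)
    return audio, transcript


def fits_input_limit(token_count: int) -> bool:
    """Return True if a token count is within the documented input limit."""
    return 0 <= token_count <= INPUT_TOKEN_LIMIT
```

A handler built this way can route the combined audio to playback and the combined transcript to the UI in a single pass. The real event schema should be taken from the Live API reference rather than from this sketch.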