Google launches Gemini 3.1 Flash Live

Published by The Daily Scout

What happened

Google shipped Gemini 3.1 Flash Live, a real-time speech-to-speech model tuned for live, multi-step agent workflows that scores highly on complex audio benchmarks, signalling a push toward voice-first agent orchestration. Platform teams will need to re-test session observability and integration points because the model changes output structure and error behavior compared with prior Gemini releases. (manilashaker.com) (blog.laozhang.ai)

Why it matters

Migration requires swapping the model string from gemini-2.5-flash-native-audio-preview-12-2025 to gemini-3.1-flash-live-preview and replacing the old thinkingBudget knob with thinkingLevel (options: minimal, low, medium, high), with the default set to minimal to prioritize latency. (ai.google.dev)

A single BidiGenerateContentServerContent event can now contain multiple content parts simultaneously (for example, interleaved audio chunks plus transcript parts), so integrations must parse every part in each event to avoid dropped media or transcripts. (ai.google.dev)

The Live API uses stateful WebSocket sessions; Google’s docs recommend server-side session context persistence, reconnection with exponential backoff, and sensible timeouts because the session remembers conversational context across the stream. (docs.cloud.google.com)

Tool/function calls and multimodal outputs travel over the same low-latency bidi stream, meaning platform dispatchers should handle asynchronous function-call requests and simultaneous audio+visual parts rather than sequential, turn-based tooling. (docs.cloud.google.com)

Google published the gemini-3.1-flash-live-preview entry and changelog on March 26, 2026, and announced that the model is available to developers via the Gemini Live API in Google AI Studio while powering Gemini Live and Search Live worldwide. (ai.google.dev)

Google is applying SynthID provenance watermarking to generated media (including audio) for detection of AI-created content, so enterprise audio pipelines and compliance filters must treat generated audio as watermarked content and surface verification metadata where required. (support.google.com)
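
To make the "parse every part" point concrete, here is a minimal Python sketch of a handler that walks all parts of one server event instead of reading only the first. It is not taken from Google's documentation: the function name collect_parts is hypothetical, and the event shape (a serverContent object holding modelTurn.parts, where each part carries either base64 inlineData or a text field standing in for a transcript part) is an assumption about the wire format that should be checked against the current Live API reference.

```python
import base64
from typing import Any


def collect_parts(event: dict[str, Any]) -> tuple[bytes, list[str]]:
    """Gather all audio bytes and all text from one server event.

    Assumed (unverified) shape:
      {"serverContent": {"modelTurn": {"parts": [ {...}, {...} ]}}}
    where a part holds "inlineData" (base64 audio) and/or "text".
    """
    parts = (
        event.get("serverContent", {})
        .get("modelTurn", {})
        .get("parts", [])
    )
    audio = bytearray()
    texts: list[str] = []
    # Iterate over every part; reading only parts[0] would drop
    # any audio or text that arrives later in the same event.
    for part in parts:
        inline = part.get("inlineData")
        if inline and inline.get("mimeType", "").startswith("audio/"):
            audio.extend(base64.b64decode(inline["data"]))
        if "text" in part:
            texts.append(part["text"])
    return bytes(audio), texts
```

A dispatcher built this way would call collect_parts once per incoming event and forward the returned audio and text to their respective sinks. Events without a model turn simply yield empty results, so the same handler can sit in front of other event types without special-casing them.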
