Google Meet gets realtime voice loops
- Google Meet already offers live AI help inside meetings, while Google and OpenAI both now expose realtime voice APIs that can trigger tools during a call.
- Google’s Ask Gemini in Meet can summarize ongoing discussion, list action items, and stay private to each user; Gemini Live supports tool use and 70 languages.
- The shift is from post-call summaries to in-call actions, with stricter latency and admin controls. (support.google.com)

Realtime voice systems keep an audio session open, listen as people speak, and answer before the conversation has fully moved on. Google and OpenAI now both offer that kind of low-latency loop for developers. (ai.google.dev) (developers.openai.com)

Google’s Gemini Live API is a streaming interface for voice and vision. It takes continuous audio, supports interruptions, and can call tools such as function calls and Google Search while the session is still running. (ai.google.dev)

OpenAI’s Realtime API works in a similar shape. Developers connect over WebRTC or WebSocket, stream speech in and out, and can update the session state and trigger function calling during the conversation. (developers.openai.com)

Google Meet already has pieces of this inside the product. Ask Gemini in Google Meet can summarize an active discussion, list decisions and action items, and answer questions during the meeting instead of waiting for a recording to finish. (support.google.com)

Google says those in-meeting responses can draw on live captions, Google Workspace files the user is allowed to access, Google Search, and public websites. Google also says the interaction is private to that user and that caption data is not stored after the meeting ends. (support.google.com)

Meet also has a separate “take notes for me” feature that writes notes and action items in real time, saves them into a Google Doc, and attaches that document to the Calendar event. That is the clearest production example of a live meeting loop already doing work before the call ends. (workspace.google.com)

Google’s broader Meet pitch adds translated captions in more than 60 languages and adaptive audio that synchronizes multiple laptops in one room to cut echo. Those features show how much of the hard work in live assistance is not the model alone, but routing, timing, and audio control. (workspace.google.com)

On the developer side, Google has been pushing newer voice features such as proactive audio and asynchronous function calling in Gemini Live. In a March 2026 Google for Developers session, product lead Shrestha Basu Mallick described those capabilities as part of the Live API roadmap and demo stack. (youtube.com)

That makes the current wave of demos easier to read: the novelty is less “AI can join a call” than “AI can do work mid-sentence.” In practice, that means fetching context, drafting notes, translating speech, or surfacing next steps while people are still talking. (ai.google.dev) (developers.openai.com)

The constraint is speed. Google’s Live API documentation recommends direct client connections for better streaming performance, while OpenAI structures Realtime as a stateful session that developers continuously update as audio arrives. (ai.google.dev) (developers.openai.com)

For users, the visible change is simple: meeting software is starting to act during the conversation, not after it. For admins and developers, the real work shifts to permissions, privacy, and keeping the loop fast enough that nobody notices the machinery. (support.google.com) (ai.google.dev)
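
For readers who want to see the shape of that in-call loop, here is a minimal Python sketch of the general pattern: a session keeps consuming audio chunks while a tool call runs in the background, and the result is surfaced once it is ready. It does not use the Gemini Live or OpenAI Realtime SDKs. The function names (`lookup_action_items`, `run_session`), the `ask:` prefix, and the string "chunks" standing in for audio are all hypothetical, chosen only for illustration.

```python
import asyncio


async def lookup_action_items(topic: str) -> str:
    # Hypothetical tool: the sleep stands in for a slow backend call.
    await asyncio.sleep(0.05)
    return f"action items for {topic}"


async def run_session(chunks: list[str]) -> list[str]:
    """Consume chunks in order; run tool calls without pausing the stream."""
    log: list[str] = []
    pending: list[asyncio.Task] = []
    for chunk in chunks:
        log.append(f"heard:{chunk}")
        if chunk.startswith("ask:"):
            # Start the tool in the background instead of awaiting it here.
            topic = chunk[len("ask:"):]
            pending.append(asyncio.create_task(lookup_action_items(topic)))
        # Simulate waiting for the next chunk of audio to arrive.
        await asyncio.sleep(0.01)
        # Surface any tool results that finished while audio kept flowing.
        for task in [t for t in pending if t.done()]:
            pending.remove(task)
            log.append(f"tool:{task.result()}")
    # Drain tool calls that are still running when the audio ends.
    for task in pending:
        log.append(f"tool:{await task}")
    return log


events = asyncio.run(
    run_session(["hello", "ask:budget", "more talk", "still talking"])
)
for event in events:
    print(event)
```

The design choice worth noticing is that the loop never awaits the tool directly in the middle of the stream. Under the assumptions above, the conversation keeps being heard while the lookup is in flight, which is the behavior the latency constraint calls for.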