Google devs: build a Gemini video agent
Google for Developers published a hands-on walkthrough for building a video-conference agent with the Gemini 3.1 Flash Live API, the Fishjam SDK, and a React frontend, including prompts and full implementation steps. The walkthrough is presented as practical material for portfolio projects that integrate real-time AI agents with web frontends. (x.com)
Google for Developers is now pointing developers to a step-by-step build for a browser-based video meeting agent that listens, sees, and replies in real time through Gemini’s Live API. (developers.googleblog.com)
The underlying idea is a streaming model: instead of sending one prompt and waiting, a web app keeps a WebSocket connection open and feeds text, audio, or video continuously while the model’s replies come back as a stream of events. Google’s Live API documentation says it supports bidirectional audio, video, and text, and treats the connection as a stateful session. (ai.google.dev)
Google’s current low-latency model for that setup is Gemini 3.1 Flash Live Preview, which Google describes as an audio-to-audio model for real-time dialogue with multimodal awareness. The Live capabilities guide says Gemini 3.1 Flash Live replaced Gemini 2.5 Flash Live as the model family to target for new builds. (ai.google.dev)
The video layer in this kind of app is separate from the model itself. Fishjam’s documentation pitches its toolkit as infrastructure for live audio and video streaming, with React guides and a React client package for handling cameras, microphones, peers, and room connections in the browser. (documentation.fishjam.io)
That division of labor is what the walkthrough is teaching: Fishjam handles the meeting room, device streams, and participant state, while Gemini handles the conversation logic over a live stream. Google’s web and codelab materials frame that pattern as a way to build real-time assistants that can sit inside ordinary front ends instead of separate demo apps. (documentation.fishjam.io) (developers.google.com)
Google has been expanding that stack quickly since the Live API preview rollout in April 2025, when it said developers could build agents for customer support, education, and real-time monitoring with streaming audio, video, and text. Recent codelabs now package the same approach for developers who want higher-level tooling or portfolio-ready samples. (developers.googleblog.com) (codelabs.developers.google.com)
The company’s own training materials now describe Live builds in plain product terms: voice and video interactions, interruption mid-response, and continuous input instead of turn-by-turn chat. That makes a video-conference agent less like a chatbot bolted onto a page and more like a participant wired into the meeting itself. (codelabs.developers.google.com) (ai.google.dev)
For developers, the practical work is no longer just prompting a model. It is stitching together browser media capture, room management, authentication, streaming transport, and a model session that can react fast enough to feel conversational. (documentation.fishjam.io) (ai.google.dev)
That is why Google is publishing hands-on material around this pattern now: the pieces for a real-time meeting agent already exist in its Live API docs, model guides, and web tutorials, and the new walkthrough turns them into a build developers can actually ship. (ai.google.dev) (developers.google.com)
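To make the pattern concrete, here is a minimal TypeScript sketch of the division of labor described above: a room layer that produces media frames, a model session that consumes them and emits events, and a small state update that handles interruption mid-response. Every name in it (`MediaRoom`, `ModelSession`, `ModelEvent`, `applyEvent`, `connectAgent`) is a hypothetical stand-in invented for illustration. None of it is the Fishjam SDK or the Live API, and the walkthrough's actual code may be organized quite differently.

```typescript
// Hypothetical shapes for illustration only. These are NOT the real
// Live API message types or Fishjam SDK interfaces.
type ModelEvent =
  | { kind: "audio"; chunk: string } // one piece of the model's spoken reply
  | { kind: "interrupted" }          // the user talked over the reply
  | { kind: "turnComplete" };        // the model finished this reply

interface AgentState {
  playbackQueue: string[];  // reply chunks not yet played into the meeting
  modelResponding: boolean; // true while a reply is still streaming in
  completedTurns: number;   // replies that ran to completion
}

const initialState: AgentState = {
  playbackQueue: [],
  modelResponding: false,
  completedTurns: 0,
};

// Pure state update: one incoming model event produces the next agent state.
function applyEvent(state: AgentState, event: ModelEvent): AgentState {
  switch (event.kind) {
    case "audio":
      return {
        ...state,
        playbackQueue: [...state.playbackQueue, event.chunk],
        modelResponding: true,
      };
    case "interrupted":
      // Drop unplayed audio so the agent stops talking when cut off.
      return { ...state, playbackQueue: [], modelResponding: false };
    case "turnComplete":
      return {
        ...state,
        modelResponding: false,
        completedTurns: state.completedTurns + 1,
      };
  }
}

// Room side (the role the article assigns to Fishjam): yields captured frames.
interface MediaRoom {
  onFrame(handler: (frame: string) => void): void;
}

// Model side (the role the article assigns to Gemini): takes frames, emits events.
interface ModelSession {
  send(frame: string): void;
  onEvent(handler: (event: ModelEvent) => void): void;
}

// Wire the two layers together: each captured frame is forwarded as soon as
// it arrives, and each model event updates the agent state.
function connectAgent(room: MediaRoom, session: ModelSession): () => AgentState {
  let state = initialState;
  room.onFrame((frame) => session.send(frame));
  session.onEvent((event) => {
    state = applyEvent(state, event);
  });
  return () => state; // accessor for the latest state
}
```

In a real app, the `MediaRoom` role would presumably be filled by an adapter around the room SDK and the `ModelSession` role by a wrapper around the open WebSocket session. Keeping `applyEvent` as a pure function means the interruption logic can be unit-tested without a browser, camera, or network connection.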