DeepMind posts conversational agents video
- Google DeepMind posted a new YouTube talk, “Building Conversational Agents,” with Thor Schaeff and Philipp Schmid walking through how to assemble production-style Gemini agents.
- The video spans tool-using coding agents and realtime voice interfaces, with emphasis on session management, interruption handling, evaluation, and safe human handoffs.
- It matters because the field is shifting from flashy demos to reliability work: the boring infrastructure that decides whether agents survive contact with users.

Conversational agents are the part of AI that sounds easy in a demo and gets messy the second a real user shows up. You can make a model talk. That is not the hard part anymore. The hard part is getting a system to retrieve the right information, call the right tool, recover from mistakes, and know when to stop pretending it knows. That is basically the frame of Google DeepMind’s new “Building Conversational Agents” video, posted to YouTube on May 1, 2026, with Thor Schaeff and Philipp Schmid walking through how they think these systems should actually be built. (youtube.com)

### What did DeepMind actually post?

It is a technical presentation centered on Gemini APIs and agent assembly. The public description says Schaeff and Schmid show how to build conversational agents ranging from tool-using coding agents to realtime voice interfaces. That matters because it places one video across two worlds people often treat separately: text agents that do work and voice agents that have to feel natural under latency pressure. (youtube.com)

### Why is “conversational agent” a bigger claim than chatbot?

A chatbot mostly generates replies. A conversational agent has to manage state across turns and connect language to actions. Once the system can search, retrieve documents, call software, or speak in real time, the failure modes multiply fast: wrong tool, stale context, awkward interruptions, fabricated answers, or a task that should have been handed back to a human. That is why the interesting part of this talk is the design. (youtube.com) (deepmind.google)

### What seems to be the core build pattern?

The pattern is modular: model plus retrieval plus tools plus orchestration plus evaluation. DeepMind’s older Sparrow work already made the same basic point from the safety side. Dialogue quality is not just “did the sentence sound good,” but “was it useful, grounded, and within rules,” sometimes with web evidence pulled in when needed. The new talk looks like the product-facing version of that idea.
You do not ship a single model. You ship a stack. (deepmind.google)

### Why do voice agents make this harder?

Because voice adds timing. In a text chat, a pause is fine. In speech, a pause feels broken. Schaeff’s separate Gemini 3 voice session is useful context here: he focuses on speech-to-speech interaction, streaming audio and video, WebSocket integration, session management, and interruption handling. Those are not cosmetic details. They are the difference between a system that feels conversational and one that feels like a phone tree with better branding. (youtube.com)

### Why is evaluation such a big deal?

Because agent failures are compositional. A model can be good at language and still fail the task. Think of it like a relay team: if retrieval drops the baton or the tool call misfires, the final answer can still sound smooth while being wrong. That is why teams keep coming back to evals, traces, and handoff logic. You need ways to test not just outputs, but the chain of decisions that produced them. (deepmind.google)

### What is the safety angle here?

The safety angle is less “ban the bad words” and more “design for limits.” DeepMind’s dialogue research has long stressed that conversational systems can invent facts, give unsafe advice, or project confidence they have not earned. In practice, safer agents need grounded retrieval, explicit rules, and moments where the system declines, asks a follow-up, or hands the task off instead of bluffing. (deepmind.google)

### What changed today?

Not a new model launch. Not a benchmark drop. What changed is that DeepMind put a pretty clear marker down in public: the conversation is moving from agent hype to agent plumbing. The company is showing builders that the real work is orchestration, reliability, and UX under failure, the unglamorous layer between a model demo and a product people trust.
(youtube.com)

### Bottom line

DeepMind is telling developers that conversational AI is now an engineering discipline, not just a prompting trick. (deepmind.google) (youtube.com)
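
For readers who want the "stack, not a model" idea in concrete terms, here is a minimal Python sketch of one agent turn. It is not code from the talk and it does not use the Gemini API. Every name in it (`run_turn`, `keyword_retrieve`, `TOOLS`, the `tool_name: argument` routing convention, the made-up `DOCS` entries, the `min_score` threshold) is a hypothetical stand-in chosen for illustration. What it tries to show is the shape of the pattern: route to a tool or to retrieval, record each decision in a trace, and hand off to a human instead of bluffing when a tool fails or nothing relevant is found.

```python
from dataclasses import dataclass, field
from typing import Callable

# Toy knowledge base; the entries are invented for this example.
DOCS = {
    "refund": "Refunds are issued within 14 days of purchase.",
    "shipping": "Orders ship within 2 business days.",
}


@dataclass
class TurnResult:
    reply: str
    handed_off: bool
    trace: list[str] = field(default_factory=list)  # decisions, in order


def keyword_retrieve(query: str) -> tuple[str, float]:
    """Stand-in retriever: returns (text, score) on an exact keyword hit."""
    words = set(query.lower().split())
    for key, text in DOCS.items():
        if key in words:
            return text, 1.0
    return "", 0.0


# Stand-in tool registry: each tool maps an argument string to a reply.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split())),
}


def run_turn(
    query: str,
    retrieve: Callable[[str], tuple[str, float]],
    tools: dict[str, Callable[[str], str]],
    min_score: float = 0.5,
) -> TurnResult:
    trace: list[str] = []

    # Hypothetical routing rule: "tool_name: argument" calls a tool.
    name, sep, arg = query.partition(":")
    name = name.strip()
    if sep and name in tools:
        trace.append(f"tool:{name}")
        try:
            return TurnResult(tools[name](arg.strip()), False, trace)
        except Exception as exc:
            # A failed tool call ends in a handoff, not a guessed answer.
            trace.append(f"tool_error:{type(exc).__name__}")
            trace.append("handoff")
            return TurnResult("I could not complete that. Passing you to a person.", True, trace)

    # Otherwise ground the reply in retrieved text, or decline.
    trace.append("retrieve")
    text, score = retrieve(query)
    if score < min_score:
        trace.append("handoff")
        return TurnResult("I do not have a grounded answer. Passing you to a person.", True, trace)

    trace.append("answer")
    return TurnResult(f"Based on our docs: {text}", False, trace)
```

Because each turn returns its trace, an evaluation can check the chain of decisions and not only the final sentence. For example, `run_turn("add: two", keyword_retrieve, TOOLS).trace` records the tool attempt, the error, and the handoff, so a test can assert that the agent escalated instead of inventing a number.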