YouTube Pilots Conversational AI on Smart TVs
YouTube is testing a conversational AI assistant on smart TVs, which allows users to ask questions about videos and receive answers without interrupting playback. The feature is intended to make content discovery and comprehension more seamless, illustrating a deeper integration of generative AI into mainstream consumer products.
- The conversational AI tool is powered by Google's Gemini model and is activated via an "Ask" button that appears on the YouTube interface for select users. This feature allows viewers to ask questions about the video's content using their TV remote's microphone, with the AI providing summaries and recommendations.
- This test marks the expansion of the feature from mobile and desktop to larger screens, including smart TVs and gaming consoles. This move aligns with TV becoming the primary way people watch YouTube in the U.S.
- The AI functionality is built upon Large Language Models (LLMs) that analyze a video's transcript and other data to provide answers. The underlying technology, LaMDA (Language Model for Dialogue Applications), is a transformer-based neural network trained on vast amounts of dialogue data to understand conversational nuances.
- The pilot is currently available to a small group of YouTube Premium members over the age of 18 and supports English, Hindi, Spanish, Portuguese, and Korean.
- This feature is part of a broader trend of integrating generative AI into streaming services to enhance user experience. Competitors like Amazon Prime Video have introduced similar AI-powered features, such as "X-Ray Recaps," which summarize plot points.
- The system doesn't just answer direct questions; it also provides suggested prompts based on the video's content to encourage interaction. Users can ask for things like a list of ingredients from a cooking video or details about a song in the soundtrack.
- Beyond answering questions, YouTube is leveraging AI for other features, including summarizing comment sections, automatically enhancing video quality, and enabling AI-driven search results.
- The long-term vision for this technology is to make computing more accessible and conversational across all Google products, including Google Assistant and Workspace.
The development of models like LaMDA 2 focuses on breaking down complex goals into smaller, actionable steps.