Google Gemini Adds AI Music Generation Feature
Google has launched Lyria 3, a new music generation feature integrated into its Gemini model. The tool allows users to create 30-second songs from text, photo, or video prompts. The update positions Gemini as a more comprehensive multimodal foundation model for both creative and enterprise applications.
- To mitigate copyright concerns, all audio generated by Lyria 3 is embedded with SynthID, an imperceptible, inaudible watermark designed to identify the content as AI-generated. This watermarking technology is robust against common modifications like MP3 compression or noise addition.
- The development and training of large-scale models like Lyria and Gemini are powered by Google's custom-designed Tensor Processing Units (TPUs). This vertical integration of hardware and models is a strategic advantage, allowing for co-design that optimizes performance and cost-effectiveness for training and inference at scale.
- For enterprise developers, Lyria is accessible through the Vertex AI API, allowing for the integration of music generation into third-party applications. This opens up go-to-market opportunities for use cases in advertising, gaming, and scalable content creation.
- The recurring operational expense of running a model at scale, known as inference cost, is a critical factor for the profitability of AI applications. This cost is continuous and usage-based, making the performance-per-dollar of the underlying hardware, like Google's TPUs versus competitor GPUs, a key consideration for MLOps teams.
- The competitive landscape for AI music generation includes startups like Suno, which has gained significant traction and is valued at an estimated $2 billion, and other major tech players like Adobe and potentially OpenAI. These companies are competing to offer more advanced features, such as generating full-length songs with vocals and providing DAW-like editing capabilities.
- Lyria 3 is part of a broader family of music generation models from Google DeepMind, which also includes Lyria RealTime for interactive, streaming music creation. This signals a strategy to address different segments of the market, from casual creators to developers building real-time interactive experiences.
- While the consumer-facing feature in the Gemini app generates 30-second clips, the underlying Lyria model available via the Vertex AI API for developers produces instrumental tracks that are 32.8 seconds long.
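
The performance-per-dollar point about inference cost can be made concrete with a little arithmetic. The sketch below is illustrative only: the accelerator names, hourly prices, and throughput figures are hypothetical placeholders, not published numbers for TPUs, GPUs, or Lyria. It assumes cost scales linearly with usage, which ignores batching effects, idle capacity, and committed-use discounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """A hypothetical accelerator profile; all values are made-up examples."""
    name: str
    hourly_cost_usd: float  # assumed rental price per hour
    clips_per_hour: float   # assumed throughput in generated clips per hour

    def cost_per_clip(self) -> float:
        # Performance-per-dollar, expressed as dollars per generated clip.
        return self.hourly_cost_usd / self.clips_per_hour


def monthly_inference_cost(acc: Accelerator, clips_per_month: int) -> float:
    # Usage-based cost: it recurs every month and grows with demand.
    return acc.cost_per_clip() * clips_per_month


def cheaper(a: Accelerator, b: Accelerator) -> Accelerator:
    # Pick the option with the lower cost per clip (ties go to the first).
    return a if a.cost_per_clip() <= b.cost_per_clip() else b


# Hypothetical numbers chosen only to show the comparison.
option_a = Accelerator("accelerator-a", hourly_cost_usd=4.0, clips_per_hour=800)
option_b = Accelerator("accelerator-b", hourly_cost_usd=3.0, clips_per_hour=500)

for option in (option_a, option_b):
    monthly = monthly_inference_cost(option, clips_per_month=1_000_000)
    print(f"{option.name}: ${option.cost_per_clip():.4f}/clip, ${monthly:,.2f}/month")

print("lower cost per clip:", cheaper(option_a, option_b).name)
```

With these placeholder values, the option with the higher hourly price still comes out cheaper per clip because of its higher throughput, which is why hourly price alone is a poor basis for comparing hardware.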