Google Unifies AI Embeddings with Gemini Embedding 2

Published by The Daily Scout

What happened

Google launched Gemini Embedding 2, a multimodal embedding model that maps text, images, video, audio, and PDF documents into a single vector space for retrieval-augmented generation (RAG), search, and internal chatbots.
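Because every input type is mapped into one vector space, a text query can be scored directly against image, video, or PDF embeddings with ordinary cosine similarity, and a single index can hold all of them. The sketch below illustrates only that retrieval step: the document IDs and the tiny 4-dimensional vectors are hypothetical stand-ins for real model output, and calling the actual embedding API is deliberately left out.

```python
import math

# Hypothetical 4-dimensional vectors standing in for real embeddings
# (the real model returns 3,072 dimensions by default).
corpus = {
    "contract.pdf": [0.9, 0.1, 0.0, 0.1],
    "site_photo.jpg": [0.1, 0.8, 0.3, 0.0],
    "deposition.mp4": [0.2, 0.7, 0.6, 0.1],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec: list[float], k: int = 2) -> list[tuple[str, float]]:
    """Rank every item, whatever its original media type, against one query vector."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embedding of a text query such as "video of the site inspection".
query = [0.15, 0.75, 0.5, 0.05]
for doc_id, score in search(query):
    print(f"{doc_id}: {score:.3f}")
```

The same ranking code works unchanged whether a stored vector came from a PDF, a photo, or a video clip, which is the practical payoff of a shared vector space.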

Why it matters

Gemini Embedding 2 accepts text inputs of up to 8,192 tokens and up to six images per request in PNG or JPEG format. It also accepts video of up to 120 seconds in MP4 or MOV format, processes audio directly without first transcribing it, and embeds PDF files of up to six pages. Built on the Gemini architecture, the model can combine inputs such as text and images in a single request. Because every modality lands in the same vector space, a system can relate content across media types and work with datasets that mix formats. The model captures semantic meaning in more than 100 languages, which supports tasks such as RAG, semantic search, sentiment analysis, and data clustering.

Gemini Embedding 2 also uses Matryoshka Representation Learning (MRL), a training technique that concentrates the most important information in the leading dimensions of each vector, so an embedding can be shortened and still remain useful. The default output has 3,072 dimensions, but developers can use smaller vectors to manage storage and performance. Google recommends 3,072, 1,536, or 768 dimensions for the highest quality.

Early access partners report reduced latency and improved precision. One partner in the creator economy cut latency by 70% and doubled semantic similarity scores for text-to-video pairs. Another, in legal tech, improved precision across millions of legal records and enabled new search features for visual evidence.
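Matryoshka-style embeddings are commonly shortened by keeping the first N components and rescaling the result to unit length, so cosine or dot-product comparisons stay meaningful. The sketch below shows that step together with the raw storage cost at Google's recommended sizes. It assumes float32 storage (4 bytes per value), the helper names are hypothetical, and whether the Gemini API already returns normalized vectors at reduced sizes is something to confirm in Google's documentation.

```python
import math

BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats


def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the leading `dims` values of an MRL embedding and rescale to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be between 1 and {len(vec)}")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


def storage_bytes(num_vectors: int, dims: int) -> int:
    """Raw float32 storage for `num_vectors` embeddings of size `dims` (no index overhead)."""
    return num_vectors * dims * BYTES_PER_FLOAT32


# Hypothetical full-size embedding with 3,072 values, matching the default output size.
full = [math.sin(i + 1) for i in range(3072)]

for dims in (3072, 1536, 768):
    short = truncate_embedding(full, dims)
    gib = storage_bytes(1_000_000, dims) / 2**30
    print(f"{dims:>5} dims -> norm {math.hypot(*short):.3f}, ~{gib:.2f} GiB per million vectors")
```

Halving the dimension count halves raw storage; how much retrieval quality each step down costs is worth measuring on your own data.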

Key numbers

  • 8,192 tokens: maximum text input.
  • Six images per request, in PNG or JPEG format.
  • 120 seconds: maximum video length, in MP4 or MOV format; audio is embedded directly without transcription.
  • Six pages: maximum PDF length.
  • 100+ languages supported.
  • 3,072 dimensions by default; Google recommends 3,072, 1,536, or 768 for the highest quality.
  • 70% latency reduction reported by one early access partner in the creator economy.
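The per-request limits reported in this article can be checked on the client before anything is sent, which avoids wasted calls when a dataset mixes formats. This is a hypothetical helper, not part of any Google SDK: the `Request` fields are invented for illustration, the limit values are the ones stated above (including the six-page PDF cap), and token counts are assumed to come from a tokenizer you supply.

```python
from dataclasses import dataclass, field
from typing import Optional

# Limits as reported for Gemini Embedding 2 in this article.
MAX_TEXT_TOKENS = 8192
MAX_IMAGES = 6
IMAGE_FORMATS = {"png", "jpeg", "jpg"}  # "jpg" accepted as a common JPEG file extension
MAX_VIDEO_SECONDS = 120
VIDEO_FORMATS = {"mp4", "mov"}
MAX_PDF_PAGES = 6


@dataclass
class Request:
    """Hypothetical description of one embedding request's inputs."""
    text_tokens: int = 0
    image_formats: list[str] = field(default_factory=list)
    video_seconds: float = 0.0
    video_format: Optional[str] = None
    pdf_pages: int = 0


def check_request(req: Request) -> list[str]:
    """Return a list of limit violations; an empty list means the request fits."""
    problems = []
    if req.text_tokens > MAX_TEXT_TOKENS:
        problems.append(f"text has {req.text_tokens} tokens (max {MAX_TEXT_TOKENS})")
    if len(req.image_formats) > MAX_IMAGES:
        problems.append(f"{len(req.image_formats)} images (max {MAX_IMAGES})")
    for fmt in req.image_formats:
        if fmt.lower() not in IMAGE_FORMATS:
            problems.append(f"unsupported image format: {fmt}")
    if req.video_format is not None:
        if req.video_format.lower() not in VIDEO_FORMATS:
            problems.append(f"unsupported video format: {req.video_format}")
        if req.video_seconds > MAX_VIDEO_SECONDS:
            problems.append(f"video is {req.video_seconds:g}s (max {MAX_VIDEO_SECONDS}s)")
    if req.pdf_pages > MAX_PDF_PAGES:
        problems.append(f"PDF has {req.pdf_pages} pages (max {MAX_PDF_PAGES})")
    return problems


print(check_request(Request(text_tokens=500, image_formats=["png", "gif"],
                            video_seconds=150, video_format="mp4")))
```

Returning every violation at once, rather than failing on the first, makes it easier to fix a batch of mixed-format inputs in one pass.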
