Google Unveils Gemini Embedding 2

Published by The Daily Scout

What happened

Google launched Gemini Embedding 2, a multimodal embedding model that maps text, images, video, and audio into a single embedding space and supports retrieval-augmented generation (RAG) across more than 100 languages.

Why it matters

Gemini Embedding 2 accepts up to 8,192 input tokens of text and up to six images per request in PNG or JPEG format. It also handles videos of up to 120 seconds in MP4 or MOV format and PDF documents of up to six pages. Audio is embedded directly, with no transcription step.

The model uses Matryoshka Representation Learning (MRL), which lets developers shorten the embedding vectors to trade retrieval quality against storage cost. Google recommends output dimensions of 3,072, 1,536, or 768 for the best quality. The model is available through Google's Gemini API and Vertex AI.

In benchmarks, Gemini Embedding 2 outperforms competing models such as Amazon's Nova 2 and Voyage Multimodal 3.5 across text, image, video, and spoken-language tasks, with notable gains on text-to-video tasks. Because every format maps into the same representation, a single model can replace separate per-modality embedding steps in an AI pipeline.
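The dimension trade-off that MRL allows can be illustrated with a short sketch. It does not call the Gemini API: the vector below is a synthetic stand-in, the helper names are hypothetical, and rescaling the shortened vector to unit length is an assumption based on common practice with MRL-style embeddings rather than something the announcement states.

```python
import math

# Output dimensions the article says Google recommends.
RECOMMENDED_DIMS = (3072, 1536, 768)


def truncate_embedding(vector, dims):
    """Keep the first `dims` values and rescale the result to unit length.

    Rescaling is an assumption (common for MRL-style truncation), not a
    documented requirement of Gemini Embedding 2.
    """
    if not 0 < dims <= len(vector):
        raise ValueError(f"dims must be between 1 and {len(vector)}")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)
    return [x / norm for x in head]


def storage_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for float32 vectors, ignoring any index overhead."""
    return num_vectors * dims * bytes_per_value


# Synthetic stand-in for a 3,072-dimensional embedding (not real model output).
full = [math.sin(i + 1) for i in range(3072)]

for dims in RECOMMENDED_DIMS:
    small = truncate_embedding(full, dims)
    gib = storage_bytes(1_000_000, dims) / 2**30
    print(f"{dims} dims: {len(small)} values, about {gib:.2f} GiB per million vectors")
```

At 768 dimensions, the raw vectors take a quarter of the space they would at 3,072, which is the storage side of the trade-off; the quality cost depends on the workload and has to be measured.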

Key numbers

  • More than 100 languages covered by one embedding space for text, images, video, and audio.
  • Up to 8,192 input tokens of text.
  • Up to six images per request, in PNG or JPEG format.
  • Videos of up to 120 seconds in MP4 or MOV format, and PDF documents of up to six pages.
  • Recommended output dimensions: 3,072, 1,536, or 768.
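As a rough illustration, the input limits above could be checked on the client before a request is sent. This is a sketch under stated assumptions: the constants come from the figures in this article, while the function name, parameters, and the idea that all limits apply to one combined request are hypothetical and not taken from Google's API.

```python
# Limits as reported in this article; names and structure are illustrative only.
MAX_TEXT_TOKENS = 8192
MAX_IMAGES = 6
IMAGE_FORMATS = {"png", "jpeg"}
MAX_VIDEO_SECONDS = 120
VIDEO_FORMATS = {"mp4", "mov"}
MAX_PDF_PAGES = 6


def check_request(text_tokens=0, image_formats=(), video_seconds=0,
                  video_format=None, pdf_pages=0):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if text_tokens > MAX_TEXT_TOKENS:
        problems.append(f"text has {text_tokens} tokens, limit is {MAX_TEXT_TOKENS}")
    if len(image_formats) > MAX_IMAGES:
        problems.append(f"{len(image_formats)} images, limit is {MAX_IMAGES}")
    unsupported = sorted({f.lower() for f in image_formats} - IMAGE_FORMATS)
    if unsupported:
        problems.append(f"unsupported image formats: {', '.join(unsupported)}")
    if video_seconds > MAX_VIDEO_SECONDS:
        problems.append(f"video is {video_seconds}s, limit is {MAX_VIDEO_SECONDS}s")
    if video_format is not None and video_format.lower() not in VIDEO_FORMATS:
        problems.append(f"unsupported video format: {video_format}")
    if pdf_pages > MAX_PDF_PAGES:
        problems.append(f"PDF has {pdf_pages} pages, limit is {MAX_PDF_PAGES}")
    return problems


# Example: too many tokens and one unsupported image format.
for problem in check_request(text_tokens=9000, image_formats=["png", "gif"]):
    print(problem)
```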
