Google Unveils Gemini Embedding 2

Published March 11, 2026 by The Daily Scout

What happened

Google launched Gemini Embedding 2, a multimodal embedding model that maps text, images, video, and audio into a single shared embedding space, enabling retrieval-augmented generation (RAG) across more than 100 languages.

Why it matters

Gemini Embedding 2 accepts up to 8,192 input tokens of text and up to six images per request in PNG or JPEG format. It also handles videos up to 120 seconds long in MP4 or MOV format, as well as PDF documents of up to six pages. Audio is embedded directly, so it no longer has to be transcribed to text first.

The model is trained with Matryoshka Representation Learning (MRL), which lets developers shorten embedding vectors to fewer dimensions, trading some quality for lower storage and compute costs. Google recommends output sizes of 3,072, 1,536, or 768 dimensions for the best quality.

Gemini Embedding 2 is available through Google's Gemini API and Vertex AI. In benchmarks spanning text, image, video, and spoken-language tasks, it outperforms competing models such as Amazon's Nova 2 and Voyage Multimodal 3.5, with especially notable gains on text-to-video tasks.

Because every format is mapped into one shared representation, developers can search across text, images, video, and audio with a single model and index instead of maintaining a separate pipeline for each data type.
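To make the MRL trade-off concrete, here is a minimal Python sketch of what shortening a Matryoshka-style embedding typically involves: keep the first N components, rescale the result to unit length, and rank items by cosine similarity. The function names and the toy four-dimensional vectors are hypothetical stand-ins, not output from Gemini Embedding 2 and not calls to Google's SDK; real vectors would come from the Gemini API or Vertex AI. The storage helper assumes uncompressed float32 storage (4 bytes per dimension) and ignores index overhead.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def truncate_and_normalize(embedding: Vector, dim: int) -> Vector:
    """Keep the first `dim` components of an MRL-style embedding and rescale to unit length.

    Truncation is only meaningful for models trained with MRL, which pack the
    most important information into the leading dimensions.
    """
    if not 1 <= dim <= len(embedding):
        raise ValueError(f"dim must be between 1 and {len(embedding)}, got {dim}")
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in prefix]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two unit-length vectors, i.e. their dot product."""
    return sum(x * y for x, y in zip(a, b))


def rank(query: Vector, index: Dict[str, Vector], dim: int) -> List[Tuple[str, float]]:
    """Score every indexed item against the query at `dim` dimensions, best match first.

    Items of any modality can share one index because they live in the same
    embedding space.
    """
    q = truncate_and_normalize(query, dim)
    scored = [
        (item_id, cosine_similarity(q, truncate_and_normalize(vec, dim)))
        for item_id, vec in index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def float32_storage_bytes(dim: int, num_vectors: int) -> int:
    """Raw storage for `num_vectors` float32 embeddings of size `dim` (no index overhead)."""
    return 4 * dim * num_vectors


if __name__ == "__main__":
    # Hypothetical toy vectors for a video, an image, and an audio file;
    # real embeddings have up to 3,072 dimensions.
    index = {
        "clip.mp4": [0.9, 0.1, 0.3, 0.0],
        "photo.png": [0.1, 0.9, 0.0, 0.2],
        "memo.wav": [0.7, 0.2, 0.1, 0.6],
    }
    query = [1.0, 0.0, 0.2, 0.1]  # hypothetical text-query embedding
    for dim in (4, 2):
        print(dim, rank(query, index, dim))

    # Raw storage for one million vectors at each size Google recommends.
    for dim in (3072, 1536, 768):
        print(f"{dim} dims: {float32_storage_bytes(dim, 1_000_000) / 1e9:.3f} GB")
```

At float32, one million vectors take about 12.3 GB at 3,072 dimensions, 6.1 GB at 1,536, and 3.1 GB at 768, before any index overhead. That is the kind of saving MRL is meant to unlock. In this toy example the ranking is identical at two and four dimensions. With real data, shorter vectors can reorder results, so the quality trade-off is worth checking on a sample of your own queries.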

Key numbers

Languages: more than 100
Text input: up to 8,192 tokens
Images: up to six per request (PNG, JPEG)
Video: up to 120 seconds (MP4, MOV)
PDFs: up to six pages
Recommended embedding sizes: 3,072, 1,536, or 768 dimensions
