Gemini Embedding 2
- Google released Gemini Embedding 2, a multimodal embedding model covering text, image, video, audio, and PDFs.
- The model maps all modalities into a single 3072‑dimensional space and scored over 68 on MTEB benchmarks.
- A unified embedding simplifies multimodal retrieval and RAG pipelines for enterprise search and assistant applications. (x.com)
An embedding is a way to turn words, images, audio, or video into numbers so software can compare meaning instead of exact matches. Google said on March 10 that its new Gemini Embedding 2 does that for five formats in one model. (blog.google)

Google released Gemini Embedding 2 in public preview through the Gemini API and Vertex AI. The company said the model maps text, images, video, audio, and documents into a single shared embedding space across more than 100 languages. (blog.google)

Google’s developer docs list the API model as `gemini-embedding-2` and describe it as the first multimodal embedding model in the Gemini API. The same docs say the model supports cross-modal search, classification, and clustering, including searching for an image with a text prompt. (ai.google.dev, docs.cloud.google.com)

The output vector has 3,072 dimensions by default, with an option to shorten it to trade some accuracy for lower storage cost or faster search. Google also added task instructions such as code retrieval or search result ranking so developers can tune embeddings for a specific job. (docs.cloud.google.com)

Google said Gemini Embedding 2 scored above 68 on the Massive Text Embedding Benchmark, or MTEB, a widely used leaderboard for retrieval and classification tests. The company said that result put the model ahead of previous Google embedding models while extending them from text into multimodal search. (blog.google, huggingface.co)

That matters for retrieval-augmented generation, the pattern where a chatbot fetches outside material before answering. Google’s embeddings guide says embeddings are commonly used to build retrieval systems, and a single model can remove the need for separate text, image, and document indexes. (ai.google.dev, docs.cloud.google.com)

For enterprise search, the practical change is that a user can ask a question in text and retrieve a slide, a PDF page, a product image, or a video clip from the same index.
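
To make the single-index idea concrete, here is a minimal sketch of how a text query could be ranked against items of different media types once everything lives in one vector space. It does not call the Gemini API. The four-dimensional vectors, the item labels, and the `Item` and `search` names are all hypothetical stand-ins for real 3,072-dimensional embeddings, and cosine similarity is assumed as the comparison metric.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    label: str           # hypothetical identifier for the stored asset
    modality: str        # "image", "pdf", "video", ...
    vector: list[float]  # toy stand-in for a real embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query_vector, top_k=2):
    """Rank every item in the index against the query, whatever its modality."""
    ranked = sorted(
        index,
        key=lambda item: cosine(item.vector, query_vector),
        reverse=True,
    )
    return ranked[:top_k]


# One index holding three media types (all vectors are made up).
INDEX = [
    Item("q3-roadmap-slide", "image", [0.9, 0.1, 0.0, 0.1]),
    Item("pricing-policy.pdf#p4", "pdf", [0.1, 0.9, 0.1, 0.0]),
    Item("product-demo-clip", "video", [0.0, 0.1, 0.9, 0.2]),
]

# Pretend this vector came from embedding a text question.
QUERY = [0.8, 0.2, 0.1, 0.0]

results = search(INDEX, QUERY)
for item in results:
    print(item.modality, item.label)
```

In a real deployment the linear scan would normally be replaced by a vector database or approximate nearest-neighbor index, but the ranking logic stays the same: one query vector, one index, any modality.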
Google said the model is aimed at multimodal retrieval, recommendation systems, and document search at scale. (ai.google.dev, blog.google)

The launch also creates a migration issue for existing users. Google’s API documentation says embeddings from `gemini-embedding-001` and `gemini-embedding-2-preview` are incompatible, which means developers upgrading to the new model have to re-embed stored data before old and new vectors can be compared. (ai.google.dev)

Google is positioning the model for production use beyond low-latency requests. In a separate developer post, the company said the Gemini Batch API now supports embeddings at higher rate limits and priced batch embedding requests at $0.075 per 1 million input tokens. (developers.googleblog.com)

The release pushes Google’s embedding lineup from text search toward a single index for mixed media. If that approach holds up in production, the work of finding the right file may look less like keyword search and more like matching meaning across formats. (blog.google, ai.google.dev)
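
For teams weighing the re-embedding step described above, a rough planning sketch might look like the following. It assumes each stored vector is tagged with the model that produced it and a token count, and that the quoted batch price of $0.075 per 1 million input tokens applies to the re-embedding job, which the sources here do not confirm for every model. The `StoredVector` record, the document IDs, and the token counts are hypothetical.

```python
from dataclasses import dataclass

# USD per 1 million input tokens, the batch figure quoted in the article.
BATCH_PRICE_PER_MILLION_TOKENS = 0.075
TARGET_MODEL = "gemini-embedding-2-preview"


@dataclass
class StoredVector:
    doc_id: str       # hypothetical document identifier
    model: str        # model that produced the stored embedding
    token_count: int  # input tokens of the source document (made up here)


def needs_reembedding(records, target_model=TARGET_MODEL):
    """Return records whose vectors came from a different model."""
    return [r for r in records if r.model != target_model]


def estimate_batch_cost(records, price_per_million=BATCH_PRICE_PER_MILLION_TOKENS):
    """Estimate the batch cost in USD of re-embedding the given records."""
    total_tokens = sum(r.token_count for r in records)
    return total_tokens / 1_000_000 * price_per_million


STORE = [
    StoredVector("doc-1", "gemini-embedding-001", 1_200_000),
    StoredVector("doc-2", "gemini-embedding-001", 800_000),
    StoredVector("doc-3", "gemini-embedding-2-preview", 500_000),
]

stale = needs_reembedding(STORE)
cost = estimate_batch_cost(stale)
print(f"{len(stale)} documents to re-embed, about ${cost:.2f} at the batch rate")
```

Tagging every stored vector with its source model also gives search code a simple guard: it can refuse to compare a query embedded with one model against vectors produced by another.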