Gemini Embedding 2 GA
- Google made Gemini Embedding 2 generally available for multimodal embeddings across text, images and video.
- The release targets semantic search and retrieval use cases across mixed-media sources, per Google AI Studio posts.
- Multimodal embeddings expand retrieval-augmented workflows used in RAG apps and agent memory features. (x.com)
Google has moved Gemini Embedding 2 from preview to general availability, turning its multimodal embedding model into a production API on April 22. (ai.google.dev)

Embeddings turn content into arrays of numbers so that software can compare meaning rather than exact words. Google’s latest model places text, images, video, audio and PDF documents in one shared vector space, which lets an app match a text query to a photo, clip or file by semantic similarity. (ai.google.dev)

Google first introduced Gemini Embedding 2 in public preview on March 10, 2026, through the Gemini API and Vertex AI. The company said the model was built on Gemini and supports more than 100 languages. (blog.google)

The production version is listed as `gemini-embedding-2`, while Google still offers `gemini-embedding-001` for text-only workloads. In the Gemini API docs, Google calls Embedding 2 its first multimodal embedding model. (ai.google.dev)

The practical use case is retrieval: a system can store vectors for product photos, support PDFs, meeting audio or short videos, then pull back the closest match when a user asks a question. Google’s docs point to semantic search, classification, clustering and retrieval-augmented generation as the main targets. (ai.google.dev)

That matters for the recent wave of AI assistants that need outside memory. Google said embeddings are used to fetch documents, conversation history and tool definitions for “context engineering,” its term for giving agents the working context they need at runtime. (developers.googleblog.com)

Google’s model page says Gemini Embedding 2 returns 3,072-dimensional vectors by default, with smaller output sizes available through Matryoshka Representation Learning (MRL), a training technique that packs the most important information into the leading dimensions so a vector can be shortened with limited loss in quality. Google recommends 3,072, 1,536 and 768 dimensions as quality tiers. (blog.google)

The input limits show where Google expects the model to be used.
The model supports up to 8,192 input tokens of text, up to six images per request, one PDF of up to six pages, and audio of up to 180 seconds. Video input is capped by the token limit, or roughly 120 seconds at one frame per second without audio. (docs.cloud.google.com)

Google also added task-specific formatting for retrieval jobs, such as labeling a query as a “search query” and a stored item as a titled document. The company says these instructions help tune the vectors toward the relationship a developer actually wants to measure. (ai.google.dev)

The release closes a short launch cycle: preview on March 10, general availability on April 22. For developers building search, file retrieval and agent memory across mixed media, Google now has a stable endpoint instead of an experiment. (ai.google.dev)
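The retrieval pattern described in this article (store vectors, then return the closest match) and the Matryoshka output tiers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google’s client code: the four-dimensional toy vectors are made up and stand in for real 3,072-dimensional `gemini-embedding-2` outputs, the helpers `truncate_and_normalize`, `cosine` and `top_k` are hypothetical names, and a production system would obtain embeddings from the Gemini API and use a vector index rather than a linear scan. Re-normalizing after truncation is common practice for Matryoshka embeddings, since a shortened prefix is generally no longer unit length.

```python
import math

def truncate_and_normalize(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` values (the Matryoshka prefix) and rescale to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in prefix]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; for unit-length inputs this equals the dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query: list[float], store: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return ids of the k stored vectors most similar to `query` (brute-force scan)."""
    ranked = sorted(store, key=lambda item_id: cosine(query, store[item_id]), reverse=True)
    return ranked[:k]

# Toy 4-dim vectors standing in for full 3,072-dim embeddings (made-up values).
full_store = {
    "product_photo.jpg": [0.9, 0.1, 0.3, 0.2],
    "support_manual.pdf": [0.1, 0.8, 0.2, 0.4],
    "meeting_audio.mp3": [-0.5, 0.2, 0.7, 0.1],
}
full_query = [0.8, 0.2, 0.1, 0.5]

# Shrink query and documents to the same smaller size, analogous to choosing a lower tier.
DIM = 2
store = {item_id: truncate_and_normalize(v, DIM) for item_id, v in full_store.items()}
query = truncate_and_normalize(full_query, DIM)
print(top_k(query, store, k=2))
```

The key constraint is that query and stored vectors must be truncated to the same size before comparison. A brute-force scan costs one similarity computation per stored item, which is why larger systems swap `top_k` for an approximate nearest-neighbor index.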
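The per-request input limits reported for the model translate naturally into a pre-flight check a client could run before sending inputs. The sketch below is illustrative only: `EmbedRequestPlan`, `limit_violations` and the `MAX_*` constants are hypothetical names, the numbers are copied from the figures reported in this article and may change, and the video cap is approximated in seconds even though the real bound is token-based. The text token count is assumed to come from elsewhere, such as a tokenizer or a token-counting API call.

```python
from dataclasses import dataclass, field

# Hypothetical constants mirroring the limits reported for gemini-embedding-2.
# Verify against Google's current documentation before relying on them.
MAX_TEXT_TOKENS = 8_192
MAX_IMAGES = 6
MAX_PDFS = 1
MAX_PDF_PAGES = 6
MAX_AUDIO_SECONDS = 180
MAX_VIDEO_SECONDS = 120  # approximation: ~120 s at 1 fps without audio; real cap is token-based

@dataclass
class EmbedRequestPlan:
    """Hypothetical summary of the inputs one embedding request would carry."""
    text_tokens: int = 0
    image_count: int = 0
    pdf_page_counts: list[int] = field(default_factory=list)
    audio_seconds: float = 0.0
    video_seconds: float = 0.0

def limit_violations(plan: EmbedRequestPlan) -> list[str]:
    """Return readable problems; an empty list means the plan fits the stated limits."""
    problems = []
    if plan.text_tokens > MAX_TEXT_TOKENS:
        problems.append(f"text: {plan.text_tokens} tokens > {MAX_TEXT_TOKENS}")
    if plan.image_count > MAX_IMAGES:
        problems.append(f"images: {plan.image_count} > {MAX_IMAGES}")
    if len(plan.pdf_page_counts) > MAX_PDFS:
        problems.append(f"pdfs: {len(plan.pdf_page_counts)} files > {MAX_PDFS}")
    for i, pages in enumerate(plan.pdf_page_counts):
        if pages > MAX_PDF_PAGES:
            problems.append(f"pdf #{i}: {pages} pages > {MAX_PDF_PAGES}")
    if plan.audio_seconds > MAX_AUDIO_SECONDS:
        problems.append(f"audio: {plan.audio_seconds}s > {MAX_AUDIO_SECONDS}s")
    if plan.video_seconds > MAX_VIDEO_SECONDS:
        problems.append(f"video: {plan.video_seconds}s > ~{MAX_VIDEO_SECONDS}s")
    return problems

# Example: an oversized text chunk plus a PDF that is too long.
plan = EmbedRequestPlan(text_tokens=9_000, image_count=2, pdf_page_counts=[8])
for problem in limit_violations(plan):
    print(problem)
```

Collecting every violation instead of raising on the first one lets a client decide how to split the input, for example by chunking long text or sending a long PDF as several page ranges.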
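The task-specific formatting Google describes treats queries and stored items differently: a user question is labeled as a search query, while a stored item is labeled as a titled document. A minimal Python sketch of that formatting step follows. The exact strings are an assumption borrowed from the prompt convention Google publishes for its open EmbeddingGemma model (`task: search result | query: {content}` and `title: {title} | text: {content}`); Gemini Embedding 2’s own documentation may word them differently, and `format_query` and `format_document` are hypothetical helpers.

```python
from typing import Optional

# ASSUMPTION: prefixes modeled on Google's EmbeddingGemma prompt convention;
# check the Gemini Embedding 2 docs for the exact wording it expects.
QUERY_TEMPLATE = "task: search result | query: {query}"
DOCUMENT_TEMPLATE = "title: {title} | text: {text}"

def format_query(query: str) -> str:
    """Label user input as a search query before embedding it."""
    return QUERY_TEMPLATE.format(query=query.strip())

def format_document(text: str, title: Optional[str] = None) -> str:
    """Label a stored item as a titled document; 'none' marks a missing title."""
    clean_title = title.strip() if title and title.strip() else "none"
    return DOCUMENT_TEMPLATE.format(title=clean_title, text=text.strip())

# Queries and documents are formatted differently, then embedded separately.
print(format_query("How do I reset the router?"))
print(format_document("Hold the reset button for 10 seconds.", title="Router manual"))
```

Keeping both templates in module-level constants means a single edit updates every call site if Google’s documented wording differs from the assumed strings.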