Google launches Gemini Embedding 2

- Google made Gemini Embedding 2 generally available in late April after a March preview, turning its first native multimodal embedding model into a production API.
- The model puts text, images, video, audio, PDFs, and documents into one shared vector space, with 3,072-dimensional outputs and support for 100+ languages.
- That matters because retrieval systems no longer need separate pipelines per media type: one index can power search, RAG, and classification.

Embeddings are the part of AI that most users never see, but they quietly decide whether search, recommendations, and RAG systems feel smart or dumb. They turn messy inputs into vectors, so a system can find “things like this” fast. The problem has been fragmentation: text lived in one embedding model, images in another, video somewhere else, and stitching them together was awkward. Google’s move with Gemini Embedding 2 is to collapse all of that into one production model, and as of April 22 it’s no longer just a preview experiment. (blog.google)

### What is Google actually launching?

Gemini Embedding 2 is Google’s first natively multimodal embedding model in the Gemini API. “Embedding” here means the model does not generate prose or images; it generates numerical representations that let systems compare meaning across inputs. The new part is that text, images, video, audio, and documents all land in one shared vector space. (blog.google) Google moved it to general availability on April 22, 2026. (blog.google)

### Why does one shared space matter?

Because cross-modal search gets much simpler. A developer can ask with text, such as “red sneakers with white soles”, and retrieve matching product photos. Or use an image as the query and pull back related documents, clips, or catalog entries. Basically, the model is trying to make “these things mean the same thing” work across formats, not [...] (blog.google) (ai.google.dev)

### What can it actually take as input?

The practical limits are a big part of the story. Google says a single call can handle up to 8,192 text tokens, 6 images, 120 seconds of video, 180 seconds of audio, and 6 PDF pages. On Vertex AI, the model returns 3,072-dimensional vectors. Those details matter because embeddings are infrastructure: developers need to know whether the model can ingest real product data [...] (ai.google.dev) (developers.googleblog.com)

### Is this replacing the old text embedding model?

Not completely.
Google still keeps `gemini-embedding-001` around for text-only use cases. But the center of gravity is clearly shifting. The docs now describe `gemini-embedding-2` as the latest model and the first multimodal embedding model in the Gemini API, which tells you where future retrieval tooling is heading. If your stack involves images, audio, video, or PDFs, the new model is the obvious path. (developers.googleblog.com) (ai.google.dev)

### Where does RAG fit into this?

This is really a retrieval story. RAG systems work by finding relevant context before generation, and embeddings are the lookup layer that makes that possible. With Gemini Embedding 2, a company can index support docs, diagrams, screenshots, training videos, and recorded calls in one semantic system. That means an assistant can fetch evidence from mixed media instead of pretending [...] exactly that: agentic multimodal RAG, visual search, and classification pipelines. (ai.google.dev) (developers.googleblog.com)

### What’s the catch?

The catch is that embeddings are only half the system. You still need chunking, indexing, metadata, ranking, and evaluation. And one subtle API change matters: Gemini Embedding 2 can aggregate multiple inputs into a single embedding, which is useful, but it also means developers need to be careful if they expected one vector per item by default. This is better thought of as a stronger foundation, not a finished retrieval stack in a box. (ai.google.dev)

### Why is Google pushing this now?

Because multimodal AI is moving from demo to plumbing. Google says the model saw thousands of production deployments, and it has already added Batch API support for higher-volume, lower-cost embedding jobs. That suggests the company thinks embeddings are no longer just a research feature; they’re becoming a standard backend layer for enterprise search and AI apps.
(developers.googleblog.com) (ai.google.dev)

### Bottom line

Gemini Embedding 2 is not flashy in the chatbot sense, but it may matter more than another model benchmark chart. It gives developers one shared retrieval layer for text, images, video, audio, and documents. If that works well in practice, multimodal search stops being a special project and starts looking like normal infrastructure. (blog.google)
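
For readers who want to see what “one shared retrieval layer” means mechanically, here is a minimal sketch of cross-modal lookup by cosine similarity. It does not call the Gemini API. The file names, the tiny 4-dimensional vectors (standing in for the 3,072-dimensional outputs), and the `search` helper are all made up for illustration, and the query vector is simply assumed to be the embedding of a text query such as “red sneakers with white soles”.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical mixed-media index: (name, media type, embedding).
# Real embeddings would come from the model; these toy vectors are assumptions.
INDEX = [
    ("product_photo.jpg", "image", [0.9, 0.1, 0.0, 0.2]),
    ("returns_policy.pdf", "pdf", [0.1, 0.8, 0.3, 0.0]),
    ("unboxing_clip.mp4", "video", [0.7, 0.2, 0.1, 0.4]),
    ("support_call.wav", "audio", [0.0, 0.3, 0.9, 0.1]),
]


def search(query_vec, index, top_k=2):
    """Rank every item, whatever its media type, against one query vector."""
    scored = [(cosine(query_vec, vec), name, kind) for name, kind, vec in index]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Assumed stand-in for the embedding of a text query.
query = [0.8, 0.1, 0.0, 0.3]
results = search(query, INDEX)

for score, name, kind in results:
    print(f"{score:.3f}  {kind:<5}  {name}")
```

With these toy numbers the photo and the video clip rank ahead of the PDF and the audio file, even though the query is assumed to have started as text. A production system would swap the linear scan for a vector database and still need the chunking, metadata, and ranking work described above.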

Shared from Scout - Be the smartest in the room.