Databricks puts Qwen3 embedding in preview
Databricks launched a public preview of Qwen3‑Embedding‑0.6B, which it bills as a state‑of‑the‑art embedding model for agentic workflows. The launch is a direct nudge toward heavier vector search and low‑latency inference requirements, and embedding previews like this tend to move production traffic to GPU‑accelerated serving stacks. (databricks.com)
Databricks published the model's documentation and a sample endpoint name (databricks-qwen3-embedding-0-6b) on March 17, 2026, as part of its Foundation Model APIs listing. (databricks.com/blog/sota-embedding-model-agentic-workflows-now-public-preview)
Qwen3‑Embedding‑0.6B is a roughly 0.6B‑parameter model with a 32K‑token context window. It produces embeddings of up to 1,024 dimensions, which can be truncated to as few as 32 dimensions via “Matryoshka” embeddings, where a truncated prefix of the vector remains a usable embedding. (huggingface.co/Qwen/Qwen3-Embedding-0.6B) (databricks.com/blog/sota-embedding-model-agentic-workflows-now-public-preview)
Databricks positions the model for retrieval‑powered agents by plugging it directly into Agent Bricks and Mosaic Vector Search, so documents can be indexed and retrieved from governed Delta Lake data during agent execution. (databricks.com/blog/sota-embedding-model-agentic-workflows-now-public-preview) (databricks.com/product/model-serving)
The model is exposed through Databricks’ Foundation Model APIs in both pay‑per‑token and provisioned‑throughput deployment modes, and Databricks explicitly recommends provisioned throughput for production inference. (docs.databricks.com/aws/en/machine-learning/foundation-model-apis/api-reference) (docs.databricks.com/aws/en/machine-learning/foundation-model-apis/deploy-prov-throughput-foundation-model-apis)
Databricks’ Model Serving is a serverless serving surface with GPU acceleration and LLM‑specific optimizations, so production deployments of Qwen3 embeddings are likely to land on GPU‑backed provisioned endpoints for low‑latency vector inference. (databricks.com/blog/announcing-gpu-and-llm-optimization-support-model-serving) (databricks.com/product/model-serving)
Databricks reports MTEB leaderboard wins for the Qwen3 family and says the 0.6B embedding model outperforms most other 0.6B models and rivals much larger 7B+ models on the multilingual MTEB and English v2 leaderboards. (databricks.com/blog/sota-embedding-model-agentic-workflows-now-public-preview) (huggingface.co/Qwen/Qwen3-Embedding-0.6B)
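For readers unfamiliar with Matryoshka truncation, the idea mentioned above can be sketched in a few lines of plain Python: keep the first N components of a vector, then rescale it to unit length so cosine similarity still behaves. This is a minimal client‑side illustration, not Databricks' or Qwen's implementation. The function names and the toy 8‑dimensional vectors are hypothetical stand‑ins for real 1,024‑dimensional output, and whether the serving endpoint offers its own dimension setting is not covered here.

```python
import math


def truncate_embedding(vector, dims):
    """Keep the first `dims` components, then rescale to unit length."""
    if not 1 <= dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (assumed values) standing in for real model output.
query = [0.9, 0.1, 0.3, 0.0, 0.05, 0.02, 0.01, 0.0]
document = [0.8, 0.2, 0.4, 0.1, 0.0, 0.03, 0.02, 0.01]

# Compare the same pair at progressively smaller dimensions.
for dims in (8, 4, 2):
    q = truncate_embedding(query, dims)
    d = truncate_embedding(document, dims)
    print(dims, round(cosine_similarity(q, d), 4))
```

The practical trade‑off is storage and search cost against retrieval quality: a 32‑dimensional index is far smaller than a 1,024‑dimensional one, but how much accuracy survives depends on the data and should be measured rather than assumed.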