Perplexity Releases New Embedding Models
Perplexity has released its new `pplx-embed` models, designed for web-scale retrieval tasks. The models are available via an API, providing a new option for developers building vector database workflows for retrieval-augmented generation (RAG) and semantic search applications.
The release comprises two open-source model families: `pplx-embed-v1` for standard dense text retrieval, and `pplx-embed-context-v1`, which embeds each passage while taking the surrounding document's context into account. Both families come in 0.6 billion and 4 billion parameter sizes, so developers can trade latency against quality.
A key technical differentiator is the bidirectional architecture, which lets the models interpret a word using both the preceding and the following text. This contrasts with many popular decoder-only models, which process text in one direction, and makes `pplx-embed` better suited to the nuances of information retrieval.
The models also support INT8 and binary quantization natively, which reduces memory and storage costs. INT8 embeddings need 4x less storage than standard FP32 embeddings, while binary embeddings need 32x less, with a performance drop of less than 1.6 percentage points on the 4B model.
On the multilingual MTEB retrieval benchmark, `pplx-embed-v1-4B` matches Alibaba's Qwen3-Embedding-4B and surpasses Google's gemini-embedding-001. In addition, `pplx-embed-context-v1-4B` sets a new state of the art on the ConTEB benchmark for contextual retrieval, outperforming models from Voyage and Anthropic.
Unlike some competing models, `pplx-embed` does not require "instruction prefixes" to be added to queries and documents. This simplifies the engineering pipeline and avoids the performance degradation that can occur when instructions are inconsistent between indexing time and query time.
For developers building production RAG systems, the models are available on Hugging Face under a permissive MIT license and are compatible with standard frameworks such as SentenceTransformers and ONNX. This open-source availability offers an alternative to proprietary embedding APIs.
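
The 4x and 32x storage figures follow directly from the number of bits kept per dimension: 32 for FP32, 8 for INT8, and 1 for binary. The sketch below illustrates that arithmetic with a generic post-hoc quantizer. It is not Perplexity's method, since the article does not describe how the models produce quantized outputs. The symmetric per-vector INT8 scaling, the sign-based binarization, the Hamming-distance comparison, and the 1024-dimension example are all assumptions chosen for illustration.

```python
def quantize_int8(vec):
    """Symmetric per-vector INT8 quantization (an assumed scheme).

    The largest absolute value maps to 127; returns the integer codes
    and the scale needed to approximately reconstruct the floats.
    """
    max_abs = max(abs(x) for x in vec) or 1.0  # avoid dividing by zero
    scale = max_abs / 127.0
    return [round(x / scale) for x in vec], scale


def quantize_binary(vec):
    """Keep only the sign of each dimension, packed 8 dimensions per byte."""
    packed = bytearray((len(vec) + 7) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def hamming(a, b):
    """Number of differing bits between two packed binary embeddings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def storage_bytes(dim):
    """Bytes needed to store one embedding of `dim` dimensions per format."""
    return {"fp32": dim * 4, "int8": dim, "binary": dim // 8}


# Hypothetical 1024-dimensional embedding; the article does not state
# the models' actual output dimension.
print(storage_bytes(1024))  # {'fp32': 4096, 'int8': 1024, 'binary': 128}
```

At that hypothetical size, one million vectors would take roughly 4.1 GB in FP32, 1.0 GB in INT8, and 128 MB in binary form. Binary embeddings are typically compared with Hamming distance rather than cosine similarity, which is why the sketch includes a `hamming` helper.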