Telos ships Apple‑Silicon embeddings
Telos released mlx-embeddings v0.1.0 for Apple Silicon, adding support for Qwen3-VL plus a reranker, ColIdefics3 and LoRA adapters, a clear play for local AI prototyping on M-series chips. The demo clip (1.3k views) shows developer-focused tooling that shortens the loop for on-device embedding, ranking and fine-tuning experiments on Macs.
The mlx-embeddings GitHub repository logged a change last month that explicitly added ColIdefics3 support and LoRA adapters to the codebase (github.com). Qwen3-VL embedding weights, including community uploads for 2B and 8B variants, are available in the mlx-community organization on Hugging Face, where model pages and forks are actively updated (huggingface.co). One community implementation for Apple Silicon reports throughput of up to 44K tokens/sec across 0.6B, 4B and 8B Qwen3 embedding servers, an indication of the performance targets developers are testing on M-series chips (github.com).
A companion package for on-device fine-tuning, mlx-embeddings-lora, is published on PyPI at version 1.0.5, with a release logged on Nov 13, 2025, formalizing LoRA workflows for contrastive and ranking training on Macs (pypi.org). Apple's MLX framework and its community repos show active maintenance: the main MLX repo has recent commits, and its changelog lists explicit tuning for M5 Pro/Max, indicating ongoing optimization for newer Apple silicon (github.com).
Separate projects are packaging MLX embeddings behind OpenAI-compatible endpoints for local use. One repo, updated last month, provides a drop-in embeddings server that mirrors OpenAI's /v1/embeddings API for Apple Silicon hosts (github.com).
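For readers who want to try a server of this kind, the following is a minimal Python sketch of a client for an OpenAI-style /v1/embeddings endpoint. It relies only on the publicly documented OpenAI request and response shape (a JSON body with "model" and "input", and a response whose "data" list carries "index" and "embedding" fields). The base URL, port and any model name passed in are hypothetical placeholders, not values taken from the projects mentioned above, and a given local server may deviate from this shape.

```python
import json
import urllib.request

# Hypothetical address of a local OpenAI-compatible server (assumption).
BASE_URL = "http://localhost:8000"


def build_embeddings_request(texts, model, base_url=BASE_URL):
    """Build a POST request in the OpenAI /v1/embeddings request shape."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(body):
    """Return the embedding vectors from a decoded response, ordered by index."""
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, model, base_url=BASE_URL):
    """Send the texts to the server and return one vector per input text."""
    request = build_embeddings_request(texts, model, base_url)
    with urllib.request.urlopen(request) as response:
        return parse_embeddings(json.load(response))
```

With a server running, a call such as embed(["hello", "world"], model="your-model-name") would return one vector per input string; the model name must be whatever identifier that particular server exposes.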