vLLM + TurboQuant for big evals

vLLM was compiled with TurboQuant to enable efficient quantized inference on large models such as Qwen3.5‑27B, a practical step toward lowering memory and compute costs in large‑scale model evaluations and benchmarks. This kind of quantization pipeline matters for cost‑constrained production testing. (x.com)

Google Research published the TurboQuant blog post on March 24, 2026, introducing TurboQuant along with PolarQuant and Quantized Johnson‑Lindenstrauss and stating the work will be presented at ICLR/AISTATS 2026. (research.google)

An open‑source TurboQuant implementation packaged as turboquant‑vllm is available on GitHub and PyPI and is described as a drop‑in vLLM plugin that registers via vLLM’s plugin system and can be enabled with a single CLI flag. (github.com) The published PyPI/docs benchmarks report ~3.76× KV‑cache compression and show the plugin reducing KV pages to about 68 bytes per token per head versus 256 bytes in FP16, with reported TQ4 configs delivering ~97% cosine similarity to baseline outputs. (libraries.io) The GitHub implementation (turboquant‑vllm) advertises 3.7–4.7× KV compression and a separate weight‑compression path (example: Qwen3‑30B BF16 weights from 59.7 GB → 16.8 GB), plus fused CUDA/Triton kernels with an automatic PyTorch fallback and no calibration or pre‑quantization step. (github.com)

Qwen3.5‑27B model artifacts are available on Hugging Face and Qwen’s docs explicitly recommend vLLM for deployment; community notes show small vLLM modelopt/vision‑prefix patches are sometimes required when serving Qwen family vision‑enabled checkpoints. (huggingface.co)

The public rollout was rapid: the first community TurboQuant→vLLM plugin claims a working prototype within ~72 hours and a PyPI release (1.3.0) with benchmarks appeared days after Google’s March 24 blog, driving immediate community experiments on RTX/H100/A100 hardware. (libraries.io)
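
The 68-byte and 256-byte per-token figures, and the 3.76× ratio, can be reproduced with simple arithmetic. The sketch below assumes one 128-dimensional vector per token per head, 4-bit codes per coordinate, and 4 extra bytes of per-vector metadata (for example an FP32 norm). The source does not state this breakdown, so treat it as one plausible layout that matches the reported numbers, not the plugin's documented page format; the helper names are hypothetical.

```python
# Worked arithmetic for the per-token, per-head KV figures and the weight example.
# ASSUMPTIONS (not from the source): head_dim = 128, 4-bit codes per coordinate,
# and 4 bytes of per-vector metadata (e.g. one FP32 norm).

HEAD_DIM = 128   # assumed head dimension
FP16_BYTES = 2   # bytes per FP16 element


def fp16_bytes_per_head(head_dim: int = HEAD_DIM) -> int:
    """Hypothetical helper: bytes for one token's vector in one head, stored in FP16."""
    return head_dim * FP16_BYTES


def tq4_bytes_per_head(head_dim: int = HEAD_DIM, bits: int = 4, extra_bytes: int = 4) -> int:
    """Hypothetical helper: packed `bits`-bit codes plus assumed per-vector metadata."""
    code_bytes = head_dim * bits // 8
    return code_bytes + extra_bytes


baseline = fp16_bytes_per_head()    # 128 * 2 = 256 bytes
compressed = tq4_bytes_per_head()   # 64 + 4 = 68 bytes
kv_ratio = baseline / compressed    # about 3.76x

# Weight-compression example quoted for Qwen3-30B: 59.7 GB (BF16) -> 16.8 GB.
weight_ratio = 59.7 / 16.8          # about 3.55x

print(f"KV: {baseline} B -> {compressed} B per token per head ({kv_ratio:.2f}x)")
print(f"Weights: 59.7 GB -> 16.8 GB ({weight_ratio:.2f}x)")
```

Changing `bits` or `extra_bytes` shows how much the headline ratio depends on the metadata overhead stored next to each quantized vector.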
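
The "no calibration or pre‑quantization step" claim is easier to picture with a toy example. The sketch below is a generic rotate-then-quantize round trip, not TurboQuant's codebook, PolarQuant, the QJL step, or the turboquant‑vllm kernels: it applies a random orthogonal rotation, stores one norm plus 4-bit codes on a fixed uniform grid, then reconstructs the vector and measures cosine similarity. The Haar-random rotation, the ±3 standard-deviation clipping range, and the uniform grid are all assumptions chosen for illustration.

```python
import numpy as np

# Generic rotate-then-quantize sketch (NOT TurboQuant's codebook, PolarQuant,
# QJL, or the turboquant-vllm kernels). Assumed choices: Haar-random rotation,
# uniform 4-bit grid clipped at +/-3 standard deviations, one norm per vector.


def random_rotation(dim: int, rng: np.random.Generator) -> np.ndarray:
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # fix column signs so the rotation is uniformly random


def quantize_dequantize(x: np.ndarray, rot: np.ndarray, bits: int = 4) -> np.ndarray:
    """Quantize x to `bits`-bit codes in the rotated basis, then reconstruct it.
    Only the integer codes and the scalar norm would need to be stored."""
    norm = np.linalg.norm(x)
    y = rot @ (x / norm)                  # rotated unit vector: coordinates roughly N(0, 1/dim)
    levels = 2 ** bits
    limit = 3.0 / np.sqrt(x.size)         # assumed clipping range (3 std devs)
    step = 2 * limit / (levels - 1)
    codes = np.round((np.clip(y, -limit, limit) + limit) / step)  # values in 0..levels-1
    y_hat = codes * step - limit          # dequantize back onto the grid
    return norm * (rot.T @ y_hat)         # undo the rotation (rot is orthogonal) and rescale


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
dim = 128
rot = random_rotation(dim, rng)

# Test vector with a few large "outlier" channels; the rotation spreads their
# energy across all coordinates, so one fixed grid works without calibration.
x = rng.standard_normal(dim)
x[:4] *= 20.0
x_hat = quantize_dequantize(x, rot)
print(f"cosine similarity after 4-bit round trip: {cosine(x, x_hat):.4f}")
```

Because the grid depends only on `dim` and `bits`, nothing is fitted to the data. The printed cosine describes this toy quantizer on a single vector and is not comparable to the ~97% output-level similarity reported for the plugin.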
