Liquid AI unveils LFM2.5-8B-A1B model

- Liquid AI said on May 28 it released LFM2.5-8B-A1B, an open-weight on-device Mixture-of-Experts model for laptops and phones. (liquid.ai)
- The key spec is 8 billion total parameters with 1.5 billion active per forward pass, plus a 128,000-token context window. (docs.liquid.ai)
- The model is available now through Liquid AI’s docs, Playground and Hugging Face repositories, with support for llama.cpp, MLX, vLLM and SGLang. (liquid.ai)

Liquid AI said on May 28 that it released LFM2.5-8B-A1B, an open-weight language model aimed at running on consumer devices rather than only in the cloud. The company described the model as an on-device Mixture-of-Experts system built for tool calling, instruction following and agent-style tasks on laptops and phones. (liquid.ai) Liquid AI said the release is available through its Playground, documentation site and Hugging Face repository. (docs.liquid.ai)

### What exactly did Liquid AI release?

LFM2.5-8B-A1B is an 8 billion-parameter Mixture-of-Experts model with 1.5 billion active parameters per forward pass, according to Liquid AI’s model documentation. (liquid.ai) That sparse setup is meant to keep inference costs closer to a much smaller model while preserving some of the quality gains associated with a larger one.

Liquid AI said the new model builds on its earlier LFM2-8B-A1B release from October 2025. The company said the update adds a 128,000-token context window, expands pretraining from 12 trillion to 38 trillion tokens, and incorporates large-scale reinforcement learning. (liquid.ai)

### Why is the “A1B” label important?

The “A1B” label refers to the active-parameter budget rather than the full parameter count. In practice, Liquid AI’s documentation says the model combines 8 billion total parameters with only 1.5 billion active during each forward pass. (docs.liquid.ai)

That matters because on-device deployment is constrained by memory, power and latency. Liquid AI said the model is designed to run “comfortably even on an entry-level laptop,” while still handling tool calls and longer, multi-step prompts. (liquid.ai)

### What did Liquid AI say changed from the earlier version?

Liquid AI said the context window increased from 32,768 tokens in the earlier LFM2-8B-A1B to 128,000 tokens in the new release.
The company also said it doubled the vocabulary from 65,536 to 128,000 tokens to improve tokenization efficiency for non-Latin languages including Hindi, Thai, Vietnamese, Indonesian and Arabic. (docs.liquid.ai)

The company also described LFM2.5-8B-A1B as a reasoning-only model that produces an explicit chain of thought before a final answer. Liquid AI said it chose that approach because sparse MoE models tend to run in compute-bound settings, making additional reasoning tokens relatively cheap. (liquid.ai)

### What performance claims are verified, and what remains only a social-media claim?

Liquid AI’s official materials verify the model’s release date, architecture, context length, training scale and supported runtimes. The company also said the model is the fastest in its size class on CPU and GPU inference, but the benchmark details visible in the official materials opened here do not include the specific “253 tokens per second on Apple M5 Max” figure cited in social posts. (liquid.ai)

Because that 253 tokens-per-second number was not visible in the official Liquid AI pages or the Hugging Face model card reviewed here, it should be treated as an unverified social-media benchmark unless Liquid AI publishes the underlying test setup separately. (liquid.ai)

### Where can developers actually run it?

Hugging Face lists the model as available for use with Transformers and shows deployment examples for vLLM and SGLang. Liquid AI’s documentation also lists support for llama.cpp, MLX, vLLM and SGLang, and the company has published related GGUF, ONNX and MLX variants for local inference workflows. (liquid.ai)

Liquid AI said both the base checkpoint and the post-trained model are available now. The next practical step for developers is to pull the model from Hugging Face or Liquid AI’s docs and test it on local hardware using one of the supported runtimes. (liquid.ai) (docs.liquid.ai)
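
For a rough sense of scale before testing on local hardware, the figures Liquid AI reported can be turned into simple ratios. The sketch below is back-of-the-envelope arithmetic, not a benchmark. The weight-footprint helper is an illustrative assumption: it counts only the bytes needed to hold all 8 billion parameters at a given precision, ignores KV cache, activations and runtime overhead, and uses bit widths (16, 8, 4) chosen for illustration rather than formats confirmed by Liquid AI.

```python
# Figures as reported by Liquid AI and quoted in this article.
TOTAL_PARAMS = 8e9        # total parameters
ACTIVE_PARAMS = 1.5e9     # active parameters per forward pass
OLD_CONTEXT, NEW_CONTEXT = 32_768, 128_000
OLD_VOCAB, NEW_VOCAB = 65_536, 128_000
OLD_PRETRAIN_T, NEW_PRETRAIN_T = 12, 38   # trillions of pretraining tokens


def ratio(new: float, old: float) -> float:
    """Return new / old, rounded to two decimals."""
    return round(new / old, 2)


def weight_footprint_gb(params: float, bits_per_weight: int) -> float:
    """Weight-only footprint in GB (10^9 bytes).

    Assumption: ignores KV cache, activations and runtime overhead,
    so real memory use will be higher.
    """
    return params * bits_per_weight / 8 / 1e9


# Share of the model's parameters used on each forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"Active share of parameters: {active_fraction:.2%}")
print(f"Context window growth: {ratio(NEW_CONTEXT, OLD_CONTEXT)}x")
print(f"Vocabulary growth: {ratio(NEW_VOCAB, OLD_VOCAB)}x")
print(f"Pretraining token growth: {ratio(NEW_PRETRAIN_T, OLD_PRETRAIN_T)}x")

# Illustrative precisions only; all 8B weights are counted, not just
# the 1.5B active ones, since the full expert set is normally loaded.
for bits in (16, 8, 4):
    gb = weight_footprint_gb(TOTAL_PARAMS, bits)
    print(f"{bits}-bit weights: about {gb:.0f} GB")
```

The split between the two parameter counts is the point of the exercise: per-token compute follows the 1.5 billion active parameters, while the memory needed to load the model generally follows the full 8 billion.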
