MLX Studio brings native ML to macOS
A new native macOS app, MLX Studio (mlxstudio), enables efficient local model execution on Apple Silicon, with gains in speed and memory use for developer workflows and local inference. It reinforces the practical value of unified memory and native tooling for on‑device ML prototyping. (x.com)
MLX Studio bundles the vMLX inference engine into a native macOS desktop app, distributed as signed, notarized DMG releases, and advertises running LLMs, VLMs and image‑generation models entirely offline on Apple Silicon. (github.com) Apple's MLX framework takes advantage of unified memory, so MLX operations can run on the CPU or GPU without explicit memory copies; on M5 silicon it can also use dedicated Neural Accelerators for matrix multiplications. (machinelearning.apple.com)
The app implements a five‑layer cache (prefix cache, paged KV cache, KV quantization, continuous batching and a persistent disk cache) to keep large multi‑turn contexts available across switches and restarts. (mlx.studio) vMLX advertises continuous batching of up to 256 concurrent sequences, along with a disk‑backed prompt cache intended to let a single Mac serve multiple clients and keep context warm for fast generation. (vmlx.net) vMLX also publishes performance claims, including "224× faster than LM Studio" at 100K context and "9.7× faster TTFT" (time to first token) on typical workloads, and says storage‑boundary quantization makes 100K+ context feasible on a 16GB Mac. (vmlx.net)
In‑app tooling includes auto‑detection of Qwen, Llama, Mistral, Gemma, Phi and DeepSeek models (plus vision models such as Qwen‑VL and LLaVA) and a GGUF→MLX converter with JANG mixed‑precision profiles covering 2‑bit to 8‑bit deployment options. (github.com) (mlx.studio) Distribution targets both end users and integrators: a GUI DMG that requires no Python or terminal steps, and a pip‑installable vMLX package for headless server or CI integration. (github.com) (vmlx.net)
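The prefix‑cache and paged‑KV layers in vMLX's five‑layer cache can be illustrated with a toy model: the KV cache is split into fixed‑size pages, and each page is keyed by the full token prefix that ends at it, so a new prompt reuses every page it shares with an earlier one and only needs prefill for the rest. This is a minimal sketch under stated assumptions (a 4‑token page size, integer page ids standing in for K/V tensors, no eviction); it is not vMLX's implementation, and `PagedPrefixCache` and `lookup_and_fill` are hypothetical names.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4  # tokens per KV page (hypothetical; real engines use larger pages)


@dataclass
class PagedPrefixCache:
    """Toy prefix cache over fixed-size KV pages (illustration only, not vMLX code).

    Each full page is keyed by the entire token prefix that ends at that page,
    so two prompts share a page only if every earlier token also matches.
    The stored value is a page id standing in for that page's K/V tensors.
    """
    pages: dict[tuple[int, ...], int] = field(default_factory=dict)
    next_page_id: int = 0

    def lookup_and_fill(self, tokens: list[int]) -> tuple[int, list[int]]:
        """Return (tokens served from cache, page ids backing the prompt)."""
        reused_tokens = 0
        page_ids: list[int] = []
        # Only whole pages are cached; a trailing partial page is left out here.
        full_len = len(tokens) - len(tokens) % PAGE_SIZE
        for end in range(PAGE_SIZE, full_len + 1, PAGE_SIZE):
            key = tuple(tokens[:end])
            if key in self.pages:
                # Hit: this page's keys/values can be reused without prefill.
                reused_tokens = end
            else:
                # Miss: an engine would run prefill for these tokens and store the page.
                self.pages[key] = self.next_page_id
                self.next_page_id += 1
            page_ids.append(self.pages[key])
        return reused_tokens, page_ids


cache = PagedPrefixCache()
system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]  # shared prefix, e.g. a system prompt
first = cache.lookup_and_fill(system_prompt + [9, 10, 11, 12])
second = cache.lookup_and_fill(system_prompt + [20, 21, 22, 23])
print(first)   # (0, [0, 1, 2])
print(second)  # (8, [0, 1, 3])
```

The second request reuses the two pages covering the shared prefix and allocates only one new page, which is the effect a prefix cache aims for in multi‑turn chats that resend the same history.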
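To see why KV quantization matters for long contexts on a 16GB Mac, here is a back‑of‑envelope sizing sketch. The model shape (32 layers, 8 KV heads, head dimension 128) is a hypothetical 8B‑class configuration, not a figure from vMLX; the calculation ignores model weights and quantization metadata, and since the source does not explain how vMLX's storage‑boundary quantization works, it shows only the general arithmetic.

```python
GIB = 1024 ** 3


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bits_per_value: int) -> int:
    """Bytes needed to store keys and values for seq_len tokens.

    The factor of 2 counts one key vector and one value vector per KV head,
    per layer, per token. Quantization scales/offsets are ignored.
    """
    num_values = 2 * num_layers * num_kv_heads * head_dim * seq_len
    return num_values * bits_per_value // 8


# Hypothetical 8B-class shape; not a configuration published by vMLX.
shape = {"num_layers": 32, "num_kv_heads": 8, "head_dim": 128}

for bits in (16, 8, 4):
    size = kv_cache_bytes(**shape, seq_len=100_000, bits_per_value=bits)
    print(f"{bits:>2}-bit KV cache at 100K tokens: {size / GIB:.2f} GiB")
```

Under these assumptions a 16‑bit cache alone needs roughly 12.2 GiB at 100K tokens, leaving little room for weights on a 16GB machine, while a 4‑bit cache needs about 3 GiB. This is arithmetic only and does not verify the vendor's claim.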
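Continuous batching, as the term is generally used, schedules at the granularity of individual decode steps: finished sequences leave the batch and queued ones join between steps, rather than the engine waiting for a whole batch to drain. The toy simulation below assumes each request needs a known number of output tokens and uses a batch limit of 2 instead of the 256 concurrent sequences vMLX advertises; `continuous_batching` is a hypothetical helper, not vMLX's scheduler.

```python
from collections import deque


def continuous_batching(requests: dict[str, int], max_batch: int) -> list[list[str]]:
    """Simulate iteration-level scheduling (toy model, not vMLX's scheduler).

    requests maps a request id to the number of tokens it still has to generate.
    Before every decode step, free slots are refilled from the waiting queue, so a
    request that finishes early immediately makes room for a queued one.
    Returns the sorted ids of the active requests at each decode step.
    """
    waiting = deque(requests.items())
    active: dict[str, int] = {}
    steps: list[list[str]] = []
    while waiting or active:
        # Admit queued requests into any free batch slots.
        while waiting and len(active) < max_batch:
            rid, remaining = waiting.popleft()
            active[rid] = remaining
        steps.append(sorted(active))
        # One decode step: every active sequence emits one token.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] <= 0:
                del active[rid]  # finished sequences leave the batch right away
    return steps


schedule = continuous_batching({"a": 1, "b": 3, "c": 2}, max_batch=2)
for i, batch in enumerate(schedule):
    print(i, batch)
```

With these inputs the three requests finish in three decode steps. A static batch of `a` and `b` would leave one slot idle after `a` finished and start `c` only once `b` was done, taking five steps.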