Macs get local LLM boost
Ollama now uses Apple’s MLX framework for hardware‑accelerated model serving on Macs, letting developers run sophisticated LLMs locally with better performance. That makes laptop‑first model experiments and demoable portfolio projects easier to run without cloud spend. (myhostnews.com)
Ollama announced the preview on March 30, 2026, calling MLX the “fastest way” to run Ollama on Apple silicon. (ollama.com) The preview ships as Ollama 0.19 and explicitly targets Apple silicon Macs in the initial rollout. (macobserver.com) The integration runs MLX in a separate runner process that communicates with the main Ollama server over HTTP, which keeps MLX-specific code isolated from the platform-agnostic core. (deepwiki.com)
Publications testing the update report larger speedups on Apple’s latest M-series chips (including M5, M5 Pro and M5 Max), with gains visible in both time‑to‑first‑token and tokens‑per‑second. (9to5mac.com) MLX is built to exploit Apple silicon’s unified memory, so tensors avoid cross‑bus copies and work can be mapped across the GPU and Neural Engine; independent benchmarks and guides claim this can cut fine‑tuning overhead by roughly 30–40% versus traditional CPU/GPU tensor flows on the same chip. (opensource.apple.com)
The Ollama update also adds support for the NVFP4 model format for denser quantization, along with internal caching improvements aimed at reducing repeated I/O for large models. (arstechnica.com) MLX itself is an open‑source project on GitHub, where the mlx repo shows widespread community use and active commits, and Apple’s MLX docs state that the framework runs on any Apple platform that supports Metal. (github.com)
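For readers who want to check the time‑to‑first‑token and tokens‑per‑second claims on their own Mac, here is a minimal Python sketch. It assumes a local Ollama server on its default port (11434) and relies on the timing fields that Ollama's `/api/generate` endpoint returns in nanoseconds for non-streaming requests. The helper names `generate` and `summarize_timing` are hypothetical, the model name is a placeholder, and the latency figure is only a rough proxy (model load time plus prompt evaluation time), not a true streaming measurement.

```python
import json
import urllib.request

# Default local endpoint; adjust if your server listens elsewhere (assumption).
OLLAMA_URL = "http://localhost:11434/api/generate"


def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming generation request and return the parsed JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_timing(reply: dict) -> dict:
    """Derive rough latency and throughput figures from the reply's timing fields."""
    ns = 1e9  # durations are reported in nanoseconds
    load_s = reply.get("load_duration", 0) / ns
    prompt_s = reply.get("prompt_eval_duration", 0) / ns
    eval_s = reply.get("eval_duration", 0) / ns
    return {
        # Rough proxy only: time spent before the first generated token could appear.
        "approx_ttft_s": load_s + prompt_s,
        "prompt_tokens_per_s": reply.get("prompt_eval_count", 0) / prompt_s if prompt_s else 0.0,
        "gen_tokens_per_s": reply.get("eval_count", 0) / eval_s if eval_s else 0.0,
    }


# Example usage (requires a running server and a pulled model; the name is a placeholder):
# reply = generate("llama3.2", "Explain unified memory in one sentence.")
# print(summarize_timing(reply))
```

Running the same prompt and model before and after upgrading gives a like-for-like comparison; repeat each run a few times, since the first request also pays the model load cost.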