On‑device AI momentum on Apple Silicon

Multiple recent demos and tooling posts show on‑device AI maturing on Apple Silicon: Perplexity launched a Mac‑focused 'Personal Computer' to orchestrate local files and apps, an OpenMed demo runs private PII models on iPhone in under 2 seconds using MLX, and community ports demonstrate heterogeneous ANE+GPU acceleration plus Apple‑optimized LLM servers like oMLX. ( )

Running AI directly on Apple chips is moving from lab demo to shipping software on Macs and iPhones. Perplexity rolled out a Mac product on April 16, 2026 that can work across local files and apps instead of only the web. (perplexity.ai) Perplexity said its new “Personal Computer” plugs local files, tools, and apps into its existing agent system, and it pitched the Mac mini as a 24/7 machine for persistent tasks. The company said users can trigger it from Notes, have it read a to-do list, and act across iMessage, email, local files, connected apps, and the open web. (perplexity.ai) The technical shift underneath these demos is simple: on-device AI keeps the model and data on the phone or Mac instead of sending every request to a remote server. Apple’s MLX framework is built for Apple Silicon’s shared memory design, so the same data can be used by the central processor and graphics processor without being copied back and forth. (machinelearning.apple.com) That matters on Apple hardware because MLX already targets the central processor and graphics processor, while the Apple Neural Engine still usually requires Core ML. A recent Apple Silicon inference paper and a separate engineering write-up both describe hybrid setups that split work between the Neural Engine and graphics processor to improve efficiency. (atomgradient.github.io) (blog.squeezebits.com) Healthcare is one place where local processing is especially useful because personal health information can stay on the device. OpenMed 1.0.0 says it now supports MLX-accelerated local inference on Apple Silicon and ships a Swift package, OpenMedKit, for on-device clinical named-entity recognition and personally identifiable information detection on iOS and macOS. (openmed.life 1) (openmed.life 2) OpenMed’s documentation says the toolkit supports personally identifiable information extraction and de-identification, while its GitHub repository says version 1.0.0 added MLX hardware-accelerated inference for macOS and iOS about two weeks ago. The project site also says its model catalog now spans more than 750 Hugging Face models across 13 biomedical categories. (openmed.life) (github.com) (openmed.life) A separate part of the stack is model serving: the software layer that keeps a model loaded and answers requests from apps. oMLX, a Mac-native server for Apple Silicon, says it supports continuous batching, SSD-based key-value cache storage, OpenAI-compatible and Anthropic-compatible endpoints, and direct integrations with Claude Code, OpenClaw, and Cursor. (omlx.ai) (github.com) oMLX says its latest release is v0.3.5, requires macOS 15 or later on Apple Silicon, and recommends 64 gigabytes of memory or more for larger models even though 16 gigabytes is the minimum. Its GitHub repository showed more than 10,300 stars and a code update 19 hours before this check, a sign that Apple-focused local inference tooling is being updated in near real time. (omlx.ai) (github.com) The result is a clearer split in how Apple Silicon AI is developing in 2026: agents that can touch local apps, private models that can stay on an iPhone or Mac, and servers tuned for long-running local workloads. The common thread is that more of the work is happening on the device Apple already sells, not in someone else’s cloud. (perplexity.ai) (openmed.life) (omlx.ai)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.