Local-first on-device AI push

Commenters this week argued for “local-first” on-device AI, saying small local models can turn files and apps into instant context without network hops, which lowers latency and makes compute use more predictable. (x.com) (x.com)

On-device artificial intelligence runs a model on your phone or laptop instead of a remote data center, and that design is getting a fresh push from developers and platform vendors. (developer.android.com)

In Apple’s setup, the model lives on the device for everyday language tasks, with developers able to call app tools and local databases through the Foundation Models framework on iOS 26, iPadOS 26, macOS 26, and visionOS 26. Apple said in June 2024 that its on-device language model has about 3 billion parameters, a rough measure of model size. (developer.apple.com) (machinelearning.apple.com)

Google is making a similar case on Android. Its Gemini Nano runs through the AICore system service, which Google says keeps inference local, cuts network delay, and supports tasks like summarization, rewriting, image description, and speech recognition. (developer.android.com)

Microsoft has also moved in that direction. In May 2025, it put Foundry Local into preview for Windows and macOS, saying developers could run models, tools, and agents directly on-device and choose whether jobs run on a central processing unit, graphics processor, or neural processing unit. (devblogs.microsoft.com)

The basic pitch is simple: if the model sits next to your files and apps, it can read local context without waiting for a network round trip. Apple’s framework explicitly lets developers create tools that search a local or online database, and Google says on-device execution eliminates server calls. (developer.apple.com) (developer.android.com)

That approach also changes the cost model. Instead of paying for every request to a cloud application programming interface, developers shift more of the work to hardware the customer already owns, while companies like Qualcomm pitch toolchains for compiling and quantizing models to fit phones and other edge devices. (aihub.qualcomm.com)

The tradeoff is that local models are smaller and more constrained than the biggest cloud systems. Apple pairs its on-device model with a larger server model in Private Cloud Compute, and Google notes that local speed still depends on the hardware inside the device. (machinelearning.apple.com) (developer.android.com)

Hardware vendors are building around that limit. Qualcomm says its AI Hub offers more than 175 pre-optimized models for Qualcomm devices and tools to convert models to on-device runtimes such as LiteRT and ONNX Runtime. (aihub.qualcomm.com)

The result is a more hybrid artificial intelligence stack: small models handle fast, private work on the device, and bigger models stay in the cloud for heavier jobs. The current push for “local-first” AI is less about replacing the cloud than deciding which tasks never needed a network hop in the first place. (machinelearning.apple.com) (developer.android.com)
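
For readers who want to picture the hybrid split, here is a minimal sketch of a local-versus-cloud routing rule in Python. It is an illustration only, not any vendor's API: the `Request` class, the `route` function, the task labels, and the 4,096-token local budget are all hypothetical. The local task set simply mirrors the on-device tasks the article lists.

```python
from dataclasses import dataclass

# Hypothetical task labels, mirroring the on-device tasks named in the article.
LOCAL_TASKS = {"summarization", "rewriting", "image_description", "speech_recognition"}


@dataclass
class Request:
    task: str          # what the app is asking the model to do
    input_tokens: int  # rough size of the prompt plus local context


def route(request: Request, local_token_budget: int = 4096) -> str:
    """Return "local" or "cloud" for a request.

    The budget is an assumed placeholder for what a small on-device
    model can handle; a real app would tune it per device.
    """
    if request.task not in LOCAL_TASKS:
        return "cloud"  # heavier or unfamiliar jobs go to the bigger model
    if request.input_tokens > local_token_budget:
        return "cloud"  # too much context for the small model
    return "local"      # fast, private work stays on the device


if __name__ == "__main__":
    for req in (
        Request("summarization", 800),
        Request("summarization", 10_000),
        Request("code_generation", 100),
    ):
        print(req.task, req.input_tokens, "->", route(req))
```

A production system would likely also weigh battery state, available accelerators, and user privacy settings, which this sketch leaves out.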
