Apple pivots to small on-device models
- Apple has already built Apple Intelligence around a roughly 3 billion-parameter model that runs on devices, with larger requests sent to Private Cloud Compute.
- At WWDC 2025, Apple opened that on-device foundation model to developers through its Foundation Models framework, extending the local-first design.
- The shift puts routing between device and cloud at the center of Apple’s AI strategy. (apple.com)
Apple is not newly pivoting to small on-device models. Apple’s own AI system has used a roughly 3 billion-parameter on-device model since Apple Intelligence was introduced in June 2024. (apple.com 1) (apple.com 2)

A language model is software that predicts the next word, like autocomplete scaled up into a chatbot or writing assistant. Apple says its version is split in two parts: a smaller model for the device and a larger one for harder requests in the cloud. (apple.com 1) (apple.com 2)

The on-device model is designed to fit inside the memory and power limits of an iPhone, iPad, or Mac. Apple’s July 2024 technical report said that model has about 3 billion parameters and was built to run efficiently on Apple silicon. (apple.com)

When a request needs more compute than a phone can handle, Apple sends it to Private Cloud Compute, a server system running on Apple silicon. Apple said in June 2024 that the system decides whether a request can be processed on device before sending anything to the cloud. (apple.com 1) (apple.com 2)

That architecture makes privacy a product feature, not just a policy promise. Apple said personal data sent to Private Cloud Compute is not accessible to anyone other than the user, including Apple itself. (apple.com)

The clearest sign of Apple’s direction came at Worldwide Developers Conference 2025 on June 9, 2025. Apple gave third-party developers direct access to the same on-device foundation model through a new Foundation Models framework. (apple.com) (apple.com)

Apple’s developer documentation says apps can use that local model even without internet connectivity. The company also says developers can add tools so the model can call a database, a service, or app functions when the base model is not enough. (apple.com) (apple.com)

That means the engineering problem is not only “make one giant model smarter.” It is also deciding what runs locally, what gets routed to Private Cloud Compute, and what should call a tool instead of guessing. (apple.com) (apple.com)

Apple’s 2025 technical report shows the company kept pushing the same design. It described a new multilingual, multimodal on-device model at about 3 billion parameters and a separate scalable server model for Private Cloud Compute. (apple.com)

So the story is less a sudden pivot than a clearer statement of Apple’s existing strategy: keep as much AI as possible on the device, and treat the cloud as backup for the jobs that do not fit. (apple.com) (apple.com)
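
To make the routing question concrete, here is a minimal sketch of the three-way decision the article describes: run locally, send to the cloud, or call a tool. It is an illustration only and does not reflect Apple's actual routing logic, which the company has not published in this form. The names `Route`, `Request`, and `choose_route`, the `needs_external_data` flag, and the 4,096-token budget are all hypothetical assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    ON_DEVICE = "on_device"          # small local model
    PRIVATE_CLOUD = "private_cloud"  # larger server model
    TOOL_CALL = "tool_call"          # look something up instead of guessing


@dataclass
class Request:
    prompt_tokens: int         # rough size of the request
    needs_external_data: bool  # e.g. a lookup in an app database or service


# Hypothetical budget for the local model; not a figure published by Apple.
ON_DEVICE_TOKEN_BUDGET = 4096


def choose_route(request: Request) -> Route:
    """Pick where a request should run under the illustrative rules above."""
    # Facts the model cannot know are fetched through a tool.
    if request.needs_external_data:
        return Route.TOOL_CALL
    # Anything that fits the local budget stays on the device.
    if request.prompt_tokens <= ON_DEVICE_TOKEN_BUDGET:
        return Route.ON_DEVICE
    # The cloud is the fallback for jobs that do not fit.
    return Route.PRIVATE_CLOUD
```

Under these assumptions, a short rewrite request stays on the device, a very long document goes to the cloud, and a question about the user's order history triggers a tool call. A real system would weigh many more signals, but the ordering mirrors the local-first priority described above.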