Developers Now Training GPT Models Directly on iPhone

A developer has demonstrated the feasibility of training GPT models overnight directly on an iPhone's Neural Engine. The experiment involved training two separate models simultaneously, pushing the boundaries of what's considered possible for on-device machine learning and suggesting a future of highly personalized, locally trained AI.

The Apple Neural Engine (ANE) has historically been a black box for developers, because the Core ML framework abstracts away direct hardware control. A recent open-source project, Orion, bypasses Core ML entirely and uses reverse-engineered private APIs to run and train models directly on the ANE, reaching over 170 tokens/second for GPT-2.

This feat builds on a large leap in hardware capability. The first Neural Engine, in the A11 chip of 2017, delivered 0.6 trillion operations per second (TOPS). The A17 Pro reaches 35 TOPS, and the M4 chip pushes this to 38 TOPS, providing the raw throughput needed for on-device machine learning tasks.

Direct ANE access also exposes hardware constraints that Core ML normally hides. Weights are baked in at compile time, so every training update incurs a recompilation penalty of roughly 4.2 seconds. Early training attempts were stopped by NaN divergence, a problem developers solved by implementing strict activation clamping to prevent fp16 overflow cascades.

Apple's own strategy for on-device AI relies on a model of roughly 3 billion parameters for its "Apple Intelligence" features, compressed with techniques such as low-bit palettization to an average of 3.7 bits per weight. Combined with the unified memory architecture of Apple silicon, which lets the CPU, GPU, and Neural Engine work on the same memory rather than copying data between them, this approach provides the foundation for efficient model execution.

On-device training unlocks a new layer of personalization and privacy: a model could continuously adapt to a user's accent or specific vocabulary without ever sending sensitive data to the cloud. This local adaptation matters for accessibility features, health monitoring, and a truly personal assistant.

The implications extend to manufacturing and supply chain management. An iPhone or iPad on a factory floor could run a dedicated model for visual defect detection, continually fine-tuning itself on the parts from that specific production line. This approach offers real-time quality control and predictive maintenance without a round trip to the cloud, and it keeps proprietary manufacturing data on the device.
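
To illustrate why activation clamping stops the NaN divergence mentioned above, here is a minimal Python sketch. It is not Orion's code: the toy `forward` function, the gain, the depth, and the clamp limit of 1024 are all hypothetical, and fp16 rounding is emulated with the standard library's half-precision `struct` format rather than run on real hardware. It only shows the general mechanism: a value that overflows fp16 becomes infinity, and a later subtraction of two infinities yields NaN.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest fp16 value; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # struct refuses to pack finite values beyond the fp16 range;
        # fp16 hardware would store infinity instead.
        return math.copysign(math.inf, x)


def clamp(x: float, limit: float) -> float:
    """Restrict x to the interval [-limit, limit]."""
    return max(-limit, min(limit, x))


def forward(activations, gain, depth, limit=None):
    """Hypothetical toy forward pass.

    Each 'layer' multiplies every activation by `gain` and stores the result
    in fp16. The final step subtracts the maximum, as a softmax would.
    If `limit` is given, activations are clamped after every layer.
    """
    values = [to_fp16(a) for a in activations]
    for _ in range(depth):
        values = [to_fp16(v * gain) for v in values]
        if limit is not None:
            values = [clamp(v, limit) for v in values]
    peak = max(values)
    return [v - peak for v in values]  # inf - inf produces NaN


if __name__ == "__main__":
    inputs = [100.0, 200.0]
    print("unclamped:", forward(inputs, gain=30.0, depth=3))
    print("clamped:  ", forward(inputs, gain=30.0, depth=3, limit=1024.0))
```

Without a limit, both activations overflow to infinity within a few layers and the final subtraction returns NaN for every element. With the clamp, the values stay finite. A real training setup would have to choose the limit so that useful signal is not clipped away, and the source does not state which value was used.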
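
The figures quoted above also lend themselves to quick back-of-envelope arithmetic. The Python sketch below estimates the weight storage of a roughly 3 billion parameter model at 3.7 bits per weight, and the time lost to recompilation at about 4.2 seconds per training update. The function names and the count of 1,000 updates are assumptions chosen for illustration, not numbers from the experiment.

```python
def model_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def recompile_overhead_hours(updates: int, penalty_seconds: float) -> float:
    """Total time spent recompiling if every update pays the penalty."""
    return updates * penalty_seconds / 3600


if __name__ == "__main__":
    params = 3e9  # "~3 billion parameters" from the article

    compressed = model_size_gb(params, 3.7)   # palettized average
    half_precision = model_size_gb(params, 16)  # plain fp16 for comparison
    print(f"3.7 bits/weight: {compressed:.3f} GB")
    print(f"fp16:            {half_precision:.3f} GB")

    # Hypothetical run length: 1,000 weight updates.
    hours = recompile_overhead_hours(1000, 4.2)
    print(f"recompilation overhead for 1,000 updates: {hours:.3f} hours")
```

Under these assumptions the compressed weights need about 1.4 GB instead of 6 GB at fp16, and 1,000 updates would spend a little over an hour on recompilation alone, which helps explain why training runs are measured in nights rather than minutes.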

Shared from Scout - Be the smartest in the room.