Apple's Neural Engine Unlocked for Training
A major breakthrough has reportedly unlocked the Apple Neural Engine (ANE) on M4 chips for full neural network training, including backpropagation. Early tests sustained 1.78 TFLOPS at low utilization, and the chip's reported power efficiency is roughly 80x that of an NVIDIA A100. If those results hold up, this could dramatically lower the cost of fine-tuning models on local, consumer-grade hardware.
The work comes from developer Manjeet Singh, who reverse-engineered the ANE's private APIs to bypass Apple's CoreML framework and run computations directly on the chip. Apple has historically reserved the ANE for inference tasks such as Face ID and Siri and offers no public tools for training on it. Singh's project trained a 109M-parameter transformer based on the Llama2 architecture, reportedly the first demonstration that backpropagation can run on this hardware.
Apple rates the M4 ANE at 38 trillion operations per second (TOPS), but that figure assumes INT8 precision. Singh's analysis puts the chip's floating-point (FP16) throughput closer to 19 TFLOPS. The initial training runs sustained 1.78 TFLOPS while using only 11.2% of the ANE's capacity, which leaves considerable room for optimization.
The ANE's architecture differs fundamentally from a GPU's: it is a graph execution engine that processes an entire neural network as a single operation. That design, combined with a peak power draw of just 2.8 watts, accounts for its efficiency. The reported efficiency of 6.6 TFLOPS per watt is roughly 80 times the performance per watt of an NVIDIA A100, a power-hungry data center GPU that was never designed for edge devices.
The development slots into a growing ecosystem for local AI on Apple Silicon. Frameworks such as Apple's own MLX already support efficient fine-tuning on the M-series GPU and its unified memory, removing the need for expensive cloud GPUs. Opening the ANE to training, however, could improve both performance and efficiency by an order of magnitude, bringing sophisticated model customization to consumer laptops and iPads.
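As a rough sanity check on the figures above, the short Python sketch below reproduces the arithmetic. It assumes, as the 38 TOPS to 19 TFLOPS conversion implies, that FP16 throughput is half the INT8 rating. The function names are illustrative only and are not part of any Apple or ANE API.

```python
"""Back-of-envelope checks on the ANE figures quoted in this article.

Assumption (not an Apple specification): FP16 throughput is half the
INT8 rating, which is how 38 TOPS (INT8) maps to about 19 TFLOPS (FP16).
"""


def fp16_peak_tflops(int8_tops: float) -> float:
    """Estimate FP16 peak from an INT8 rating, assuming a 2:1 ratio."""
    return int8_tops / 2.0


def utilization(sustained_tflops: float, peak_tflops: float) -> float:
    """Fraction of the peak that a sustained workload actually achieves."""
    return sustained_tflops / peak_tflops


def tflops_per_watt(tflops: float, watts: float) -> float:
    """Energy efficiency: throughput divided by power draw."""
    return tflops / watts


if __name__ == "__main__":
    peak = fp16_peak_tflops(38.0)  # 19.0 TFLOPS under the 2:1 assumption
    sustained = 1.78               # training throughput reported above
    power = 2.8                    # reported peak power draw, in watts

    print(f"FP16 peak:              {peak:.1f} TFLOPS")
    print(f"Utilization at 1.78:    {utilization(sustained, peak):.1%}")
    print(f"Efficiency at peak:     {tflops_per_watt(peak, power):.2f} TFLOPS/W")
    print(f"Efficiency at 1.78:     {tflops_per_watt(sustained, power):.2f} TFLOPS/W")
```

The results are approximate. Dividing 1.78 by 19 gives about 9.4% rather than the quoted 11.2%, and 19 divided by 2.8 gives about 6.8 TFLOPS/W rather than 6.6, so the reported figures were presumably computed against slightly different measured peaks. The 1.78 TFLOPS training run on its own works out to only about 0.64 TFLOPS/W at 2.8 W, which suggests the 6.6 TFLOPS/W figure describes near-peak throughput rather than the current training workload.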