Full training on Apple Neural Engine

A community hack reverse‑engineered private ANE APIs to run full neural network training on the Apple M4's Neural Engine at about 9.3 ms per step, suggesting the silicon could have supported on‑device training all along. The open‑source project (≈3.9k GitHub stars) shifts the debate from hardware limits to software and API access for edge ML deployment. (x.com)

The public repository “maderix/ANE”, authored by the developer known as maderix, shows active commits and holds roughly 6.4k stars on GitHub as of the latest update. (github.com) The code path deliberately bypasses CoreML: it invokes the private _ANEClient and _ANECompiler interfaces and emits MIL (Model Intermediate Language) programs to run custom compute graphs directly on the ANE. (github.com)

Repository benchmarks and a reverse‑engineering fork report that a single transformer‑layer workload maps to six ANE kernel dispatches per training step. All forward passes and backward input‑gradient (dx) passes execute on the ANE, while weight‑gradient accumulation and Adam updates run on the CPU via Accelerate's CBLAS routines. (github.com) Those benchmark notes also record an observed ANE utilization figure (reported as 11.2% in a single‑layer test) and the explicit kernel names and functions used for the attention, FFN, and SDPA backward passes in the training pipeline. (github.com) A recent commit added INT8 W8A8 quantization support for weights and activations, with the author reporting about a 1.88× ANE throughput improvement from quantize/dequantize optimizations. (github.com)

Follow‑on systems research named Orion extended the community work by cataloging 20 constraints on MIL IR programs, identifying 14 previously undocumented limits, and publishing an end‑to‑end runtime that likewise bypasses CoreML using the private APIs. (arxiv.org) The project README and associated forks repeatedly label the effort as research, a proof of concept rather than a production training framework, while including detailed benchmarks of throughput, power, and SRAM behavior for Apple’s ANE silicon. (github.com)
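The reported division of labor (input gradients on the ANE, weight‑gradient accumulation and Adam updates on the CPU) can be illustrated with a minimal pure‑Python sketch for a single linear layer y = xW. This is not the repository's code: the function names, the AdamState class, and the hyperparameter defaults are hypothetical, and plain Python lists stand in for both the ANE kernels and the Accelerate CBLAS calls.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]

class AdamState:
    """Hypothetical container for per-weight Adam moments (not from the repository)."""
    def __init__(self, rows, cols):
        self.m = [[0.0] * cols for _ in range(rows)]
        self.v = [[0.0] * cols for _ in range(rows)]
        self.t = 0

def input_gradient(dy, w):
    """dx = dy @ W^T: the kind of pass the briefing says runs on the ANE."""
    return matmul(dy, transpose(w))

def accumulate_weight_gradient(acc, x, dy):
    """acc += x^T @ dy: the accumulation the briefing says stays on the CPU."""
    dw = matmul(transpose(x), dy)
    for i, row in enumerate(dw):
        for j, g in enumerate(row):
            acc[i][j] += g

def adam_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam update with bias correction, applied in place to w."""
    state.t += 1
    for i, row in enumerate(grad):
        for j, g in enumerate(row):
            state.m[i][j] = beta1 * state.m[i][j] + (1 - beta1) * g
            state.v[i][j] = beta2 * state.v[i][j] + (1 - beta2) * g * g
            m_hat = state.m[i][j] / (1 - beta1 ** state.t)
            v_hat = state.v[i][j] / (1 - beta2 ** state.t)
            w[i][j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
```

In a training loop built this way, input_gradient would be called once per layer to propagate the error backward, accumulate_weight_gradient would be called for each micro‑batch, and adam_step would run once per optimizer step on the accumulated gradient. The sketch only illustrates the arithmetic; it says nothing about how the project schedules its six kernel dispatches.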
