M4 Neural Engine opened for training

A thread on X claims that Apple's M4 Neural Engine has been cracked open for full AI training, running at 1.78 TFLOPS with 11% utilization, and describes it as 80× more efficient than an Nvidia A100. The thread frames this as enabling more capable local machine learning on Macs and new on-device training use cases. (x.com)

Apple's Neural Engine is the block of the M4 chip that usually runs trained artificial intelligence models, not the part that trains them. A public GitHub project now shows an M4 handling the training step too, by driving the Neural Engine through reverse-engineered private interfaces. (github.com)

The code reports 9.3 milliseconds per training step on an M4 for a single transformer layer with a model width of 768 and a sequence length of 512. It says the run sustained 1.78 TFLOPS at 11.2 percent Neural Engine utilization, with six Neural Engine kernel launches per step. (github.com)

In plain terms, training means running a model forward to make a prediction, running it backward to work out how much each weight contributed to the error, and then adjusting the weights. The repository says both the forward pass and the backward pass for activations run on the Neural Engine, while the weight-gradient math and the Adam optimizer updates still run on the central processing unit (CPU) through Apple's Accelerate framework. (github.com)

Apple markets the M4 with a 16-core Neural Engine rated at up to 38 trillion operations per second, and says the M4 family brings a faster Neural Engine to the Mac. Apple offers no public Core ML interface for training directly on the Neural Engine, which is the gap this project works around. (apple.com 1) (apple.com 2)

The project describes the Neural Engine less as a general-purpose graphics processor and more as a sealed appliance that runs whole compute graphs at once. Its author says that private classes named `_ANEClient`, `_ANECompiler`, and `_ANEInMemoryModelDescriptor` made it possible to compile and execute those graphs without going through Core ML. (blog.themenonlab.com)

That matters for laptops: the M4 ships in Macs with unified memory and battery limits, where watts matter as much as speed.
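
To make the division of labor concrete, here is a toy Python sketch of one training step for a single linear layer, split the way the repository describes: the forward pass and the activation gradient on one side (the Neural Engine, in the project), the weight gradient and the Adam update on the other (the CPU). It is not the project's code, calls no Apple interfaces, and runs entirely on the CPU; the "Neural Engine side" and "CPU side" labels are annotations only. The function names, the squared-error loss, the toy data and the Adam hyperparameters are illustrative assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def forward(x: Matrix, w: Matrix) -> Matrix:
    # Neural Engine side in the project: prediction y = x @ w.
    return matmul(x, w)


def backward_activations(dy: Matrix, w: Matrix) -> Matrix:
    # Neural Engine side in the project: dL/dx = dy @ w^T,
    # the gradient handed back to the previous layer.
    return matmul(dy, transpose(w))


def weight_gradient(x: Matrix, dy: Matrix) -> Matrix:
    # CPU side in the project: dL/dw = x^T @ dy.
    return matmul(transpose(x), dy)


class Adam:
    """Minimal Adam optimizer for one weight matrix (CPU side in the project)."""

    def __init__(self, rows: int, cols: int, lr: float = 0.01,
                 beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [[0.0] * cols for _ in range(rows)]  # first-moment estimates
        self.v = [[0.0] * cols for _ in range(rows)]  # second-moment estimates
        self.t = 0                                    # step count for bias correction

    def step(self, w: Matrix, g: Matrix) -> None:
        """Update w in place using gradient g."""
        self.t += 1
        for i, row in enumerate(w):
            for j in range(len(row)):
                self.m[i][j] = self.beta1 * self.m[i][j] + (1 - self.beta1) * g[i][j]
                self.v[i][j] = self.beta2 * self.v[i][j] + (1 - self.beta2) * g[i][j] ** 2
                m_hat = self.m[i][j] / (1 - self.beta1 ** self.t)
                v_hat = self.v[i][j] / (1 - self.beta2 ** self.t)
                row[j] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def train_step(x: Matrix, target: Matrix, w: Matrix, opt: Adam) -> Tuple[float, Matrix]:
    """One step with a squared-error loss; returns (loss, activation gradient)."""
    y = forward(x, w)
    dy = [[yi - ti for yi, ti in zip(yr, tr)] for yr, tr in zip(y, target)]
    loss = 0.5 * sum(d * d for row in dy for d in row)
    dx = backward_activations(dy, w)  # computed before the weights change
    opt.step(w, weight_gradient(x, dy))
    return loss, dx


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    target = [[1.0], [2.0]]
    w = [[0.0], [0.0]]
    opt = Adam(rows=2, cols=1, lr=0.05)
    for _ in range(200):
        loss, _dx = train_step(x, target, w, opt)
    print(f"loss at final step: {loss:.4f}, weights: {w}")
```

Calling `train_step` repeatedly on a small batch should drive the loss down. In a deeper model, the returned activation gradient `dx` would be passed backward to the previous layer; that chain of activation gradients is the part the project keeps on the Neural Engine, while each layer's weight gradient and optimizer update stay on the CPU.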

A secondary report comparing the project's measured throughput with Nvidia's published peak figures for the A100 says the "80 times" figure refers to TFLOPS per watt, not to total training speed. (apple.com) (blog.themenonlab.com) By that report's numbers, the M4 Neural Engine delivers about 6.6 TFLOPS per watt versus about 0.08 for an A100. The same report notes that the A100 still delivers far higher raw throughput, which keeps data-center graphics processors in a different class for large-model training. (blog.themenonlab.com)

The repository also argues that Apple's 38 trillion-operations-per-second rating should not be read as double-speed 8-bit math for this use case. The author says the Neural Engine dequantizes 8-bit weights to 16-bit floating point before computing, and lists the M4 Neural Engine's practical peak as 15.8 TFLOPS in the repository but about 19 TFLOPS in a separate write-up, a mismatch that Apple's documentation does not resolve. (github.com) (blog.themenonlab.com) (apple.com)

The code is a proof of concept, not a supported Apple feature, and according to the write-up it was tested on macOS 15 or later on Apple silicon. As of April 12, 2026, Apple had not publicly documented or endorsed this training path in the sources reviewed here. (blog.themenonlab.com) (apple.com)

If the work holds up, the immediate result is not an M4 replacing an Nvidia server rack. It is a Mac using a chip Apple built for running models on-device to do part of the model-learning job locally, within the limits and caveats the repository spells out. (github.com)
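
The headline ratios can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in this article; the helper names are illustrative, and the per-step calculation assumes the 1.78 TFLOPS figure is averaged over the whole 9.3 millisecond step, which the sources do not state. Dividing 1.78 by the repository's 15.8 TFLOPS peak gives about 11.3 percent, close to the reported 11.2 percent, while dividing by the write-up's 19 TFLOPS gives about 9.4 percent, which suggests the utilization figure was computed against the repository's peak. The two per-watt figures give a ratio of about 82, consistent with the rounded "80 times" claim.

```python
# Figures as quoted in the article. The two "practical peak" values are the
# repository's and the write-up's estimates, not Apple specifications.
MEASURED_TFLOPS = 1.78       # sustained throughput reported by the repository
STEP_SECONDS = 9.3e-3        # reported time per training step
PEAK_REPO_TFLOPS = 15.8      # practical peak listed in the repository
PEAK_WRITEUP_TFLOPS = 19.0   # practical peak given in the separate write-up
ANE_TFLOPS_PER_WATT = 6.6    # secondary report's M4 Neural Engine figure
A100_TFLOPS_PER_WATT = 0.08  # secondary report's A100 figure


def utilization(measured_tflops: float, peak_tflops: float) -> float:
    """Fraction of a claimed peak that the measured throughput represents."""
    return measured_tflops / peak_tflops


def efficiency_ratio(a_per_watt: float, b_per_watt: float) -> float:
    """How many times more work per watt device A does than device B."""
    return a_per_watt / b_per_watt


def flops_per_step(tflops: float, step_seconds: float) -> float:
    """Floating-point operations implied per step, ASSUMING the throughput
    figure is averaged over the whole step (not stated by the sources)."""
    return tflops * 1e12 * step_seconds


if __name__ == "__main__":
    print(f"utilization vs 15.8 TFLOPS peak: {utilization(MEASURED_TFLOPS, PEAK_REPO_TFLOPS):.1%}")
    print(f"utilization vs 19 TFLOPS peak:   {utilization(MEASURED_TFLOPS, PEAK_WRITEUP_TFLOPS):.1%}")
    print(f"per-watt ratio, M4 ANE / A100:   "
          f"{efficiency_ratio(ANE_TFLOPS_PER_WATT, A100_TFLOPS_PER_WATT):.1f}x")
    print(f"implied work per step:           "
          f"{flops_per_step(MEASURED_TFLOPS, STEP_SECONDS):.3g} FLOPs")
```

The per-watt ratio says nothing about raw speed: a device can do far more work per watt while still finishing a large job much more slowly, which is the distinction the secondary report draws.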

Shared from Scout.