Apple Neural Engine reportedly tops 1,000 tok/s on a privacy filter
- A privacy filter ported to the Apple Neural Engine (ANE) reportedly achieves more than 1,000 tokens per second, outperforming equivalent GPU and CPU runs.
- The post credits the throughput gain on Apple Silicon to the ANE's unified memory bandwidth and a custom mapping of the model onto the engine.
- If the figure holds, it suggests the ANE can be highly efficient for privacy-preserving on-device inference workloads where token throughput matters.

(x.com/i/status/2047527844700717497)
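A throughput figure like this is usually the number of tokens processed divided by wall-clock time, measured after a warm-up run. The sketch below shows one minimal way to compute it in Python. It is not the post's benchmark: `tokens_per_second` and `run_batch` are hypothetical names, and `run_batch` stands in for whatever call runs the model on the ANE, GPU, or CPU.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(
    run_batch: Callable[[Sequence[int]], object],
    token_ids: Sequence[int],
    repeats: int = 5,
    timer: Callable[[], float] = time.perf_counter,
) -> float:
    """Return average tokens processed per second over `repeats` timed runs.

    `run_batch` is a hypothetical stand-in for the real inference call.
    """
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    if len(token_ids) == 0:
        raise ValueError("token_ids must not be empty")

    # Warm-up run, excluded from timing (first calls often pay
    # one-time compilation or loading costs).
    run_batch(token_ids)

    start = timer()
    for _ in range(repeats):
        run_batch(token_ids)
    elapsed = timer() - start

    if elapsed <= 0:
        raise RuntimeError("elapsed time too small to measure")
    return len(token_ids) * repeats / elapsed
```

Running the same harness against each backend with identical inputs keeps the comparison like for like; the `timer` parameter exists only so the function can be tested with a fake clock.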