Apple's Neural Engine slashes latency
- Apple’s Neural Engine now runs full vision models in single‑digit milliseconds on‑device, enabling local inference without cloud round trips for privacy and speed.
- Benchmarks cited in the brief show Moondream edge inference dropping from 687 microseconds to 130 microseconds using Apple Metal shaders, outpacing cloud GPU latency on small models.
- That cuts reliance on centralized compute for many vision tasks and accelerates adoption of on‑device, privacy‑preserving AI. (x.com) (x.com)
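
To put the quoted benchmark in perspective, here is a minimal sketch of the arithmetic behind the 687 to 130 microsecond drop. It assumes both figures measure the same workload under comparable conditions, which the brief does not confirm; the helper name `speedup` is illustrative only.

```python
def speedup(before_us: float, after_us: float) -> tuple[float, float]:
    """Return (speedup factor, fractional latency reduction)."""
    if before_us <= 0 or after_us <= 0:
        raise ValueError("latencies must be positive")
    return before_us / after_us, (before_us - after_us) / before_us


# Figures as quoted in the brief (assumed comparable measurements).
factor, reduction = speedup(687.0, 130.0)
print(f"{factor:.1f}x faster, {reduction:.0%} lower latency")
# prints "5.3x faster, 81% lower latency"
```

On those numbers, the reported change is roughly a fivefold speedup.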