On‑device inference hits new speeds

- Benchmarks show Qwen3.5 variants running at about 48/27 tokens/sec on the iPhone Neural Engine, while GPU‑accelerated WHIR runs 2x+ faster on M1/M3 in tests.
- Reports also note Apple has approved Nvidia‑branded heavy‑workload drivers for Mac, hinting at broader Mac acceleration options.
- These on‑device gains keep edge inference competitive versus datacenter models and suggest Apple hardware remains a viable inference play.

(x.com) (x.com) (x.com)
