On‑device inference hits new speeds

- Benchmarks show Qwen3.5 variants running at about 48/27 tokens/sec on the iPhone Neural Engine, while GPU‑accelerated WHIR runs 2x+ faster on M1/M3 in tests.
- Reports also note Apple has approved Nvidia‑branded heavy‑workload drivers for Mac, hinting at broader Mac acceleration options.
- These on‑device gains keep edge inference competitive versus datacenter models and suggest Apple hardware remains a viable inference play. (x.com) (x.com) (x.com)
