WebGPU powers browser LLM demo
A demo ran a 1.7B-parameter, 1‑bit LLM entirely client‑side in the browser using WebGPU, reaching roughly 100 tokens per second from a 290 MB model. It showcases privacy‑focused edge inference with no backend API, and points to a way of keeping inference local for smaller models. (x.com)
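The reported figures can be sanity-checked with a little arithmetic. The sketch below is not from the demo itself: the helper names are hypothetical, 1 MB is assumed to mean 10^6 bytes, and the read-rate estimate assumes every byte of the model is read once per generated token, which the source does not state.

```typescript
// Hypothetical helpers for checking the demo's reported numbers.
const BITS_PER_BYTE = 8;

/** Average bits stored per parameter for a model file of the given size. */
function effectiveBitsPerParam(fileBytes: number, paramCount: number): number {
  return (fileBytes * BITS_PER_BYTE) / paramCount;
}

/** File size in bytes if every parameter took exactly `bits` bits. */
function idealBytes(paramCount: number, bits: number): number {
  return (paramCount * bits) / BITS_PER_BYTE;
}

/** Bytes read per second, assuming the whole model is read once per token. */
function weightReadRate(fileBytes: number, tokensPerSecond: number): number {
  return fileBytes * tokensPerSecond;
}

const params = 1.7e9;        // 1.7B parameters (reported)
const fileBytes = 290e6;     // 290 MB (reported), assuming 1 MB = 10^6 bytes
const tokensPerSecond = 100; // approximate throughput (reported)

console.log(effectiveBitsPerParam(fileBytes, params).toFixed(2)); // bits per parameter
console.log(idealBytes(params, 1) / 1e6);                         // MB at exactly 1 bit
console.log(weightReadRate(fileBytes, tokensPerSecond) / 1e9);    // GB/s under the assumption above
```

Under these assumptions the file works out to about 1.36 bits per parameter, compared with 212.5 MB for a strictly 1-bit encoding, and about 29 GB/s of weight reads at 100 tokens per second. The source does not say what accounts for the extra size, so the gap is left unexplained here.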