TorchAO hits 1M tokens/s

- TorchAO introduced quantization tooling that runs dense matrix multiplications in float8 (FP8) and reports throughput of up to 1,000,000 tokens per second. (x.com)
- The demo uses layer-level quantization to convert the large attention and FFN operations to FP8 while keeping correctness checks in higher precision. (x.com)
- This shows that software quantization can raise inference and training throughput by an order of magnitude for some workloads without bespoke ASIC changes. (x.com)
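To make the idea of "FP8 compute with a higher-precision correctness check" concrete, here is a minimal pure-Python sketch. It is not TorchAO's API or implementation. It assumes the E4M3 flavor of FP8 (largest finite value 448) with one scale per tensor, and every name in it (`round_to_e4m3`, `quantize`, `fp8_dot`, `relative_error`) is hypothetical and chosen for illustration only.

```python
import math

# Assumption: the E4M3 variant of FP8 (4 exponent bits, 3 mantissa bits).
E4M3_MAX = 448.0        # largest finite E4M3 value
MIN_NORMAL_EXP = -6     # exponent of the smallest normal E4M3 value
MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value on the E4M3 grid, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so the binade exponent is e - 1.
    _, e = math.frexp(mag)
    exp = max(e - 1, MIN_NORMAL_EXP)      # clamp so subnormals share one step size
    step = 2.0 ** (exp - MANTISSA_BITS)   # spacing between neighbors in this binade
    return sign * min(round(mag / step) * step, E4M3_MAX)


def quantize(values):
    """Per-tensor scaling: map the largest magnitude onto E4M3_MAX, then round."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [round_to_e4m3(v * scale) for v in values], scale


def fp8_dot(a, b):
    """Dot product of FP8-rounded inputs, accumulated in full Python floats."""
    qa, scale_a = quantize(a)
    qb, scale_b = quantize(b)
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc / (scale_a * scale_b)      # undo both scales


def relative_error(a, b):
    """Correctness check: compare the FP8 result with a full-precision reference."""
    reference = sum(x * y for x, y in zip(a, b))
    return abs(fp8_dot(a, b) - reference) / abs(reference)


if __name__ == "__main__":
    a = [0.5, 1.25, 3.0, 0.1]
    b = [2.0, 0.75, 0.5, 4.0]
    print("FP8 dot product:", fp8_dot(a, b))
    print("relative error vs. full precision:", relative_error(a, b))
```

The sketch only models the numerics: inputs are rounded to 8-bit values, products are accumulated in higher precision, and the result is validated against a full-precision reference. The throughput gain reported above comes from hardware that executes FP8 matrix multiplications natively, which a simulation like this cannot reproduce.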
