Custom ASICs Poised to Accelerate On-Device AI

A new class of custom AI chips is enabling large language models to run directly on edge devices without cloud connectivity. One such chip, the Taalas HC1, can reportedly achieve local inference speeds of 16,960 tokens per second for Llama 3.1 8B. This hardware advancement supports the deployment of advanced agents and AI models on handhelds and fixed infrastructure for real-time, resilient operations.

- The Taalas HC1 chip achieves its speed by hardwiring the entire Llama 3.1 8B model and its weights directly onto the silicon, a "model-on-silicon" approach. This eliminates the traditional memory-to-compute bottleneck.
- Taalas was founded by CEO Ljubisa Bajic, who previously founded Tenstorrent in 2016, along with early Tenstorrent engineering leaders Drago Ignjatovic and Lejla Bajic. The team has collective experience at AMD and NVIDIA.
- The HC1 is manufactured by Taiwan Semiconductor Manufacturing Company (TSMC) using its 6-nanometer (N6) process. Taalas claims it can go from a finished AI model's weights to a deployable custom chip in about two months.
- While the HC1 is dedicated to a single model, it retains some flexibility for fine-tuning through an onboard SRAM that can hold components like the KV cache and low-rank adapters (LoRAs).
- This type of application-specific integrated circuit (ASIC) represents a trade-off, sacrificing the general-purpose programmability of GPUs for extreme performance on a single task. The HC1 is designed purely for inference and cannot be used for training models.
- Running large models on the edge presents significant challenges including limited memory, computational power, and energy constraints. A 10-billion parameter model can require up to 20GB of memory even after optimization, exceeding the capacity of most mobile devices.
- The Llama 3.1 8B model is an 8-billion parameter, decoder-only Transformer with 32 layers and a context window of up to 128,000 tokens.
- The on-device AI market, excluding smartphones and PCs, was estimated at $10.1 billion in 2024 and is projected to grow to $30.6 billion by 2029. Including all devices, the market is estimated to be valued at $26.61 billion in 2025 and is expected to reach $124.07 billion by 2032.
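
The memory figures above can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes weights stored at 2 bytes per parameter (16-bit precision), and for the KV cache it assumes 8 key/value heads with a head dimension of 128, values taken from the publicly released Llama 3.1 8B configuration rather than from this briefing. The function names are hypothetical, and nothing here describes how the HC1 actually stores its weights or cache.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def kv_cache_bytes_per_token(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    bytes_per_value: int,
) -> int:
    """Bytes of KV cache added per generated token.

    The leading factor of 2 accounts for storing both a key and a value
    vector in every layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    # Assumption: 16-bit weights (2 bytes per parameter).
    print("10B params @ 16-bit:", weight_memory_gb(10e9, 2), "GB")
    print("8B params  @ 16-bit:", weight_memory_gb(8e9, 2), "GB")

    # Assumption: 8 KV heads and head_dim 128 (published Llama 3.1 8B
    # config, not stated in the briefing); 32 layers is from the briefing.
    per_token = kv_cache_bytes_per_token(
        num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_value=2
    )
    print("KV cache per token:", per_token, "bytes")
    print("KV cache at 128,000 tokens:", per_token * 128_000 / 1e9, "GB")
```

At 2 bytes per parameter, 10 billion parameters come to 20 GB for the weights alone, which is consistent with the figure quoted above. Under the stated assumptions, a long context adds a KV cache on top of that, which is one reason on-chip SRAM capacity matters for a design like this.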

