Taalas ASIC Hits 17,000 Tokens/Sec
Custom silicon firm Taalas is now demonstrating its HC1 accelerator, which hardwires the weights of a Llama-3.1 8B model into silicon. The chip delivers up to 17,000 tokens per second per user. The company claims its approach, which uses 3–6 bit quantization, makes inference about 13 times cheaper than Cerebras's service, at $0.0075 per million tokens.
- Taalas, a Toronto-based startup founded in 2023, has raised over $200 million, with a significant $169 million funding round to support the development of its specialized AI processors. Investors include Quiet Capital and Fidelity.
- The HC1 chip is manufactured using TSMC's 6-nanometer process, featuring 53 billion transistors on an 815 mm² die. Taalas claims its partnership with TSMC enables a rapid two-month turnaround from model weights to deployable PCI-Express cards.
- A server equipped with ten HC1 cards consumes approximately 2.5 kilowatts, with each card drawing about 200 watts. This power consumption is significantly lower than a typical GPU rack, which can range from 120 to 600 kW, and allows for air-cooling.
- The Llama-3.1 8B model, hardwired into the HC1, is an auto-regressive language model with 8 billion parameters, built on an optimized transformer architecture. It was pretrained on approximately 15 trillion tokens of public data with a cutoff of December 2023.
- Taalas's roadmap includes a second-generation HC2 silicon platform designed for frontier-level models, expected in the winter of 2026.
- The competitive landscape for AI inference hardware includes companies like Groq, which develops "Language Processing Units" (LPUs), and Cerebras with its large Wafer-Scale Engine. Cerebras recently quoted a price of $0.10 per million tokens for its Llama 3.1 8B service.
- This "ASIC-ization" of AI inference mirrors the evolution of Bitcoin mining, which transitioned from general-purpose CPUs to specialized hardware for performance and efficiency gains. Hyperscalers like Amazon, Google, and Microsoft are also heavily investing in custom silicon to reduce long-term costs and dependency on third-party vendors.
- The market for AI inference chips is projected to grow from approximately $106 billion in 2025 to over $250 billion by 2030, driven by the increasing deployment of large language models.
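
For readers who want to check the arithmetic behind these figures, here is a minimal Python sketch. It uses only the numbers quoted above. The constant and function names are illustrative and do not come from Taalas or Cerebras. The per-token time assumes steady-state generation at the quoted peak rate, and the growth rate assumes constant compounding over the five years from 2025 to 2030.

```python
# Figures as quoted in this brief; none are independently measured.
TAALAS_USD_PER_M_TOKENS = 0.0075    # claimed HC1 price per million tokens
CEREBRAS_USD_PER_M_TOKENS = 0.10    # quoted Cerebras price, Llama 3.1 8B
TOKENS_PER_SEC_PER_USER = 17_000    # claimed peak per-user throughput
CARDS_PER_SERVER = 10
WATTS_PER_CARD = 200                # approximate per-card draw
SERVER_WATTS = 2_500                # approximate whole-server draw
MARKET_2025_BUSD = 106              # projected market size, $ billions
MARKET_2030_BUSD = 250              # lower bound ("over $250 billion")


def price_ratio() -> float:
    """How many times cheaper the claimed Taalas price is than Cerebras's."""
    return CEREBRAS_USD_PER_M_TOKENS / TAALAS_USD_PER_M_TOKENS


def microseconds_per_token() -> float:
    """Average time per generated token at the quoted peak rate."""
    return 1_000_000 / TOKENS_PER_SEC_PER_USER


def non_card_watts() -> int:
    """Server power left over after the ten cards (host CPU, fans, etc.)."""
    return SERVER_WATTS - CARDS_PER_SERVER * WATTS_PER_CARD


def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    print(f"Price ratio: {price_ratio():.1f}x")
    print(f"Time per token: {microseconds_per_token():.0f} microseconds")
    print(f"Non-card power: {non_card_watts()} W")
    cagr = implied_cagr(MARKET_2025_BUSD, MARKET_2030_BUSD, 5)
    print(f"Implied market CAGR: {cagr:.1%}")
```

The price ratio works out to roughly 13.3, consistent with the "13 times cheaper" claim, and the ten cards account for about 2 kW of the server's 2.5 kW, leaving about 500 W for everything else in the chassis.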