Taalas Claims to Etch AI Models Directly on Silicon
Startup Taalas says it can etch AI models directly into silicon, a method it claims delivers a step-change improvement in inference speed and efficiency. By hardwiring a model's weights into the chip, the approach aims to bypass memory bottlenecks and could shift cost-performance calculations for edge and latency-sensitive applications.
- Taalas was co-founded by Ljubisa Bajic, who also co-founded and formerly served as CEO and CTO of the AI chip company Tenstorrent.
- The company has raised over $200 million in funding from investors including Quiet Capital and Fidelity.
- Their first chip, the HC1, is designed specifically for the Llama 3.1 8B model and is built on TSMC's 6nm process.
- Taalas claims the HC1 can generate 17,000 output tokens per second, which it states is 73 times more than an Nvidia H200 GPU while using one-tenth of the power.
- This performance is achieved by "hardwiring" the model's weights onto the chip, which reduces the need for high-bandwidth memory and avoids related bottlenecks.
- While this specialization boosts performance for a single model, it also means a new chip is required for any new or updated AI model.
- The company's roadmap includes a chip for a 20-billion parameter model expected in the summer, followed by a next-generation "HC2" chip designed for frontier models.
- Taalas' business model will involve selling both inference-as-a-service and the specialized hardware itself.
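
The throughput and power claims can be unpacked with some back-of-the-envelope arithmetic. The sketch below uses only the figures Taalas itself states (17,000 tokens per second, 73 times an H200, one-tenth of the power) and reads "one-tenth of the power" as the HC1 drawing a tenth of the GPU's power while delivering that throughput, which is an interpretation rather than something the company spells out. The memory-bound estimate is a separate illustration of the bottleneck the bullets mention: it assumes 16-bit weights, a single request stream in which every weight is read once per generated token, and the H200's published peak memory bandwidth of 4.8 TB/s. The function names are hypothetical and none of this is a measurement.

```python
# Figures as stated by Taalas (company claims, not independent benchmarks).
HC1_TOKENS_PER_SEC = 17_000
SPEEDUP_VS_H200 = 73      # "73 times more than an Nvidia H200"
POWER_DIVISOR = 10        # "one-tenth of the power" (interpretation: same workload)


def implied_baseline_tokens_per_sec(claimed_tps: float, speedup: float) -> float:
    """GPU throughput implied by the claimed speedup."""
    return claimed_tps / speedup


def implied_efficiency_ratio(speedup: float, power_divisor: float) -> float:
    """Tokens-per-watt advantage if both claims hold at the same time."""
    return speedup * power_divisor


def memory_bound_tokens_per_sec(
    params: float, bytes_per_param: float, bandwidth_bytes_per_sec: float
) -> float:
    """Upper bound for one stream when every weight is fetched once per token."""
    return bandwidth_bytes_per_sec / (params * bytes_per_param)


baseline = implied_baseline_tokens_per_sec(HC1_TOKENS_PER_SEC, SPEEDUP_VS_H200)
efficiency = implied_efficiency_ratio(SPEEDUP_VS_H200, POWER_DIVISOR)

# Assumptions for illustration only: 8e9 parameters (Llama 3.1 8B),
# 2 bytes per parameter (16-bit weights), 4.8e12 bytes/s peak bandwidth.
bound = memory_bound_tokens_per_sec(8e9, 2, 4.8e12)

print(f"Implied H200 baseline: {baseline:.0f} tokens/s")
print(f"Implied tokens-per-watt advantage: {efficiency}x")
print(f"Memory-bound single-stream ceiling: {bound:.0f} tokens/s")
```

Under these assumptions the implied GPU baseline (about 233 tokens per second) sits just below the memory-bound ceiling of 300 tokens per second, which is consistent with the idea that fetching weights from memory, rather than compute, limits single-stream generation on a GPU. Batching many requests changes that picture considerably, so the comparison depends on how the baseline was measured.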