Tenstorrent unveils Galaxy Blackhole server
- Tenstorrent said its Galaxy Blackhole AI server is now generally available, moving Blackhole from developer hardware into a full production system for inference.
- The 6U box packs 32 Blackhole ASICs, 1 TB of GDDR6, and 23 PFLOPS FP8; a four-system supercluster starts at $440,000.
- The bet is simple: beat GPU clusters on latency-sensitive inference without proprietary interconnects, and sell a more open rack-scale alternative.

AI servers are the new battleground, but the real fight is not just raw speed. It is how much glue you need around the chips to turn speed into a usable product. That is the gap Tenstorrent is trying to hit. On April 28, 2026, the company said Galaxy Blackhole is now generally available as a production AI server and as a larger supercluster system. (tenstorrent.com)

### What actually launched?

Galaxy Blackhole is a 6U rack server built around Tenstorrent’s Blackhole accelerator, the company’s current AI chip. One server uses 32 Blackhole ASICs, and Tenstorrent is also selling a four-server “supercluster” as the next step up. The company lists the base Galaxy Blackhole system at $110,000 and the four-system Blackhole supercluster at $440,000. (tenstorrent.com)

### What is inside the box?

The hardware is dense. Tenstorrent says the server delivers 23 PFLOPS of Block FP8 compute, 6.2 GB of on-accelerator SRAM at 2.9 PB/s, and 1 TB of GDDR6 at 16 TB/s. Each Blackhole ASIC has 10 × 400 GbE links, and the full system scales out through up to 56 × 800 GbE ports. The host side is an AMD EPYC 9004 CPU with up to 576 GB of DDR5 memory. Average power dra[…]up. (tenstorrent.com)

### Why is Tenstorrent making such a big deal about networking?

Because this is the whole pitch. Most AI systems still feel like accelerators bolted into a larger pile of networking, memory tiers, and software workarounds. Tenstorrent is arguing that compute, memory, and networking should be treated as one system from the start. Its phrase for that is “Networked AI”: the interconnect is part of the machine, not an afterthought. (tenstorrent.com)

### Is this for training or inference?

Mostly inference, especially the expensive, latency-sensitive kind. Tenstorrent is pushing Galaxy Blackhole for large-context LLM serving, agentic workflows, real-time systems, private AI infrastructure, and video generation.
The company says the same hardware can handle both prefill and decode, which matters because many AI deployments end up optimizing one phase and compromising the other. (tenstorrent.com)

### What performance is it claiming?

The headline claim is Blitz Mode on a four-system Galaxy supercluster. Tenstorrent says that setup reaches 350+ tokens per second per user on DeepSeek-R1-0528 671B, with sub-4-second time to first token at 100,000-token context. On the same page, Tenstorrent compares that with a “top 5” Nvid[…]right read is not “case closed.” It is that Tenstorrent thinks long-context inference is where its architecture has the clearest advantage. (tenstorrent.com)

### What is the software angle?

Open source, or at least much more open than the usual AI stack. Tenstorrent says Galaxy integrates through TT-Forge and TT-Lang, and it keeps emphasizing that 90% of Hugging Face models “just work.” The subtext is obvious: customers are tired of buying into closed hardware plus proprietary networking plus proprietary software all at once. (tenstorrent.com)

### Why does this matter beyond one server launch?

Because Tenstorrent is trying to stop being “interesting chip company” and become “real alternative platform.” Selling $999 Blackhole PCIe cards is one thing. Shipping a production rack server with published pricing, scale-out configs, and benchmark claims is different. That is the move from dev kit to datacenter product. (tenstorrent.com)

### Bottom line?

Galaxy Blackhole is Tenstorrent’s clearest attempt yet to turn Jim Keller-era architectural ambition into a buyable AI system. The promise is not just cheaper compute. It is fewer moving parts, faster inference, and less lock-in, if the real-world deployments hold up. (tenstorrent.com)
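
For readers who want to sanity-check the published numbers, here is a rough back-of-the-envelope sketch in Python. It only divides and multiplies the figures quoted in this article. The constant names and the `derived_figures` helper are illustrative, not anything Tenstorrent ships. Two assumptions are baked in: "1 TB" is read as 1024 GB, and time to first token is treated as dominated by prompt processing, so the prefill rate is only an implied lower bound.

```python
# Back-of-the-envelope arithmetic on figures quoted in the article.
# Assumptions (ours, not Tenstorrent's): 1 TB = 1024 GB, and time to
# first token is dominated by prefill of the full context.

ASICS_PER_SYSTEM = 32
SYSTEMS_PER_SUPERCLUSTER = 4
GDDR6_GB_PER_SYSTEM = 1024        # "1 TB of GDDR6"
PFLOPS_PER_SYSTEM = 23.0          # Block FP8
SYSTEM_PRICE_USD = 110_000
SUPERCLUSTER_PRICE_USD = 440_000
TOKENS_PER_SEC_PER_USER = 350     # claimed "350+", so a lower bound
TTFT_SECONDS = 4.0                # claimed "sub-4-second", so an upper bound
CONTEXT_TOKENS = 100_000


def derived_figures() -> dict:
    """Return per-ASIC and per-token numbers implied by the quoted specs."""
    total_asics = ASICS_PER_SYSTEM * SYSTEMS_PER_SUPERCLUSTER
    return {
        "gddr6_gb_per_asic": GDDR6_GB_PER_SYSTEM / ASICS_PER_SYSTEM,
        "pflops_per_asic": PFLOPS_PER_SYSTEM / ASICS_PER_SYSTEM,
        "supercluster_asics": total_asics,
        # Is the supercluster price simply four times the single-system price?
        "price_is_4x_single": (
            SYSTEM_PRICE_USD * SYSTEMS_PER_SUPERCLUSTER == SUPERCLUSTER_PRICE_USD
        ),
        "usd_per_asic": SUPERCLUSTER_PRICE_USD / total_asics,
        # Upper bound on per-token decode latency seen by one user.
        "max_ms_per_token": 1000 / TOKENS_PER_SEC_PER_USER,
        # Implied lower bound on prompt-processing throughput.
        "min_prefill_tokens_per_sec": CONTEXT_TOKENS / TTFT_SECONDS,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value}")
```

On these assumptions, the listed supercluster price is exactly four single systems with no bundle discount, and the Blitz Mode claim works out to under 3 ms per generated token for each user.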