Ambitious new AI chip
A new chip circulating in industry posts is described as using TSMC’s 5nm process and SK hynix HBM3, with roughly 512 TFLOPS of compute and about 1.5 TB/s of memory bandwidth at ~180W TDP. The posts pitch those numbers as serving 7.4× more AI users at a 40% lower total cost of ownership than Nvidia. (x.com) If that spec sheet is real, it is aimed directly at server and edge AI workloads that need dense compute per watt and high-bandwidth memory, which is why the post compares TCO head-to-head with established GPU vendors. (x.com)
An artificial intelligence chip has two jobs at once: do huge amounts of math and keep those math units fed with data. If either side falls behind, the chip sits idle like a race car stuck behind a fuel truck. (furiosa.ai) That is why high bandwidth memory keeps showing up in artificial intelligence hardware news. SK hynix says high bandwidth memory stacks multiple dynamic random access memory chips vertically so the processor can move far more data per second than conventional memory. (news.skhynix.com)

The chip in this story appears to be FuriosaAI’s RNGD, pronounced “Renegade,” a second-generation data center accelerator that the company now says has entered mass production. FuriosaAI said on January 27, 2026, that it had received its first 4,000 units from Taiwan Semiconductor Manufacturing Company and Asus. (furiosa.ai)

FuriosaAI’s own developer documentation matches the circulating spec sheet on the core hardware points. The company says RNGD uses Taiwan Semiconductor Manufacturing Company’s 5 nanometer process, two high bandwidth memory 3 modules, and 1.5 terabytes per second of memory bandwidth. (developer.furiosa.ai) The compute figure in FuriosaAI’s published materials is 512 trillion integer 8 operations per second, not 512 trillion floating point operations per second as the circulating posts put it. FuriosaAI also says the chip supports floating point 8, integer 8, and integer 4 formats, which matters because artificial intelligence inference often uses lower-precision math to save power and cost. (developer.furiosa.ai; furiosa.ai)

Power is the other big claim. FuriosaAI says the PCI Express card version runs at a strict 180 watt thermal design power, and its eight-chip server draws about 3 kilowatts, which is low enough for standard air-cooled racks instead of the denser cooling setups many top-end graphics processing units now need. (furiosa.ai) The company keeps talking about racks instead of raw chip speed because many enterprise data centers were built around about 15 kilowatts per rack. FuriosaAI says power-hungry legacy graphics processing units often use 600 watts or more per chip, which can force operators into expensive electrical and cooling upgrades before they deploy more artificial intelligence. (furiosa.ai)

FuriosaAI says its answer is a different kind of compute engine called a Tensor Contraction Processor. In plain English, that means the chip is built around the large tensor operations used in deep learning, instead of leaning on the fixed matrix-multiply building blocks common in many commercial accelerators. (furiosa.ai) That design is aimed at inference, which is the stage where a trained model answers user requests rather than learning from new data. FuriosaAI says RNGD is built for large language models, multimodal models, and cloud deployments, and can be split into 2, 4, or 8 isolated instances for shared use in Kubernetes environments. (developer.furiosa.ai)

The most aggressive business claim is not the silicon spec but the comparison with Nvidia. FuriosaAI’s product page says its Nvidia comparisons come from internal measurements and engineering calculations, while a recent Chosun report said FuriosaAI presented data claiming up to 7.4 times more simultaneous users than Nvidia’s RTX Pro 6000 at the same power and about 40 percent lower total cost of ownership. (furiosa.ai; chosun.com)

Some of the memory story sits outside FuriosaAI itself. SK hynix and Taiwan Semiconductor Manufacturing Company announced in April 2024 that they were working together on next-generation high bandwidth memory integration, and SK hynix later said its 12-layer high bandwidth memory 4 could process more than 2 terabytes per second. Both announcements show how central memory packaging has become in the artificial intelligence chip race. (trendforce.com; news.skhynix.com)

So the real bet here is simple: pack enough specialized math, enough fast memory, and enough efficiency into a 180 watt card that companies can install now, in the servers they already own. FuriosaAI is no longer pitching a lab prototype; as of January 2026 it is pitching a shipping product aimed straight at the part of the market where electricity, cooling, and rack space decide who wins. (furiosa.ai)
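
The rack and memory figures quoted in this story reduce to simple arithmetic. The Python sketch below is only a back-of-the-envelope illustration: it uses the vendor-claimed numbers from the article as inputs, the constant and function names are made up for this example, and the per-rack counts ignore networking, storage, and any performance difference between chips.

```python
# Vendor-claimed figures quoted in the article (not independent measurements).
RACK_BUDGET_W = 15_000        # approximate power budget of a typical enterprise rack
RNGD_SERVER_W = 3_000         # approximate draw of an eight-chip RNGD server
RNGD_CHIPS_PER_SERVER = 8
LEGACY_GPU_W = 600            # "600 watts or more" per chip, taken as a floor
INT8_TOPS = 512               # trillion integer 8 operations per second
MEM_BW_TB_PER_S = 1.5         # terabytes per second of memory bandwidth


def units_per_rack(rack_w: int, unit_w: int) -> int:
    """How many whole units of a given power draw fit in the rack budget."""
    return rack_w // unit_w


def ops_per_byte(tera_ops: float, tera_bytes_per_s: float) -> float:
    """Operations the chip can do per byte fetched from memory (the 'tera' cancels)."""
    return tera_ops / tera_bytes_per_s


rngd_servers = units_per_rack(RACK_BUDGET_W, RNGD_SERVER_W)
rngd_chips = rngd_servers * RNGD_CHIPS_PER_SERVER

# Assumption: counts chip power only, so this is an upper bound for the GPU case.
legacy_gpus = units_per_rack(RACK_BUDGET_W, LEGACY_GPU_W)

balance = ops_per_byte(INT8_TOPS, MEM_BW_TB_PER_S)

print(f"RNGD servers per rack: {rngd_servers} ({rngd_chips} chips)")
print(f"600 W GPUs per rack (chip power only): {legacy_gpus}")
print(f"Integer 8 operations per byte of memory traffic: {balance:.1f}")
```

The last number is the "fuel truck" point in concrete form: at the quoted peak rates, a workload needs a few hundred integer 8 operations per byte read from memory to keep the math units busy, and anything less leaves the chip limited by memory bandwidth rather than compute.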