New Chiplet Architecture Showcased at ISSCC
At the International Solid-State Circuits Conference (ISSCC), chip designer Rebellions presented a paper on its REBEL-Quad AI accelerator alongside a live demo. The design combines four separate dies in a single high-performance package, pointing to a more modular future for custom silicon.
- In internal benchmarks running the Llama 3.3 70B model, the REBEL-Quad demonstrated 1.6 times higher throughput while consuming 50% less power than top-tier GPUs, resulting in a 3.2x improvement in transactions per second per watt.
- The architecture delivers up to 1,024 TFLOPS in 16-bit floating-point precision (FP16) and 2,048 TFLOPS in 8-bit precision (FP8), coupled with 144 GB of HBM3E memory that provides 4.8 TB/s of bandwidth.
- By connecting four separate dies, the design circumvents the physical manufacturing constraint known as the "reticle limit" (~858 mm²), which is the maximum size of a single chip, thereby improving production yields and cost-effectiveness over a single, larger monolithic chip.
- It is the world's first AI accelerator to use the UCIe-Advanced (Universal Chiplet Interconnect Express) standard, an open interface that provides 1 TB/s of bandwidth per channel for the dies to communicate.
- A hardware-accelerated synchronization system coordinates execution across all chiplets, a feature designed to eliminate stalls when processing sparse and Mixture of Experts (MoE) models.
- Rebellions is a South Korean startup founded in 2020, backed by strategic investors including Arm, Samsung, SK Hynix, and Korea Telecom, positioning it as a key player in the nation's AI semiconductor ecosystem.
- The presentation at ISSCC, often called the "Semiconductor Olympics," included a live hardware demonstration, signaling a high level of maturity for the product as it moves toward mass production.
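
The efficiency and yield claims above come down to simple arithmetic, which the following Python sketch illustrates. The 1.6x throughput and 50% power figures are taken from the benchmark cited above. Everything else is an assumption chosen for illustration and not a figure from Rebellions: the Poisson yield model, the defect density, the 200 mm² chiplet area, and the function names.

```python
import math


def perf_per_watt_gain(throughput_ratio: float, power_ratio: float) -> float:
    """Relative performance per watt: relative throughput divided by relative power."""
    return throughput_ratio / power_ratio


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson yield model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


# Figures from the article: 1.6x throughput at 50% less power (0.5x the power).
gain = perf_per_watt_gain(throughput_ratio=1.6, power_ratio=0.5)

# Hypothetical inputs, NOT from the article.
DEFECTS_PER_MM2 = 0.001   # assumed defect density (0.1 defects per cm^2)
CHIPLET_AREA_MM2 = 200.0  # assumed area of one chiplet

# Yield of one small chiplet versus one monolithic die of four times the area.
chiplet_yield = poisson_yield(CHIPLET_AREA_MM2, DEFECTS_PER_MM2)
monolithic_yield = poisson_yield(4 * CHIPLET_AREA_MM2, DEFECTS_PER_MM2)

print(f"Performance-per-watt gain: {gain:.1f}x")
print(f"Chiplet yield (assumed inputs): {chiplet_yield:.2f}")
print(f"Monolithic yield (assumed inputs): {monolithic_yield:.2f}")
```

Under these assumed inputs, each small die yields better than a single die four times its size, which is the general argument for chiplets. Actual yields depend on process data the article does not provide.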