Qualcomm Hexagon NPU Achieves 40% Efficiency Gain
Qualcomm's latest Hexagon NPU demonstrates a 40% efficiency improvement over the previous generation. The architecture is noted for its ability to offload AI workloads from the CPU, which helps power-constrained edge devices avoid thermal throttling.
- The Hexagon architecture has evolved from a digital signal processor (DSP) into a Neural Processing Unit (NPU) with a fused design that integrates scalar, vector, and tensor accelerators. A large shared memory lets these accelerators exchange data at high speed.
- Qualcomm's latest Hexagon NPU 6, found in the Snapdragon X2 Elite Extreme, is rated for up to 80 Tera Operations Per Second (TOPS). That is a 78% increase over the 45 TOPS of the NPU in the preceding Snapdragon X Elite lineup.
- The core of the Hexagon NPU is a Very Long Instruction Word (VLIW) processor with hardware-assisted multithreading, which lets it execute multiple instruction streams simultaneously. The NPU 6 architecture provides 12 threads for scalar processing and eight for vector processing.
- In laptop NPU benchmark comparisons, the Hexagon NPU 6 has been shown to be up to 70% faster than the NPU in the Apple M4 and 82.4% faster than the one in the Intel Core Ultra 9 288V in specific tests.
- "Offloading" is a heterogeneous computing practice in which AI and multimedia workloads move from the general-purpose CPU to the specialized, power-efficient NPU. This frees the CPU for other tasks and reduces overall power consumption.
- The NPU hardware is purpose-built to mirror the structure of neural network layers, accelerating common operations such as convolutions, activation functions, and transformer layers. It supports mixed-precision computing, including the INT4, INT8, INT16, and FP16 data types, to balance performance and efficiency.
- Beyond mobile phones, Hexagon NPUs are embedded in a wide range of systems, including automotive advanced driver-assistance systems (ADAS), smart cameras for image processing, and various Internet of Things (IoT) and medical devices.
- Developers can program and optimize for the Hexagon NPU using the Qualcomm Hexagon SDK, the Qualcomm AI Engine, or through higher-level environments like MATLAB and Simulink, which can generate optimized C code for the processor.
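The quoted 78% figure can be checked directly from the two TOPS ratings: (80 - 45) / 45 is about 0.778. The short Python snippet below does the same arithmetic. It only restates the peak ratings given above; it says nothing about real-world throughput or efficiency.

```python
def percent_increase(new: float, old: float) -> float:
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100.0

# Peak ratings quoted above: 80 TOPS (Hexagon NPU 6) vs. 45 TOPS (previous NPU).
gain = percent_increase(80, 45)
print(f"{gain:.1f}%")  # 77.8%, which rounds to the quoted 78%
```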
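The offloading practice described above can be pictured as a scheduler. It sends each operation to the NPU when the accelerator supports it and falls back to the CPU otherwise. The sketch below is purely illustrative. The operation kinds, the `NPU_SUPPORTED_OPS` set, and the `plan_execution` function are hypothetical and are not part of the Hexagon SDK or the Qualcomm AI Engine.

```python
from dataclasses import dataclass

# Hypothetical set of operation kinds the accelerator can run; real support
# depends on the NPU generation, the data types, and the software stack.
NPU_SUPPORTED_OPS = {"conv2d", "relu", "matmul", "softmax"}

@dataclass
class Op:
    name: str  # label for this node in the model graph
    kind: str  # operation type used to decide placement

def plan_execution(ops: list[Op]) -> list[tuple[str, str]]:
    """Assign each op to 'npu' if its kind is supported, else fall back to 'cpu'."""
    return [(op.name, "npu" if op.kind in NPU_SUPPORTED_OPS else "cpu") for op in ops]

graph = [
    Op("stem", "conv2d"),
    Op("act1", "relu"),
    Op("postprocess", "custom_nms"),  # unsupported kind, so it stays on the CPU
    Op("head", "matmul"),
]
for name, device in plan_execution(graph):
    print(f"{name:12s} -> {device}")
```

This per-operation placement is deliberately simple. Production runtimes typically group supported operations into contiguous subgraphs, because each hand-off between CPU and NPU costs data movement.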
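Mixed-precision support matters because smaller data types cut memory traffic and arithmetic cost. An INT8 value occupies one byte versus two for FP16. The sketch below shows generic symmetric per-tensor INT8 quantization, a textbook technique used here only to illustrate the idea. It is not Qualcomm's quantization toolchain, and the helper names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor INT8 quantization (hypothetical helper).

    Maps floats to integers in [-127, 127] using one shared scale factor.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to guard against float rounding.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float values from INT8 codes."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
print(q, scale)
# The reconstruction error is bounded by half of one quantization step.
print(max(abs(a - b) for a, b in zip(weights, approx)))
```

Small values such as 0.003 collapse to zero at this scale. That precision loss is the trade-off that mixed-precision hardware lets developers tune layer by layer.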