Qualcomm Hexagon NPU Achieves 40% Efficiency Gain

Published by The Daily Scout

What happened

Qualcomm's latest Hexagon NPU delivers a 40% efficiency improvement over the previous generation. The architecture is designed to offload AI workloads from the CPU, which helps power-constrained edge devices avoid thermal throttling.
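The article does not say how "efficiency" is measured. Assuming it means work done per unit of energy (performance per watt), a 40% gain means a fixed task needs about 71% of the energy it did before, roughly 29% less, not 40% less. In the derivation, W is the work in a fixed task (for example, one inference), E is the energy it consumes, and η is the efficiency. These symbols are illustrative and do not come from Qualcomm.

```latex
\eta = \frac{W}{E}, \qquad \eta_{\text{new}} = 1.4\,\eta_{\text{old}}
\quad\Longrightarrow\quad
\frac{E_{\text{new}}}{E_{\text{old}}} = \frac{\eta_{\text{old}}}{\eta_{\text{new}}} = \frac{1}{1.4} \approx 0.714
```

Under this assumption, the energy saved for the same work is 1 − 0.714 ≈ 0.286, or about 29%.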

Why it matters

- The Hexagon architecture has evolved from a digital signal processor (DSP) into a Neural Processing Unit (NPU) with a fused design that integrates scalar, vector, and tensor accelerators. A large shared memory lets these accelerators exchange data at high speed.
- Qualcomm's latest Hexagon NPU 6, found in the Snapdragon X2 Elite Extreme, is rated for up to 80 Tera Operations Per Second (TOPS), a 78% increase over the 45 TOPS of the NPU in the preceding Snapdragon X Elite lineup.
- At its core, the Hexagon NPU is a Very Long Instruction Word (VLIW) processor with hardware-assisted multithreading, which lets it execute multiple instruction streams simultaneously. The NPU 6 architecture provides 12 threads for scalar processing and 8 for vector processing.
- In benchmark comparisons of laptop NPUs, the Hexagon NPU 6 has been shown to be up to 70% faster than the NPU in the Apple M4 and 82.4% faster than the one in the Intel Core Ultra 9 288V in specific tests.
- "Offloading" is a heterogeneous computing practice in which AI and multimedia workloads move from the general-purpose CPU to the specialized, power-efficient NPU. This frees the CPU for other tasks and reduces overall power consumption.
- The NPU hardware is purpose-built around the structure of neural network layers, accelerating common operations such as convolutions, activation functions, and transformer layers. It supports mixed-precision computing with INT4, INT8, INT16, and FP16 data types, balancing accuracy against performance and efficiency.
- Beyond mobile phones, Hexagon NPUs are embedded in a wide range of systems, including automotive advanced driver-assistance systems (ADAS), smart cameras for image processing, and various Internet of Things (IoT) and medical devices.
- Developers can program and optimize for the Hexagon NPU using the Qualcomm Hexagon SDK, the Qualcomm AI Engine, or higher-level environments such as MATLAB and Simulink, which can generate optimized C code for the processor.
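Mixed precision means running parts of a model in narrow integer formats such as INT8 instead of FP16. The Python sketch below shows generic symmetric per-tensor INT8 quantization, one common way to map floating-point weights to 8-bit integers using a single scale factor. It is an illustrative assumption, not Qualcomm's quantizer or an API from the Hexagon SDK. The function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (illustrative sketch).

    Maps the largest absolute value to 127 and scales everything else
    linearly, so a single float `scale` recovers approximate originals.
    """
    if not values:
        raise ValueError("values must be non-empty")
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the signed 8-bit range.
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [x * scale for x in q]


# Hypothetical example weights
weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q, scale)
print(restored)
```

Each INT8 value occupies one byte instead of FP16's two. The round-trip error for each value is at most half the scale factor, which is the precision cost traded for speed and efficiency.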

Key numbers

  • 40%: efficiency improvement of the latest Hexagon NPU over the previous generation.
  • 80 TOPS: rating of the Hexagon NPU 6 in the Snapdragon X2 Elite Extreme, up 78% from 45 TOPS in the Snapdragon X Elite lineup.
  • 12 and 8: threads for scalar and vector processing, respectively, in the NPU 6 architecture.
  • 70% and 82.4%: the Hexagon NPU 6's reported lead over the NPUs in the Apple M4 (up to 70%) and the Intel Core Ultra 9 288V (82.4%) in specific benchmark tests.
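The 78% figure is simple relative-change arithmetic on the two TOPS ratings. The Python snippet below reproduces it. Note that the result measures rated throughput, which is a different metric from the 40% efficiency figure. The function and variable names are illustrative.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the relative increase from `old` to `new`, in percent."""
    if old == 0:
        raise ValueError("old must be non-zero")
    return (new - old) / old * 100


# TOPS ratings quoted in the article
x_elite_npu_tops = 45   # NPU in the Snapdragon X Elite lineup
x2_elite_npu_tops = 80  # Hexagon NPU 6 in the Snapdragon X2 Elite Extreme

gain = percent_increase(x_elite_npu_tops, x2_elite_npu_tops)
print(f"TOPS increase: {gain:.1f}%")  # 35 / 45 = 0.777..., printed as 77.8%
```

Rounded to the nearest whole number, 77.8% becomes the 78% quoted above.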
