AMD GPUs Add Support for Qwen 3.5 LLM
AMD announced day-zero support for Alibaba's Qwen 3.5 large language model on its Instinct MI300X, MI325X, and MI35X GPUs. The integration relies on AMD's optimized ROCm software stack and the vLLM inference and serving engine for high-performance AI workloads.
- The Qwen family of large language models is developed by Alibaba Cloud and includes a range of open-weight models with varying parameter sizes, from 0.5 billion to 110 billion. These models are built on the Transformer architecture and support a context length of up to 32,768 tokens.
- AMD's Instinct MI300X accelerator is designed for large-scale AI and high-performance computing (HPC) workloads, featuring 192 GB of high-bandwidth HBM3 memory with a peak theoretical bandwidth of 5.3 TB/s. This large memory capacity is critical for running inference on large parameter models like Qwen without requiring complex model parallelism across multiple GPUs.
- The ROCm (Radeon Open Compute) platform is AMD's open-source software stack for GPU programming, serving as an alternative to NVIDIA's proprietary CUDA ecosystem. It provides the drivers, compilers, and libraries needed for AI frameworks like PyTorch and TensorFlow to run on AMD hardware.
- vLLM is an open-source inference and serving engine designed for high-throughput and memory-efficient performance. It utilizes techniques like PagedAttention and continuous batching to optimize the serving of LLMs and supports a variety of hardware backends, including both AMD and NVIDIA GPUs.
- While NVIDIA's CUDA has historically been the dominant platform for AI with a more mature ecosystem, ROCm has become increasingly competitive, particularly in memory-bound workloads. The open-source nature of ROCm offers greater flexibility and helps avoid vendor lock-in, a key consideration for specialized, long-lifecycle systems.
- The Qwen model series includes specialized versions for different tasks, such as Qwen-VL for vision-language, Qwen-Audio, and Qwen-Math, demonstrating its adaptability for various applications beyond text generation.
- The AMD CDNA 3 architecture, which powers the MI300 series, includes support for a range of data precisions, from FP64 for HPC to more efficient AI-focused formats like FP8 and INT8 with sparsity support.
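
The link between memory capacity, data precision, and single-GPU inference noted above can be illustrated with a rough back-of-the-envelope estimate. The sketch below is not an AMD or vLLM sizing tool: it counts model weights only (KV cache and activations are ignored), and the function names and the 80% headroom factor are hypothetical choices made for illustration.

```python
# Bytes needed to store one parameter at common weight precisions.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1, "int8": 1}

# HBM3 capacity of one MI300X as cited in the article (decimal gigabytes).
MI300X_HBM_GB = 192


def weight_footprint_gb(params_billions: float, precision: str) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes).

    params_billions * 1e9 params * bytes_per_param / 1e9 bytes per GB
    simplifies to params_billions * bytes_per_param.
    """
    return params_billions * BYTES_PER_PARAM[precision]


def fits_on_one_gpu(
    params_billions: float,
    precision: str,
    capacity_gb: float = MI300X_HBM_GB,
    headroom: float = 0.8,  # assumed fraction of HBM usable for weights
) -> bool:
    """True if the weights fit in the assumed usable share of GPU memory."""
    return weight_footprint_gb(params_billions, precision) <= capacity_gb * headroom


if __name__ == "__main__":
    # Parameter counts span the 0.5B to 110B range mentioned in the article.
    for size in (0.5, 72, 110):
        for precision in ("fp16", "fp8"):
            gb = weight_footprint_gb(size, precision)
            ok = fits_on_one_gpu(size, precision)
            print(f"{size}B @ {precision}: {gb:.1f} GB weights, fits on one GPU: {ok}")
```

Under these assumptions, a 110-billion-parameter model exceeds a single 192 GB device at FP16 but fits at FP8, which is one reason lower-precision formats matter for avoiding multi-GPU model parallelism. Real deployments also need memory for the KV cache, so actual limits are tighter than this estimate.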