Alibaba's New Model Runs on Consumer GPUs
Alibaba just dropped an open-source model, Qwen3.5-35B-A3B, that runs on consumer-grade 24 GB GPUs. It reportedly beats GPT-5 mini and Claude Sonnet 4.5 on selected benchmarks, supports a context of up to 1M tokens, and has API costs ~60% lower than competitors, signaling a potential price collapse for frontier AI capabilities.
The architectural innovation behind Qwen3.5-35B-A3B is a sparse Mixture-of-Experts (MoE) design. The model contains 35 billion parameters in total, but only about 3 billion are activated for any given token during inference. This cuts the compute per token substantially, so the model offers the knowledge capacity of a much larger model at roughly the processing cost of a small one.
This efficiency is what makes the model practical on consumer-grade hardware, but sparsity lowers compute rather than memory: all 35 billion parameters must still be held in memory. With 4-bit quantization the weights come to roughly 17.5 GB (35 billion × 0.5 bytes), which fits within the 24 GB of VRAM found on GPUs like the NVIDIA RTX 3090 or 4090. Community-reported benchmarks put an RTX 3090 at around 110 tokens per second.
The model natively supports a 262,144-token context window, already substantial for long-document analysis, and this can be extended to over 1 million tokens. The hosted version of the model, known as Qwen3.5-Flash, comes with a 1-million-token context window by default.
On benchmarks, Qwen3.5-35B-A3B is reported to match or surpass other prominent models in its class. It has shown strong results on knowledge and visual reasoning, outperforming GPT-5 mini and Claude Sonnet 4.5 on specific evaluations such as MMMLU (Multilingual Massive Multitask Language Understanding) and MMMU-Pro (a benchmark for multimodal understanding).
API pricing for the hosted Qwen3.5-Flash is approximately $0.10 per million input tokens and $0.40 per million output tokens. Other API providers offer the 35B-A3B model at around $0.25 per million input tokens and $2.00 per million output tokens. Either way, this is a significant cost reduction compared with competitors like Claude Sonnet 4.5, which is priced at $3 per million input tokens and $15 per million output tokens.
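As a rough sanity check on the figures quoted above, the following Python sketch reproduces the arithmetic behind the VRAM and pricing claims. The function names and the example workload (100,000 input tokens, 2,000 output tokens) are illustrative assumptions, not anything published by Alibaba. The memory estimate counts weights only and ignores quantization metadata, the KV cache, and activations, so real usage will be higher.

```python
# Back-of-envelope checks using only the figures quoted in the article.
# Assumption: 1 GB = 1e9 bytes; weights only, no KV cache or activations.

BYTES_PER_GB = 1e9


def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed to hold the model weights alone."""
    return total_params * bits_per_weight / 8 / BYTES_PER_GB


def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request given prices in USD per million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# (input price, output price) in USD per million tokens, as quoted in the article.
PRICES = {
    "Qwen3.5-Flash (hosted)": (0.10, 0.40),
    "35B-A3B via other providers": (0.25, 2.00),
    "Claude Sonnet 4.5": (3.00, 15.00),
}

# All 35B parameters must be resident, even though only ~3B are active per token.
weights_gb = weight_memory_gb(35e9, 4)
print(f"4-bit weights: ~{weights_gb:.1f} GB against 24 GB of VRAM")

# Hypothetical long-context workload: 100k tokens in, 2k tokens out.
for name, (price_in, price_out) in PRICES.items():
    cost = request_cost_usd(100_000, 2_000, price_in, price_out)
    print(f"{name}: ${cost:.4f} per request")
```

For this assumed workload, the hosted Qwen3.5-Flash price works out to roughly one thirtieth of the Claude Sonnet 4.5 price. The exact ratio depends on the mix of input and output tokens.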