Alibaba's New Model Runs on Consumer GPUs

Alibaba has released an open-source model, Qwen3.5-35B-A3B, that runs on consumer-grade 24GB GPUs. It reportedly beats GPT-5 mini and Claude Sonnet 4.5 on selected benchmarks, supports a context of up to 1M tokens, and has API costs roughly 60% lower than competitors, signaling a potential price collapse for frontier AI capabilities.

The architectural innovation behind Qwen3.5-35B-A3B is its sparse Mixture-of-Experts (MoE) design. The model contains 35 billion total parameters, but only about 3 billion are activated for any given token during inference. This cuts the compute needed per token, so the model offers the knowledge capacity of a much larger model at the processing cost of a much smaller one.

That efficiency is what lets the model run on consumer-grade hardware. With 4-bit quantization, Qwen3.5-35B-A3B fits within the 24GB of VRAM found on GPUs like the NVIDIA RTX 3090 or 4090. Community-reported benchmarks put an RTX 3090 at around 110 tokens per second.

The model natively supports a 262,144-token context window, which is already substantial for long-document analysis, and this can be extended to over 1 million tokens. The hosted version of the model, Qwen3.5-Flash, comes with a 1 million token context window by default.

On benchmarks, Qwen3.5-35B-A3B is reported to be competitive with, and in some cases ahead of, other prominent models in its class. It has shown strong results on knowledge and visual reasoning, outperforming GPT-5 mini and Claude Sonnet 4.5 on specific evaluations such as MMMLU (Multilingual Massive Multitask Language Understanding) and MMMU-Pro (a benchmark for multimodal understanding).

API pricing for the hosted Qwen3.5-Flash is approximately $0.10 per million input tokens and $0.40 per million output tokens. Other API providers offer the 35B-A3B model at around $0.25 per million input tokens and $2.00 per million output tokens. Both are far cheaper than competitors like Claude Sonnet 4.5, which is priced at $3 per million input tokens and $15 per million output tokens.
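The memory and pricing figures above can be sanity-checked with back-of-envelope arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the memory estimate covers weights alone (it ignores the KV cache, activations, and the per-block scale data that quantized formats add), and the workload of 1,000,000 input tokens and 200,000 output tokens is an assumed example, not a figure from the reporting.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price: float, output_price: float) -> float:
    """Cost in USD, given prices quoted per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1e6


# Per-million-token prices (input, output) as quoted in the article.
PRICES = {
    "Qwen3.5-Flash (hosted)": (0.10, 0.40),
    "Qwen3.5-35B-A3B (other providers)": (0.25, 2.00),
    "Claude Sonnet 4.5": (3.00, 15.00),
}

if __name__ == "__main__":
    total_params = 35e9   # total parameters
    active_params = 3e9   # parameters activated per token

    # 35e9 weights at 4 bits each is 17.5 GB, under a 24 GB VRAM budget.
    print(f"4-bit weights:  {weight_memory_gb(total_params, 4):.1f} GB")
    print(f"16-bit weights: {weight_memory_gb(total_params, 16):.1f} GB")
    print(f"Active share per token: {active_params / total_params:.1%}")

    # Assumed example workload, chosen only for illustration.
    input_tokens, output_tokens = 1_000_000, 200_000
    for name, (in_price, out_price) in PRICES.items():
        cost = request_cost_usd(input_tokens, output_tokens, in_price, out_price)
        print(f"{name}: ${cost:.2f}")
```

The gap between the weight estimate and 24GB is what remains for the KV cache and runtime overhead, which is why usable context length on a single consumer GPU may be well below the model's native window.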

Shared from Scout - Be the smartest in the room.