Qwen3.6 checkpoint release
Red Hat published a quantized checkpoint for Alibaba’s Qwen3.6-35B-A3B sparse mixture-of-experts (MoE) model and reports near-perfect accuracy recovery on benchmarks such as GSM8K. (x.com) The checkpoint is positioned as a practical step toward running large sparse models in resource-constrained settings. (x.com)
Red Hat has published a quantized version of Alibaba’s new Qwen3.6-35B-A3B model, aiming to keep most of the original accuracy while cutting deployment costs. (huggingface.co) Large language models store their behavior in billions of numeric weights, and quantization converts those weights to lower-precision formats so the model uses less memory and can run faster. Red Hat’s release applies NVFP4, a 4-bit floating-point format, to Qwen’s 35-billion-parameter sparse mixture-of-experts model. (huggingface.co 1) (huggingface.co 2)
Qwen3.6-35B-A3B itself was released by the Qwen team on April 14, 2026 as the first open-weight variant in the Qwen3.6 line. The model has 35 billion total parameters but activates about 3 billion per token, which is the “A3B” in its name. (qwen.ai) (huggingface.co) That sparse design works like a panel of specialists: the full model is large, but only a small subset of experts is used for each token. Qwen says the model has 256 experts and activates 8 routed experts plus 1 shared expert for each token, with a native context window of 262,144 tokens. (huggingface.co)
Red Hat’s model card reports “preliminary evaluations” on GSM8K Platinum, a math benchmark, with 96.28% accuracy for the quantized checkpoint versus 95.62% for the base Qwen release. The card labels that as 100.69% recovery, meaning the quantized score came out slightly above the baseline, and says more rigorous evaluations are still in progress. (huggingface.co)
The release comes at a moment when developers are trying to run stronger open models on a single server or a smaller GPU budget instead of full multi-node clusters. Red Hat’s page includes a vLLM serve command and specifies a mixture-of-experts backend, signaling that the checkpoint is meant for practical inference rather than just benchmark demos. (huggingface.co)
Alibaba’s own numbers position Qwen3.6-35B-A3B as a coding-focused model that competes above its active parameter count. In Qwen’s published results, it scores 73.4 on SWE-bench Verified and 51.5 on Terminal-Bench 2.0, up from 70.0 and 40.5 for Qwen3.5-35B-A3B. (qwen.ai) (huggingface.co)
Qwen also says the model was built around “stability and real-world utility” after feedback from developers, with improvements in repository-level coding and a feature it calls thinking preservation across conversation history. The official GitHub repository was updated on April 18, 2026 to add Qwen3.6-35B-A3B materials and notes that the weights became available on April 16. (huggingface.co) (github.com)
The open question is how well Red Hat’s near-lossless math result holds across coding, tool use, and longer agent workflows. For now, the checkpoint gives developers a new way to test whether a 35B sparse model can fit into tighter hardware limits without giving up much of what made the original release attractive. (huggingface.co) (qwen.ai)
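For readers who want to check the arithmetic behind these figures, the short Python sketch below works through three of them: the share of parameters active per token, an idealized estimate of weight storage at 16 bits versus 4 bits, and the recovery percentage from the reported GSM8K Platinum scores. It rests on assumptions that are not from Red Hat or Qwen: a 16-bit baseline, every weight stored at exactly 4 bits, and no allowance for quantization scale factors, layers kept at higher precision, the KV cache, or activations. The function names are illustrative, and real memory use will be higher than the 4-bit estimate.

```python
# Back-of-envelope figures only; not a measurement of the actual checkpoint.

TOTAL_PARAMS = 35e9   # total parameters reported for Qwen3.6-35B-A3B
ACTIVE_PARAMS = 3e9   # approximate parameters activated per token


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Idealized weight storage in decimal gigabytes.

    Assumption: every parameter is stored at exactly `bits_per_weight` bits,
    with no scale factors or metadata.
    """
    return num_params * bits_per_weight / 8 / 1e9


def recovery_percent(quantized_score: float, baseline_score: float) -> float:
    """Quantized accuracy expressed as a percentage of baseline accuracy."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return 100.0 * quantized_score / baseline_score


# Share of the model's parameters used for any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active per token: {active_fraction:.1%}")  # 8.6%

# Weight storage under the assumed 16-bit baseline and an idealized 4-bit format.
print(f"16-bit weights: {weight_memory_gb(TOTAL_PARAMS, 16):.1f} GB")  # 70.0 GB
print(f"4-bit weights:  {weight_memory_gb(TOTAL_PARAMS, 4):.1f} GB")   # 17.5 GB

# Recovery from the scores on Red Hat's model card (96.28% vs 95.62%).
print(f"Recovery: {recovery_percent(96.28, 95.62):.2f}%")  # 100.69%
```

The sparse design reduces compute per token, not storage: all 35 billion parameters still have to be held in memory, which is why the 4-bit format matters for fitting the model on smaller hardware.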