Zhipu AI's GLM-5 model leads coding benchmark
Zhipu AI's latest model, GLM-5, is a 745-billion-parameter Mixture-of-Experts (MoE) model that has reportedly achieved near-top performance on the SWE-Bench coding benchmark, close behind the leading proprietary models. The architecture uses MoE to scale efficiently, and the release is part of a growing arms race in open-source foundation models, particularly those optimized for code generation and agentic automation.
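To illustrate why MoE scales efficiently, here is a minimal sketch of top-k expert routing using the reported GLM-5 configuration of 256 experts with the top 8 activated per token. The function name `route_token`, the random router logits, and the choice to apply a softmax over only the selected experts are assumptions for illustration; the model's actual gating function is not described here.

```python
import math
import random

NUM_EXPERTS = 256  # reported expert count
TOP_K = 8          # reported number of experts activated per token


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Hypothetical helper: a softmax over the selected logits is one common
    gating choice, not necessarily the one GLM-5 uses.
    """
    # Indices of the top_k largest router logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i],
                 reverse=True)[:top_k]
    # Numerically stable softmax restricted to the selected experts,
    # so the returned weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router scores for a single token (random, for illustration only).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(logits)
active_fraction = TOP_K / NUM_EXPERTS  # share of experts that run per token
```

Only 8 of 256 experts (about 3%) run for any given token, so per-token compute grows far more slowly than total parameter count. The fraction of active parameters is not the same as the fraction of active experts, since attention layers and other shared weights run for every token.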
- The 77.8% score on the SWE-Bench Verified dataset places GLM-5's performance close to leading proprietary models like Claude Opus 4.6 (which scores between 79.4% and 80.9%) and GPT-5.3 (78.2%).
- A key strategic detail is that the model was trained entirely on Huawei Ascend chips, demonstrating a significant step towards China's hardware independence in developing frontier AI systems.
- Zhipu AI, the company behind the model, was founded in 2019 by researchers from Tsinghua University and recently became the first major Chinese generative AI firm to go public with a $558 million IPO in Hong Kong.
- The model's architecture uses a sparse attention mechanism from DeepSeek to manage its 200,000-token context window and employs 256 experts, activating the top 8 for each token during inference.
- Beyond SWE-Bench, GLM-5 also scored 56.2% on Terminal-Bench 2.0, a benchmark focused on terminal-based tasks, and ranked first among open-source models on Vending Bench 2, which evaluates long-term operational capabilities.
- Independent testing has highlighted potential discrepancies between benchmark results and real-world application; one analysis replicated the Terminal-Bench 2.0 score at 40.4% instead of the official 56.2%, attributing the gap to the official benchmark's lack of real-world time limits.
- The model is available under a permissive MIT license, and its API is priced to be highly competitive, costing around $0.11 per million tokens compared to approximately $5 per million tokens for a model like Claude Opus 4.6.
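The pricing gap can be made concrete with a small calculation. This sketch uses the two approximate per-million-token prices quoted above; the 50-million-token workload and the helper `cost_usd` are hypothetical, and treating each price as a single blended rate (rather than separate input and output rates) is a simplifying assumption.

```python
def cost_usd(tokens, price_per_million_usd):
    """Cost in USD for a token count at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


GLM5_PRICE = 0.11  # approximate USD per million tokens, as quoted
OPUS_PRICE = 5.00  # approximate USD per million tokens, as quoted

tokens = 50_000_000  # hypothetical monthly workload

glm5_cost = cost_usd(tokens, GLM5_PRICE)
opus_cost = cost_usd(tokens, OPUS_PRICE)
price_ratio = OPUS_PRICE / GLM5_PRICE

print(f"GLM-5: ${glm5_cost:.2f}")
print(f"Claude Opus 4.6: ${opus_cost:.2f}")
print(f"Price ratio: {price_ratio:.1f}x")
```

At these quoted rates the per-token price differs by a factor of roughly 45, so the same hypothetical workload costs a few dollars on one model and a few hundred on the other.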