Zhipu AI's GLM-5 Hits 77.8% on SWE-bench Verified

RobbiewOnline highlighted Zhipu AI's GLM-5, which scored 77.8% on SWE-bench Verified with 44B active parameters in a 744B MoE architecture, a 200K context window, and an MIT license. The model marks a significant step forward in open-source AI capabilities for software engineering tasks.

GLM-5's 77.8% score was achieved on SWE-bench Verified, a 500-task subset of the full benchmark in which human software engineers have confirmed that each task is solvable. The benchmark tests an AI's ability to resolve real-world software issues sourced directly from GitHub repositories. In the evaluation, the model generates a code patch to fix an issue; the patch is then applied and tested in a containerized environment to ensure reproducibility.

GLM-5 was developed by Zhipu AI, a 2019 spin-off from Tsinghua University. The model's Mixture-of-Experts (MoE) architecture is a key factor in its performance: of 744 billion total parameters, only a fraction (around 40 to 44 billion) is activated for any given task. This sparse activation delivers high performance without the proportional computational cost of a dense model.

The model's 200,000-token context window is another critical feature. It lets the model process and "remember" large amounts of information, equivalent to entire codebases or extensive documentation, which is essential for tackling complex, multi-file software engineering problems. The long context is supported by DeepSeek Sparse Attention, a mechanism designed to handle long sequences efficiently.

The MIT license under which GLM-5 is released matters to the open-source community because it permits broad use, including commercial applications. This contrasts with more restrictive licenses and encourages wider adoption by researchers and enterprises, who can build on the model's capabilities without paying hefty subscription fees. Zhipu AI has a history of releasing open-weight models, and GLM-5 is the latest in its "General Language Model" series.
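To make the sparse-activation figures concrete, here is a minimal Python sketch. The first part only does the arithmetic on the parameter counts quoted above. The second part is a generic top-k expert router of the kind commonly used in MoE models; it is an illustration only, and the function name `top_k_route`, the gate scores, and the choice of k are all hypothetical, since the brief does not describe how GLM-5 actually routes tokens.

```python
import math


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of the model's parameters that is active at once."""
    return active_params_b / total_params_b


def top_k_route(gate_scores: list[float], k: int) -> dict[int, float]:
    """Generic top-k gating (an assumption, not GLM-5's documented router).

    Keeps the k experts with the highest gate scores and returns a mapping
    from expert index to a softmax weight computed over those k only.
    """
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    peak = max(gate_scores[i] for i in top)  # subtract max for stability
    exps = [math.exp(gate_scores[i] - peak) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}


# Parameter counts quoted in the brief, in billions.
TOTAL_B = 744.0
for active_b in (40.0, 44.0):
    share = active_fraction(active_b, TOTAL_B)
    print(f"{active_b:.0f}B of {TOTAL_B:.0f}B active: {share:.1%}")

# Toy routing example: four hypothetical experts, two selected.
print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

With the quoted figures, the active share works out to roughly 5 to 6 percent of the total parameter count, which is why compute per task is far below that of a dense 744B model.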
