Z.ai Releases Open-Weights LLM for Agentic Planning
Z.ai has released GLM-5, an open-weights large language model designed for long-horizon agentic planning and systems engineering. Now deployable on the Modal cloud platform, GLM-5 is positioned as a customizable, self-hostable alternative to closed models. The model is intended for applications requiring persistent memory and multi-stage reasoning, such as complex trading or compliance workflows.
- GLM-5 is a Mixture-of-Experts (MoE) model with 744 billion total parameters, of which 40 billion are active during inference. This architecture, combined with an increase in training data to 28.5T tokens, aims to enhance performance on complex, multi-step tasks. The model is released under a permissive MIT license, allowing for commercial use and modification.
- For deployment, GLM-5 integrates DeepSeek Sparse Attention (DSA), a mechanism designed to reduce the computational cost of processing long contexts. The model supports up to 200,000 tokens of context, which is critical for analyzing extensive financial reports or backtesting trading strategies over long time horizons. The model's total size in BF16 precision is approximately 1.5TB.
- In agentic planning benchmarks, GLM-5 ranks first among open-source models on Vending Bench 2, a simulation requiring long-term resource management, finishing with a final account balance of $4,432. This suggests an aptitude for tasks like automated portfolio rebalancing or managing compliance workflows over extended periods.
- A key technical detail is the use of a novel asynchronous reinforcement learning infrastructure called "slime," developed by Z.ai to improve training efficiency. This allows for more rapid iteration and fine-tuning on specialized financial datasets.
- The model was notably trained entirely on Huawei Ascend chips, signifying a move toward AI infrastructure independence from hardware providers like NVIDIA.
- On the SWE-bench Verified coding benchmark, GLM-5 achieves a score of 77.8%, outperforming some proprietary models like Gemini 3.0 Pro and approaching the performance of Claude Opus 4.5. This is relevant for developing and debugging complex trading algorithms and financial data pipelines.
- API pricing for GLM-5 is positioned to be highly competitive, with initial rates on platforms like OpenRouter at approximately $0.80 per million input tokens and $2.56 per million output tokens, roughly six times cheaper than comparable closed models. However, it is more expensive than other open-weight models like DeepSeek V3.2 Speciale.
- The release is part of a broader competitive trend among Chinese AI labs, with companies like ByteDance and Moonshot AI also releasing significant model upgrades, signaling an acceleration in the development of capable, open-source alternatives to major US-based models.
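
For readers sizing a self-hosted deployment or an API budget, the figures quoted above can be sanity-checked with some back-of-the-envelope arithmetic. The Python sketch below is illustrative only. It assumes 2 bytes per BF16 parameter, counts weights alone (no KV cache, activations, or runtime overhead), uses decimal terabytes, and treats the approximate OpenRouter rates cited above as fixed. The function names and the example token counts are hypothetical and do not come from Z.ai, Modal, or OpenRouter.

```python
# Figures quoted in this article; the prices are approximate initial rates.
TOTAL_PARAMS = 744e9          # total MoE parameters
ACTIVE_PARAMS = 40e9          # parameters active during inference
BYTES_PER_BF16_PARAM = 2      # BF16 stores each parameter in 16 bits
INPUT_USD_PER_MTOK = 0.80     # USD per million input tokens
OUTPUT_USD_PER_MTOK = 2.56    # USD per million output tokens


def bf16_weight_bytes(num_params: float) -> float:
    """Bytes needed to hold the weights alone in BF16 (hypothetical helper)."""
    return num_params * BYTES_PER_BF16_PARAM


def active_fraction() -> float:
    """Share of the total parameters that is active during inference."""
    return ACTIVE_PARAMS / TOTAL_PARAMS


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated API cost of one request at the quoted per-million-token rates."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    print(f"BF16 weights: {bf16_weight_bytes(TOTAL_PARAMS) / 1e12:.2f} TB")
    print(f"Active share: {active_fraction():.1%}")
    # Hypothetical request: a full 200,000-token context and a 4,000-token reply.
    print(f"Request cost: ${request_cost_usd(200_000, 4_000):.4f}")
```

Under these assumptions the weights come to about 1.49 TB, in line with the roughly 1.5TB figure above, and only about 5% of the parameters are active during inference. A single request that fills the 200,000-token context and returns 4,000 tokens would cost about $0.17 at the quoted rates.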