Open‑source models cut costs

A social post claimed that open-source models such as GLM-5.1 now match frontier models for many use cases and could cut Lindy’s inference costs by about 2 to 5 times. (x.com)

Running an artificial intelligence assistant costs money every time it reads a prompt and writes an answer, and Lindy’s founder said newer open-weight models could cut that bill by 2 to 5 times. (x.com)

Flo Crivello wrote on April 14 that inference, the per-request compute cost of serving a model, is now Lindy’s biggest expense and exceeds payroll. He said open-source models were “not even close” last year and “almost there” three months ago, and that they now look usable for many production tasks. (x.com)

One of the models behind that shift is GLM-5.1, released by Z.ai in April 2026. The company says it is built for long-running coding and agent workflows, with a 200,000-token context window and runs that can continue for up to 8 hours on a single task. (z.ai, docs.z.ai)

Z.ai says GLM-5.1 scored 58.4 on SWE-Bench Pro, ahead of GLM-5 at 55.1 and slightly above Claude Opus 4.6 at 57.3 in its published comparison. Artificial Analysis lists GLM-5.1 as an open-weights model with a score of 51 on its Intelligence Index and 754 billion total parameters, of which 40 billion are active per token. (z.ai, artificialanalysis.ai)

The price gap is easier to measure than the quality gap. Z.ai lists GLM-5.1 at $1.40 per 1 million input tokens and $4.40 per 1 million output tokens on its own platform. For comparison, Lindy’s older public model guide, from May 2025, centered its recommendations on Claude, OpenAI, and Google models rather than open-weight alternatives. (docs.z.ai, lindy.ai)

The shift comes as inference prices across the industry have been falling quickly for more than a year. Epoch AI reported in March 2025 that the cost of reaching a fixed performance threshold was falling by between 9 and 900 times per year, depending on the benchmark. (epoch.ai)

Open-weight does not mean free in practice.
GLM-5.1 can be downloaded under the MIT License on Hugging Face, but serving a 754-billion-parameter mixture-of-experts model still requires specialized hardware, and third-party providers often charge different rates than the model maker does. (huggingface.co, artificialanalysis.ai, openrouter.ai)

There is also a tradeoff inside the headline claim. Artificial Analysis says GLM-5.1 is “amongst the leading models in intelligence” but also calls it slower than average and “very verbose.” A verbose model generates more output tokens per task, which can raise total cost even when its per-token prices are lower. (artificialanalysis.ai)

Lindy has not published a detailed breakdown showing which workloads would move to which model, so the 2-to-5-times savings figure remains the founder’s estimate rather than an audited cost report. But the inputs behind that estimate are now public: stronger open-weight benchmark results, lower token prices, and inference bills large enough to rival payroll. (x.com, docs.z.ai, z.ai)
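The price and verbosity points above come down to simple per-token arithmetic. The sketch below is a minimal illustration, assuming straightforward per-token billing with no caching or batch discounts. Only the GLM-5.1 list prices ($1.40 and $4.40 per 1 million tokens) come from the article; the baseline rates, the token counts, and the helper names `request_cost`, `savings_factor`, and `break_even_verbosity` are hypothetical placeholders, not Lindy’s or any vendor’s real figures.

```python
"""Hypothetical per-request cost sketch.

Only the GLM-5.1 list prices come from the article (Z.ai platform rates).
Token counts and baseline prices below are invented placeholders.
"""

# GLM-5.1 list prices on Z.ai's platform, in USD per 1 million tokens.
GLM_51_INPUT_PER_M = 1.40
GLM_51_OUTPUT_PER_M = 4.40


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request under simple per-token billing (no caching)."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def savings_factor(baseline_cost: float, candidate_cost: float) -> float:
    """How many times cheaper the candidate is; 2.0 means half the cost."""
    return baseline_cost / candidate_cost


def break_even_verbosity(baseline_cost: float, input_tokens: int,
                         output_tokens: int, input_price_per_m: float,
                         output_price_per_m: float) -> float:
    """Multiplier on output_tokens at which the candidate costs as much as the baseline."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return (baseline_cost - input_cost) / output_cost


if __name__ == "__main__":
    # Hypothetical output-heavy workload: 2,000 prompt tokens, 2,000 answer tokens.
    prompt_tokens, answer_tokens = 2_000, 2_000
    # Hypothetical baseline rates (USD per 1M tokens), NOT any real vendor's price list.
    baseline = request_cost(prompt_tokens, answer_tokens, 5.00, 25.00)

    # A more verbose model spends more output tokens on the same task.
    for verbosity in (1, 2, 3):
        cost = request_cost(prompt_tokens, answer_tokens * verbosity,
                            GLM_51_INPUT_PER_M, GLM_51_OUTPUT_PER_M)
        print(f"{verbosity}x output: ${cost:.4f} per request, "
              f"{savings_factor(baseline, cost):.2f}x cheaper than baseline")

    k = break_even_verbosity(baseline, prompt_tokens, answer_tokens,
                             GLM_51_INPUT_PER_M, GLM_51_OUTPUT_PER_M)
    print(f"Savings disappear at about {k:.1f}x as many output tokens")
```

With these placeholder numbers, the candidate starts about 5.2 times cheaper, falls to about 2.1 times cheaper when it writes three times as many output tokens, and loses its edge entirely at 6.5 times the output. Prompt-heavy workloads would move that break-even point, which is why per-token prices alone do not settle the comparison.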

Shared from Scout.