Builddy agents spin up full apps

Builddy demonstrated GLM-5.1 agents that can design, build and deploy complete web applications from prompts, showing an emerging class of AI tools that automate full-stack work. (x.com) The demo suggests these agents can cover system-design-to-deploy workflows, not just single-file code edits. (x.com)

Most coding assistants still work like autocomplete with a better memory: they fix one file, answer one question, or write one function, then wait for the next instruction. Builddy’s demo showed a different model of work, in which one prompt kicks off a chain that plans an app, writes the code, reviews it, and deploys a live result. (github.com, z.ai)

That shift depends on what software people call an agent: a system that can take a goal, use tools, check its own progress, and keep going without needing a human after every step. Z.ai says its GLM-5.1 model was built for these longer runs, with support for tool use, structured output, and sessions that can stay on one task for up to 8 hours. (z.ai)

Long-horizon work is the hard part here. Writing a login page in 30 seconds is easy compared with keeping a coherent plan across architecture, interface design, bug fixing, and deployment, which is closer to building a house than replacing one brick. (z.ai) Z.ai’s own launch materials for GLM-5.1 leaned on that exact point. The company said earlier models often improved quickly and then plateaued, while GLM-5.1 was trained to keep breaking problems into steps, run experiments, read results, and revise its strategy over hundreds of rounds and thousands of tool calls. (z.ai)

Builddy is a small but concrete example of that idea. Its public repository describes a five-step pipeline (parse, plan, code, review, deploy) with a front end in Next.js, a back end in FastAPI, and a live app served after the model finishes the chain. (github.com) The important detail is that Builddy is not one giant model call. The repository says each build uses 4 distinct GLM reasoning steps, and each later modification uses another pass over the existing code before redeploying, which is much closer to a junior developer following a checklist than a chatbot spitting out a code block. (github.com)

That matters because full-stack work usually breaks at the seams. A tool can generate a pretty front page, but the database schema, the API routes, the state management, and the deployment setup all have to agree with each other or the app falls apart the moment a real user clicks around. (github.com, z.ai)

Z.ai is also trying to show that GLM-5.1 is stronger on benchmarks that resemble this kind of end-to-end engineering. In its April 2026 launch post, the company reported scores of 58.4 on SWE-Bench Pro, 42.7 on NL2Repo for repository generation, and 63.5 on Terminal-Bench 2.0 for terminal tasks, all framed as tests of sustained software work rather than single answers. (z.ai)

The company’s own examples go beyond website mockups. In one experiment, Z.ai said GLM-5.1 kept optimizing a vector database for more than 600 iterations and more than 6,000 tool calls, reaching 21.5 thousand queries per second, roughly six times the 3,547 queries per second that had been the best prior result under the same conditions. (z.ai)

So the news here is not that another model can make a landing page from a prompt. The new claim is that a model can stay coherent long enough to act like a tiny software team: first deciding what to build, then building it, then checking its own work, then shipping a live version without stopping after the first draft. (github.com, z.ai)

That does not mean human developers disappear. Builddy’s own repository shows a controlled pipeline with simple web app outputs and visible stages, which is a long way from replacing engineers who handle security reviews, production outages, compliance, and months of maintenance after launch. (github.com) But it does mean the center of gravity is moving. The old question was whether an AI model could write code; the new question is whether it can own a whole workflow from prompt to deployed app, and Builddy is one of the clearest demos so far that the answer is starting to become yes. (github.com, z.ai)
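For readers who want a feel for what a staged pipeline like the one described above (parse, plan, code, review, deploy) might look like, here is a minimal Python sketch. It is not Builddy’s code: every name in it (`BuildState`, `run_pipeline`, the stage functions, the placeholder URL) is hypothetical, and each stage is a plain function standing in for what would presumably be a model call or a real deployment step. The point it illustrates is only the shape of the idea: each stage reads and extends one shared state, so later stages can check what earlier ones produced.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class BuildState:
    """Shared state passed from stage to stage (hypothetical structure)."""
    prompt: str
    spec: Dict[str, str] = field(default_factory=dict)
    plan: List[str] = field(default_factory=list)
    files: Dict[str, str] = field(default_factory=dict)
    review_notes: List[str] = field(default_factory=list)
    url: str = ""
    history: List[str] = field(default_factory=list)


def parse_prompt(state: BuildState) -> BuildState:
    # Stand-in for a model call that turns free text into a structured spec.
    state.spec = {"goal": state.prompt.strip()}
    return state


def plan_app(state: BuildState) -> BuildState:
    # Stand-in for a planning step: decide which files the app needs.
    state.plan = ["index.html"]
    return state


def write_code(state: BuildState) -> BuildState:
    # Stand-in for code generation: produce one file per planned path.
    for path in state.plan:
        state.files[path] = f"<h1>{state.spec['goal']}</h1>"
    return state


def review_code(state: BuildState) -> BuildState:
    # Stand-in for a review step: flag planned files that are missing or empty.
    state.review_notes = [p for p in state.plan if not state.files.get(p)]
    return state


def deploy_app(state: BuildState) -> BuildState:
    # Refuse to ship if the review stage flagged anything.
    if state.review_notes:
        raise RuntimeError(f"review failed for: {state.review_notes}")
    state.url = "https://example.invalid/app"  # placeholder, not a real host
    return state


Stage = Tuple[str, Callable[[BuildState], BuildState]]

STAGES: List[Stage] = [
    ("parse", parse_prompt),
    ("plan", plan_app),
    ("code", write_code),
    ("review", review_code),
    ("deploy", deploy_app),
]


def run_pipeline(prompt: str) -> BuildState:
    """Run every stage in order, recording which stages completed."""
    state = BuildState(prompt=prompt)
    for name, stage in STAGES:
        state = stage(state)
        state.history.append(name)
    return state


if __name__ == "__main__":
    result = run_pipeline("A to-do list app")
    print(result.history, result.url)
```

Keeping the stages separate, rather than folding everything into one call, is what lets a review step block a deploy when something upstream went wrong, which is the "seams" problem the article describes.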
