Essay: AI Orchestrators Should Never Write Code
A new technical essay argues that AI orchestrators should never generate their own code on the fly. The author warns this practice introduces hidden risks and opaque failure modes, advocating instead for explicit, human-audited, version-controlled configurations to ensure reliability. The piece suggests a clear separation between agent planning and code execution is critical for production systems.
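Here is a minimal sketch of the separation the essay calls for. It assumes a planner that emits a plan as plain data and an executor that dispatches only to pre-written, reviewed functions. The config contents, the tool names (`search_docs`, `summarize`) and the helpers (`Step`, `validate`, `execute`, `PlanRejected`) are all hypothetical. An inline JSON string stands in for a file kept under version control. The property being illustrated is that model output selects from approved operations and never becomes code that runs.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical reviewed configuration. In practice this would be a file in
# version control, changed only through human code review.
APPROVED_CONFIG: dict = json.loads("""
{
  "version": "v1",
  "tools": {
    "search_docs": {"allowed_args": ["query"]},
    "summarize": {"allowed_args": ["text", "max_words"]}
  }
}
""")

# Human-written tool implementations; the model never supplies code for these.
def search_docs(query: str) -> str:
    return f"results for {query!r}"

def summarize(text: str, max_words: int = 20) -> str:
    return " ".join(text.split()[:max_words])

TOOL_REGISTRY: dict[str, Callable[..., Any]] = {
    "search_docs": search_docs,
    "summarize": summarize,
}

@dataclass
class Step:
    """One planned step, expressed as data rather than code."""
    tool: str
    args: dict[str, Any]

class PlanRejected(Exception):
    pass

def validate(plan: list[Step], config: dict) -> None:
    allowed = config["tools"]
    for i, step in enumerate(plan):
        if step.tool not in allowed or step.tool not in TOOL_REGISTRY:
            raise PlanRejected(f"step {i}: tool {step.tool!r} is not approved")
        extra = set(step.args) - set(allowed[step.tool]["allowed_args"])
        if extra:
            raise PlanRejected(f"step {i}: unexpected args {sorted(extra)}")

def execute(plan: list[Step], config: dict) -> list[Any]:
    # Check every step's tool and argument names before running any step.
    validate(plan, config)
    return [TOOL_REGISTRY[step.tool](**step.args) for step in plan]

if __name__ == "__main__":
    plan = [
        Step("search_docs", {"query": "retry policy"}),
        Step("summarize", {"text": "a b c d e", "max_words": 3}),
    ]
    print(execute(plan, APPROVED_CONFIG))
```

Suppose a plan contains a step such as `Step("run_python", {"code": "..."})`. The executor rejects it before any step runs, because the reviewed config defines no such tool. The only way to add a capability is to change the config through review.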
The debate over AI code generation ties into a broader architectural decision: how to orchestrate multiple specialized AI agents. Frameworks like LangGraph, AutoGen, and CrewAI offer different coordination patterns. These range from graph-based state management to conversational models in which agents exchange messages in a loop. The choice of framework dictates how control, data, and tasks flow between agents, and therefore fundamentally shapes the system's reliability.
Multi-agent systems are typically designed using hierarchical, decentralized, or hybrid architectural patterns. In a hierarchical model, a primary agent delegates sub-tasks to specialized agents. Frameworks such as Google's Agent Development Kit (ADK) support this pattern. This "Agents as Tools" approach creates a clear separation of concerns: an orchestrator agent routes each request to the appropriate specialist, which simplifies debugging and improves modularity.
Recent research from Anthropic suggests that a multi-agent architecture can significantly outperform a single agent, even one running the most capable model. In that work, the lead agent was Claude Opus 4, which delegated to Claude Sonnet 4 sub-agents. This system outperformed a single-agent Claude Opus 4 by over 90% on Anthropic's internal research evaluation, illustrating the value of distributed, specialized reasoning. This highlights a key trade-off: single agents are simpler to build, but multi-agent systems are better at handling extensive domain knowledge and complex tasks.
The core challenge in agent orchestration is long-horizon reasoning: decomposing complex goals into manageable steps. Research in this area focuses on hierarchical planning, in which agents break large tasks into structured subgoals, mirroring human cognitive strategies. Simple ReAct ("Reasoning and Acting") loops interleave one reasoning step with one action. Frameworks are now moving beyond them toward "Pre-Act" approaches that generate a multi-step execution plan before taking any action. This shift toward structured, multi-agent systems also affects engineering leadership.
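The hierarchical, plan-first pattern described above can be sketched in plain Python. This is not the API of ADK, LangGraph, or any other framework. The specialist functions, the `SPECIALISTS` registry and the fixed `make_plan` decomposition are hypothetical stand-ins for model-backed agents and an LLM planner. The orchestrator treats each specialist as a tool behind a name, and it produces the whole plan before executing any step.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical specialists. In a real system each would wrap its own model
# call and prompt; here they are deterministic stubs so the control flow is
# easy to follow.
def research_agent(task: str, context: list[str]) -> str:
    return f"notes on {task}"

def writer_agent(task: str, context: list[str]) -> str:
    return f"draft about {task} citing {len(context)} earlier result(s)"

# "Agents as tools": the orchestrator only sees a name -> callable registry.
SPECIALISTS: dict[str, Callable[[str, list[str]], str]] = {
    "research": research_agent,
    "write": writer_agent,
}

@dataclass
class Subgoal:
    specialist: str
    task: str

def make_plan(goal: str) -> list[Subgoal]:
    """Plan-first (Pre-Act style): return every step before any step runs.
    A fixed decomposition stands in for an LLM planner in this sketch."""
    return [Subgoal("research", goal), Subgoal("write", goal)]

def orchestrate(goal: str) -> list[str]:
    plan = make_plan(goal)
    # The whole plan can be checked (or logged for review) up front.
    unknown = [s.specialist for s in plan if s.specialist not in SPECIALISTS]
    if unknown:
        raise ValueError(f"plan references unknown specialists: {unknown}")
    results: list[str] = []
    for step in plan:
        # Each specialist receives its sub-task plus a copy of earlier results.
        results.append(SPECIALISTS[step.specialist](step.task, list(results)))
    return results

if __name__ == "__main__":
    for line in orchestrate("vector databases"):
        print(line)
```

Calling `orchestrate("vector databases")` runs the research step and then the writing step, with the writer receiving the research result as context. A ReAct-style loop would work differently. The model would be queried after every observation to choose the next action, so the full sequence would not be known, or reviewable, in advance.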
As AI tools make individual engineers more productive, the focus for CTOs shifts from scaling headcount to architecting systems that create leverage. The key challenge becomes designing the "glue" that connects specialized components and ensuring that it is reliable, rather than simply building the components themselves.
For consumer-facing products, the complexity of multi-agent systems must be abstracted away from the user. Engineering focuses on robust orchestration, while product design must make the agent's behavior feel simple and intuitive. This matters because bots and agents already account for a significant share of web traffic, in some cases 20-30%, which is reshaping the user experience landscape.
Adversarial attacks amplify the risk of dynamic code generation. In these attacks, malicious inputs are crafted to deceive AI models and exploit vulnerabilities. Such attacks can manipulate a model into generating insecure code, leaking sensitive data, or performing unintended actions. This makes human-in-the-loop auditing and version-controlled configurations a critical security measure against threats like prompt injection and data poisoning. These safeguards limit what a manipulated model is actually able to execute.
In China, the regulatory landscape for AI is maturing rapidly, with a focus on balancing innovation and risk management. Regulations like the "Interim Measures for the Administration of Generative AI Services" require algorithm filing and security assessments, which affects how AI agent services can be deployed. For companies like Pyra, staying compliant with these evolving rules is a core operational requirement, and that includes the explicit labeling of AI-generated content.
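The human-in-the-loop and version-control safeguards from the security discussion can be made concrete with two checks. First, the runtime refuses any configuration whose SHA-256 digest differs from the one recorded at review time. Second, any action marked sensitive in that configuration needs explicit approval before it runs. This is a sketch under stated assumptions. The config format, the action names and the `approve` callback are hypothetical. In a real deployment, the approved digest would be committed alongside the reviewed config rather than computed in the same script.

```python
import hashlib
import json
from typing import Callable

def digest(config_text: str) -> str:
    """SHA-256 of the exact config text that a human reviewed."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()

# Hypothetical reviewed config. For this demo the approved digest is computed
# here; in practice it would be recorded in version control at review time.
REVIEWED_CONFIG = '{"sensitive_actions": ["send_email", "delete_record"]}'
APPROVED_DIGEST = digest(REVIEWED_CONFIG)

class ConfigNotApproved(Exception):
    pass

def load_config(config_text: str, approved_digest: str) -> dict:
    # Refuse to run with any config that is not the reviewed version,
    # e.g. one an agent or an injected prompt tried to rewrite.
    if digest(config_text) != approved_digest:
        raise ConfigNotApproved("config differs from the reviewed version")
    return json.loads(config_text)

def run_action(
    action: str,
    payload: str,
    config: dict,
    approve: Callable[[str, str], bool],
) -> str:
    # Sensitive actions wait for a human decision; others run directly.
    if action in config["sensitive_actions"] and not approve(action, payload):
        return f"blocked: {action}"
    return f"ran: {action}"  # a real system would dispatch the action here

if __name__ == "__main__":
    config = load_config(REVIEWED_CONFIG, APPROVED_DIGEST)
    deny_all = lambda action, payload: False  # stand-in for a human reviewer
    print(run_action("search", "quarterly report", config, deny_all))
    print(run_action("send_email", "hello", config, deny_all))
```

Pinning by digest means an agent cannot quietly widen its own permissions. Any edit to the config fails `load_config`, including removing an action from the sensitive list. It keeps failing until a human re-reviews the change and records a new digest.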