Agents: engineering, not just models

Enterprise AI agents have shifted from splashy demos to hard engineering problems: orchestration, permissions, logging and fail-safe behaviour now determine success. A large code-analysis video argues that teams will win by building the control plane around models (auditability, scoped access, human checkpoints) rather than by chasing the single best frontier model. (youtube.com)

A model is the part that writes the sentence. An agent is the part that can also open a file, call a tool, click a button, or send a message into another system, which turns a chatbot into something closer to a junior operator with a keyboard. (developers.openai.com) That extra power creates a very old software problem in a very new wrapper. Once an agent can touch customer data, internal code, or a payment system, the hard question stops being “which model is smartest” and becomes “who let this thing do that, and where is the log.” (microsoft.github.io)

Anthropic drew a clean line between a workflow and an agent in December 2024. A workflow follows a preset path written in code, while an agent decides its own next step on the fly, which is more flexible and much harder to predict. (anthropic.com) The same Anthropic post said most successful teams were not winning with giant, magical frameworks. They were using simple patterns, adding complexity only when needed, and often finding that one well-designed model call beat a sprawling autonomous system. (anthropic.com)

OpenAI’s Agents software development kit now sells exactly that middle layer. Its docs describe tool use, handoffs between specialized agents, streaming, and “a full trace of what happened,” which is software language for “you need a flight recorder before you put this in production.” (developers.openai.com) Anthropic’s own platform docs make the same shift visible from another angle. The basic application programming interface says you manage every turn and write your own tool loop, while the higher-level agent kit adds built-in file, shell, and web tools, which means the platform is moving from text generation toward supervised action. (platform.claude.com)

Microsoft is now selling the missing enterprise piece almost by name. Its Foundry Control Plane, announced on November 18, 2025, says the difficult part is not building an agent in minutes but understanding what actions agents are taking, how they are performing, and whether they comply with policy. (techcommunity.microsoft.com) Microsoft’s product page breaks that control layer into four buckets: controls, observability, security, and fleet-wide operations. It also added checks on tool calls and tool responses, which is the important detail, because the dangerous moment is often not the model’s wording but the external action it tries to take. (techcommunity.microsoft.com)

GitHub moved the same way in October 2025. Its enterprise “agent control plane” added a single admin view, session tracking for the last 24 hours, audit logs that mark when an action was executed by an agent, and fine-grained permissions for AI administration. (github.blog)

The security playbooks are getting weirdly specific because the failure modes are weirdly specific. Microsoft’s multi-agent reference architecture calls for role-based access control, approved capability lists, signed identities, centralized logs with input and output hashes, versioning, rollback, and a manual override that can pause an agent class without shutting down the whole system. (microsoft.github.io)

That is why the newest argument in the agent market is drifting away from benchmark scores. The winning stack increasingly looks like a strong-enough model wrapped in scoped access, human checkpoints, audit trails, and a kill switch, because companies buy software that can be governed, not demos that can only impress. (developers.openai.com) (anthropic.com) (github.blog)
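To make the control-plane idea concrete, here is a minimal Python sketch of a gate that sits between an agent and its tools. It is not any vendor's API: the `ControlPlane` class, its field names, the decision strings, and the `read_ticket` and `issue_refund` tools are all hypothetical, chosen only to illustrate the controls named above (role-scoped tool lists, a human checkpoint, an audit log with input and output hashes, and a pause switch for one agent class).

```python
import hashlib
import json
from dataclasses import dataclass, field


def sha256_hex(value) -> str:
    """Hash a JSON-serializable value so the log records what was sent or returned."""
    payload = json.dumps(value, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class ControlPlane:
    # Hypothetical policy table: role -> set of tool names that role may call.
    allowed_tools: dict
    # Tools that need a human checkpoint before they run.
    needs_approval: set = field(default_factory=set)
    # Agent classes an operator has paused (the manual override).
    paused_classes: set = field(default_factory=set)
    # Every attempt is recorded here, including blocked ones.
    audit_log: list = field(default_factory=list)

    def call_tool(self, agent_id, agent_class, role, tool_name, tool_fn, args,
                  approver=None):
        entry = {
            "agent_id": agent_id,
            "agent_class": agent_class,
            "role": role,
            "tool": tool_name,
            "input_hash": sha256_hex(args),
            "output_hash": None,
        }
        if agent_class in self.paused_classes:
            entry["decision"] = "blocked:paused"
        elif tool_name not in self.allowed_tools.get(role, set()):
            entry["decision"] = "blocked:not_permitted"
        elif tool_name in self.needs_approval and not (
            approver and approver(tool_name, args)
        ):
            entry["decision"] = "blocked:not_approved"
        else:
            entry["decision"] = "executed"

        result = None
        if entry["decision"] == "executed":
            # The external action happens only after every check has passed.
            result = tool_fn(**args)
            entry["output_hash"] = sha256_hex(result)
        self.audit_log.append(entry)
        return entry["decision"], result


# Hypothetical tools standing in for real systems.
def read_ticket(ticket_id):
    return {"ticket_id": ticket_id, "status": "open"}


def issue_refund(order_id, amount):
    return {"order_id": order_id, "refunded": amount}


plane = ControlPlane(
    allowed_tools={"support": {"read_ticket", "issue_refund"}},
    needs_approval={"issue_refund"},
)
```

In this sketch a support agent can read a ticket freely, but a refund runs only when an `approver` callback (standing in for a human reviewer) returns true. Adding the agent's class to `paused_classes` blocks that class alone, and every attempt, allowed or not, lands in `audit_log`.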
