New 'Harness Engineering' Playbook Emerges for AI Agents
An industry playbook reviews how companies such as OpenAI and Stripe are converging on “harness engineering,” a practice for building software with AI agents. The approach combines agentic AI with robust orchestration, observability, and developer tooling. Success reportedly hinges on explicit I/O contracts for agents, ML-driven performance monitoring, and a seamless developer experience.
- A key principle of "harness engineering" is making the codebase and its architecture legible to AI agents, as anything they can't access and reason about effectively doesn't exist for them. This shifts the engineer's role from writing code to designing the environment, setting constraints, and curating the data and documentation the agent uses. OpenAI's internal experiment with this approach resulted in a product with a million lines of code written by their Codex agent, which they estimated was built in one-tenth of the time it would have taken manually.
- For platform teams, harness engineering requires a shift in thinking about API design and developer experience. Instead of just human-centric design, APIs and platforms must be "AI-friendly," prioritizing clear structures and standardized patterns that agents can easily interpret and use. This may lead to a convergence on a smaller number of well-harnessed tech stacks and architectural patterns that are easier for AI to maintain.
- A formal "Agent Contract" framework is emerging to govern AI agent behavior, defining preconditions, postconditions, and operational constraints. This concept extends to resource management, with contracts specifying and enforcing limits on things like token usage and execution time to ensure predictable and auditable performance. For external-facing APIs, this could evolve into programmatic service-level agreements enforced by the harness.
- The rise of agentic AI is creating new investment opportunities and market dynamics. Venture capitalists are increasingly using AI agents for market analysis, financial risk assessment, and even evaluating founding teams. Meanwhile, the software industry is seeing a stock market revaluation as investors grapple with the disruptive potential of AI agents becoming the primary interface for enterprise workflows, potentially relegating traditional software to the role of a passive data store.
- Stripe and OpenAI are collaborating on the Agentic Commerce Protocol (ACP), an open standard to allow AI agents like ChatGPT to programmatically handle purchases from merchants. This creates a new, conversational commerce channel where the AI agent facilitates the entire transaction, from product discovery to payment, without the user needing to visit a traditional website or app. For developers, this signals a need to prepare for a future where their APIs must securely and reliably interact with autonomous financial agents.
- Observability tooling is adapting to the unique challenges of monitoring AI agents. Platforms like Langfuse, Arize, and Galileo are offering specialized capabilities for tracing agent decision-making, monitoring for hallucinations, and tracking metrics like token usage and latency. The industry is starting to converge on OpenTelemetry as a standard for collecting this data, which will be crucial for debugging and optimizing complex, multi-agent systems.
- From a leadership perspective, building AI-enabled product teams requires a cross-functional approach that includes roles like AI Product Managers, AI Ethicists, and Data Scientists embedded within traditional product teams. The focus of these teams is not just on building models, but on defining the business problems AI can solve, ensuring data quality, and establishing measurable goals and KPIs to gauge the impact of AI initiatives.
- The adoption of agentic AI is maturing from analytics and recommendations to autonomous execution in enterprise settings. Companies in sectors like manufacturing and logistics are deploying agents to automate tasks such as predictive maintenance scheduling and real-time supplier contract renegotiation, shifting the role of human operators from manual execution to monitoring the autonomous systems.
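
To make the "Agent Contract" idea above more concrete, here is a minimal Python sketch of a harness that checks a precondition, a postcondition, and token and time limits around a single agent call. It is an illustration under stated assumptions, not any vendor's framework: `AgentContract`, `ContractViolation`, `echo_agent`, and all limit values are hypothetical names invented for this example, and the "agent" is a stub that reports a crude whitespace word count in place of real token usage. The sketch also only checks budgets after the call returns; a production harness would likely need to interrupt an over-budget call while it is still running.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


class ContractViolation(Exception):
    """Raised when an agent call breaks one of the contract's terms."""


@dataclass
class AgentContract:
    # Hypothetical contract shape: two predicates plus two resource limits.
    precondition: Callable[[str], bool]
    postcondition: Callable[[str], bool]
    max_tokens: int
    max_seconds: float

    def run(self, agent: Callable[[str], Tuple[str, int]], task: str) -> Dict:
        """Call the agent once and return an auditable record of the call."""
        if not self.precondition(task):
            raise ContractViolation("precondition failed for task input")

        start = time.monotonic()
        output, tokens_used = agent(task)
        elapsed = time.monotonic() - start

        # Budgets are checked after the fact; nothing here preempts the agent.
        if tokens_used > self.max_tokens:
            raise ContractViolation(
                f"token budget exceeded: {tokens_used} > {self.max_tokens}"
            )
        if elapsed > self.max_seconds:
            raise ContractViolation(
                f"time budget exceeded: {elapsed:.3f}s > {self.max_seconds}s"
            )
        if not self.postcondition(output):
            raise ContractViolation("postcondition failed for agent output")

        return {
            "output": output,
            "tokens_used": tokens_used,
            "elapsed_seconds": elapsed,
        }


def echo_agent(task: str) -> Tuple[str, int]:
    """Stand-in for a real model call (assumption for illustration)."""
    output = f"DONE: {task}"
    # Crude proxy for token usage: count whitespace-separated words.
    return output, len(output.split())


contract = AgentContract(
    precondition=lambda task: bool(task.strip()),        # task must be non-empty
    postcondition=lambda out: out.startswith("DONE:"),   # output must be marked done
    max_tokens=50,
    max_seconds=2.0,
)

record = contract.run(echo_agent, "rename the config module")
print(record["output"], record["tokens_used"])
```

The returned record is the part that speaks to auditability: each call yields the output alongside its measured token count and latency, which are the same quantities the observability tools mentioned above are described as tracking.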