Karpathy: 'Agentic Engineering' Is New Paradigm
Andrej Karpathy described "agentic engineering" as the new programming paradigm, in which developers shift from writing code to managing autonomous agents that handle full setups. He argued that this shift will amplify the value of deep expertise, because AI provides leverage for those who can direct these systems effectively. Karpathy also praised command-line interfaces (CLIs) for making agents more accessible to developers.
Karpathy's "agentic engineering" signals a shift from direct coding to orchestrating autonomous AI systems that manage entire development lifecycles. This evolution requires new methods for evaluating agentic systems beyond traditional benchmarks, focusing on multi-step reasoning, tool use, and error recovery. Specialized benchmarks like AgentBench and WebArena are emerging to test these complex behaviors in realistic scenarios.
The transition to agentic systems intensifies the need for high-quality data to guide their behavior. Reinforcement Learning from Human Feedback (RLHF) is a critical process where human evaluators rank model outputs to teach nuanced, desired behaviors. This creates a demand for structured, domain-specific feedback on tasks like response helpfulness, safety, and factual accuracy, often requiring multi-pass reviews and calibration to ensure consistency. The quality of this human-labeled data directly impacts the performance and alignment of the final AI model.
To reduce reliance on extensive human labeling, techniques like Constitutional AI (CAI) are being implemented. Developed by Anthropic, CAI trains models using a predefined set of principles, or a "constitution," allowing the AI to critique and revise its own responses to align with these rules. This approach aims to make the alignment process more scalable, transparent, and less subjective than relying solely on human feedback loops.
The choice between human-labeled and synthetic data is a crucial strategic decision for AI labs. While synthetic data offers scalability and can address privacy concerns, it often lacks the nuance and ability to handle real-world complexity that human annotation provides. Hybrid approaches are common, using synthetic data for broad coverage and human validation for critical edge cases and refining subjective qualities like tone and empathy.
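To make the RLHF ranking step described earlier more concrete, here is a minimal sketch of how one evaluator's best-to-worst ranking might be expanded into preference pairs and scored with a Bradley-Terry style pairwise loss, a common choice for training reward models. The function names and the example scores are hypothetical and chosen for illustration only; this is not any particular lab's pipeline.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Expand one best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps input order, so the first element of every pair
    is the one the evaluator ranked higher.
    """
    return list(combinations(ranked_outputs, 2))


def pairwise_loss(score_preferred, score_rejected):
    """Bradley-Terry style loss: -log(sigmoid(preferred - rejected)).

    The loss is small when the reward model already scores the preferred
    output higher, and large when it scores the rejected output higher.
    """
    margin = score_preferred - score_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical example: an evaluator ranked three responses, best first.
pairs = ranking_to_pairs(["response_a", "response_b", "response_c"])

# Hypothetical reward-model scores for the same responses.
scores = {"response_a": 1.2, "response_b": 0.4, "response_c": -0.3}
mean_loss = sum(pairwise_loss(scores[w], scores[l]) for w, l in pairs) / len(pairs)
```

A ranking of n outputs yields n(n-1)/2 pairs, which is one reason a single careful ranking from a domain expert can carry more training signal than a single thumbs-up or thumbs-down label.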
Models trained on human-labeled data have been shown to outperform counterparts trained on synthetic data by 12-18% on complex reasoning tasks.
The fundraising landscape for AI infrastructure is robust, with a significant concentration of capital flowing into this sector. In 2025, AI-related companies captured nearly half of all global venture funding, a dramatic increase from the previous year. This investment is heavily focused on the foundational layers of AI, including data centers and compute power, reflecting a market shift towards securing the essential resources for building and training large-scale models.
Go-to-market strategies for AI infrastructure startups are shifting away from traditional SaaS models toward a focus on demonstrating tangible ROI and building trust with highly technical buyers. Success metrics are evolving to include AI-specific KPIs like Return on AI Investment (ROAI), which measures revenue generated from automated workflows against the cost of model inference and compute. This requires a deep understanding of the customer's technical stack and the ability to prove how the infrastructure investment translates to measurable performance improvements.
The rise of agentic systems and the increasing need for specialized data are reshaping the data labeling workforce. Demand is shifting from low-cost, high-volume taskers to domain experts like doctors, lawyers, and coders who can provide nuanced, context-rich feedback. This creates a new set of operational challenges: recruiting and managing this specialized, high-cost workforce, and ensuring the quality of its output. This evolution points to a future where human expertise collaborates with AI to build more capable and reliable systems.
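The ROAI metric mentioned above is described only loosely, as revenue from automated workflows measured against inference and compute cost. One plausible reading is a simple ratio, sketched below. The exact formula, the function name, and all dollar figures are assumptions for illustration, not an established industry definition.

```python
def roai(workflow_revenue, inference_cost, compute_cost):
    """Return on AI Investment, read here as revenue divided by AI spend.

    Assumption: ROAI is a plain ratio. Some teams may instead report a
    net figure, (revenue - cost) / cost, so confirm the definition in use.
    """
    total_cost = inference_cost + compute_cost
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return workflow_revenue / total_cost


# Hypothetical quarter: $120k attributed to automated workflows,
# $25k spent on model inference and $15k on other compute.
ratio = roai(120_000.0, 25_000.0, 15_000.0)  # 3.0
```

Under this reading, a value above 1.0 means the automated workflows brought in more revenue than the inference and compute they consumed. The hard part in practice is attribution, that is, deciding which revenue counts as generated by the automated workflow.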