Workforce Shifting from 'Doing' to 'Judging' AI Output

Brent Orrell of the American Enterprise Institute argues that the economy is transitioning from workers creating products from scratch to assembling and verifying AI-generated content. He notes that while AI errors are often less frequent than human errors, user trust in AI output remains low. This shift positions human-in-the-loop validation and judgment as a critical new skill set.

- Reinforcement Learning from Human Feedback (RLHF) is a multi-stage process that begins with a pre-trained model, collects human preference data on model outputs, trains a "reward model" to predict those preferences, and then fine-tunes the original model to maximize the reward signal. This has led to a shift away from large-scale, low-skill data labeling toward a demand for high-quality, domain-expert annotators in fields like law, medicine, and software engineering to ensure nuanced and accurate feedback.
- Constitutional AI, a technique developed by Anthropic, reduces the dependency on constant human supervision by training a model based on a set of explicit principles (a "constitution"). This process involves the model critiquing and revising its own outputs based on these principles, a method known as Reinforcement Learning from AI Feedback (RLAIF), which makes the alignment process more scalable and transparent than traditional RLHF.
- The rise of agentic AI systems, which can execute multi-step tasks and use tools, has created a need for new evaluation benchmarks beyond traditional LLM metrics. Frameworks like AgentBench, WebArena, and GAIA are now used to test agent capabilities in areas like web navigation, task completion, and tool usage, measuring metrics such as success rate, cost per task, and error handling.
- While synthetic data can be generated up to 50 times faster than human labeling and avoids privacy issues, it can fall short in accuracy by up to 35% for tasks requiring contextual understanding. As a result, many AI labs adopt a hybrid approach, using synthetic data for scale and then fine-tuning models with smaller, high-quality, human-labeled datasets to capture nuance and handle edge cases.
- Go-to-market strategies for AI infrastructure startups are shifting to focus on selling value over technology; for instance, messaging "cut debugging time by 40%" is more effective than "LLM-powered root cause analysis." The buying process is no longer limited to technical decision-makers, requiring sales teams to engage a complex committee of stakeholders that includes innovation leads, internal community champions, and department heads.
- The fundraising climate for AI infrastructure remains exceptionally strong, with AI startups capturing nearly 50% of all global venture funding in 2025, totaling $202.3 billion. Foundation model developers like OpenAI and Anthropic raised $80 billion of that total, signaling massive investor confidence in the core technology stack.
- The World Economic Forum projects that while AI may displace 75 million jobs globally, it is expected to create 133 million new roles, resulting in a net increase of 58 million jobs. This transition means nearly 40% of all global jobs are exposed to AI-driven change, requiring a significant reskilling of the workforce toward cognitive, creative, and technical skills that complement AI systems.
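
To make the reward-model stage of RLHF described above more concrete, here is a minimal sketch in Python. It assumes a pairwise (Bradley-Terry style) preference loss, which is one common choice and is not specified in the briefing. The linear reward model, the two features, and the toy preference pairs are all hypothetical. The final stage, fine-tuning the original model against the learned reward, is not shown.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reward(weights, features):
    # Hypothetical linear reward model: score = w . x
    return sum(w * f for w, f in zip(weights, features))


def preference_loss(weights, chosen, rejected):
    # Pairwise loss: small when the human-preferred output
    # scores higher than the rejected one.
    margin = reward(weights, chosen) - reward(weights, rejected)
    return -math.log(sigmoid(margin))


def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    # Plain stochastic gradient descent on the pairwise loss.
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d(loss)/dw = -(1 - sigmoid(margin)) * (chosen - rejected)
            scale = 1.0 - sigmoid(margin)
            for i in range(dim):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights


# Hypothetical annotator data: each pair is (preferred, rejected),
# with made-up features [helpfulness, verbosity].
pairs = [
    ([0.9, 0.2], [0.3, 0.8]),
    ([0.8, 0.5], [0.4, 0.5]),
    ([0.7, 0.1], [0.2, 0.9]),
]

weights = train_reward_model(pairs, dim=2)
for chosen, rejected in pairs:
    print(round(reward(weights, chosen), 3), round(reward(weights, rejected), 3))
```

In this toy setup the quality of the learned reward depends entirely on the preference pairs, which is one way to read the briefing's point about demand for domain-expert annotators.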
