Labs Now Sourcing Real-World Work Artifacts

In a sign of escalating data needs, OpenAI is now asking contractors to upload actual work artifacts from their previous jobs. This move signals a shift away from synthetic examples toward "in-the-wild" data to train and evaluate AI agents on diverse, authentic user workflows.

The push for "real-world" data is a direct response to the limitations of web-scraped and synthetic datasets, which often lack the messy, implicit context of genuine workplace tasks. OpenAI, in partnership with vendor Handshake AI, is asking contractors to upload files like presentations, spreadsheets, and code repositories, paired with the original request that prompted the work. The aim is to capture the complete lifecycle of a task, from instruction to finished output, to better train AI agents for complex white-collar automation.

Data like this is well suited to Reinforcement Learning from Human Feedback (RLHF), a technique in which models are optimized against human preferences. In a typical RLHF workflow, a reward model is trained on human-ranked outputs and learns to assign higher scores to more helpful and harmless responses; the main model is then fine-tuned with reinforcement learning to maximize that score. Work artifacts paired with their original requests can give these reward models a richer, more nuanced signal than generic or synthetic examples.

However, RLHF's reliance on human labelers can be a bottleneck. In response, labs like Anthropic have pioneered Constitutional AI (CAI), in which a model uses a predefined set of principles (a "constitution") to critique and revise its own outputs, reducing the need for human feedback on every single data point. CAI pairs this self-critique with Reinforcement Learning from AI Feedback (RLAIF), in which AI-generated preference labels, rather than human ones, are used to train the reward model. Because the guiding principles are written down explicitly, the approach is both more scalable and more transparent than relying on human labels alone.

Evaluating the next generation of agentic AI, meaning systems that perform multi-step tasks using software tools, requires benchmarks that go beyond traditional metrics. Instead of measuring only text quality, evaluations now focus on task completion, decision-making autonomy, and correct tool use. This shift creates demand for high-quality data that pairs real-world instructions with the corresponding successful (and unsuccessful) workflows. The move toward authentic data also exposes the shortcomings of purely synthetic datasets.
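To make the reward-model step concrete, here is a minimal sketch of the pairwise (Bradley-Terry) loss commonly used to train RLHF reward models: for each ranked pair, the loss is -log sigmoid(r(chosen) - r(rejected)), which pushes the model to score the human-preferred output higher. This is a toy under stated assumptions: the linear `reward` function and the hand-made two-number feature vectors are hypothetical stand-ins for a neural network reading real text or files, and the function names are illustrative, not any lab's actual code.

```python
import math
from typing import List, Sequence, Tuple

# A preference pair: (features of the preferred output, features of the rejected output).
# The feature vectors are hypothetical stand-ins for what a neural reward model
# would compute from raw text or files.
Pair = Tuple[Sequence[float], Sequence[float]]


def reward(w: Sequence[float], x: Sequence[float]) -> float:
    """Toy linear reward model: r(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def softplus(z: float) -> float:
    """log(1 + e^z), computed without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def pairwise_loss(w: Sequence[float], pairs: List[Pair]) -> float:
    """Mean Bradley-Terry loss: -log sigmoid(r(chosen) - r(rejected))."""
    total = 0.0
    for chosen, rejected in pairs:
        margin = reward(w, chosen) - reward(w, rejected)
        total += softplus(-margin)  # equals -log sigmoid(margin)
    return total / len(pairs)


def train_reward_model(pairs: List[Pair], dim: int,
                       lr: float = 0.5, steps: int = 200) -> List[float]:
    """Fit w by full-batch gradient descent on the pairwise loss."""
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # Gradient of -log sigmoid(margin) w.r.t. w is
            # -(1 - sigmoid(margin)) * (chosen - rejected).
            coeff = -(1.0 - sigmoid(margin))
            for i in range(dim):
                grad[i] += coeff * (chosen[i] - rejected[i])
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


# Usage with made-up data: feature 0 is a hypothetical "task completed" signal,
# feature 1 a hypothetical "verbosity" signal. Labelers preferred the first item.
pairs = [
    ([1.0, 0.2], [0.0, 0.9]),
    ([1.0, 0.5], [0.0, 0.4]),
    ([0.9, 0.1], [0.1, 0.8]),
]
w = train_reward_model(pairs, dim=2)
print(pairwise_loss(w, pairs) < math.log(2))  # prints True (log 2 is the loss at w = 0)
```

After training, every preferred output scores higher than its rejected partner. In a real pipeline the learned score would then serve as the reward signal for the reinforcement-learning stage.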
While synthetic data is scalable and useful for protecting privacy, it often lacks the unpredictable variation and contextual nuance of real-world information. Many AI training pipelines therefore take a hybrid approach: synthetic data provides volume, while human-validated, real-world data supplies the grounding and complexity needed to handle edge cases.

For AI infrastructure startups, this intensified focus on data quality is reshaping go-to-market strategies. The emphasis is shifting from selling standalone tools to delivering a coherent system that measurably improves a buyer's data processing and model performance. In the current fundraising climate, investors are prioritizing AI companies that can demonstrate a clear path to enterprise sales and a defensible advantage in proprietary data or technology, rather than reliance on third-party APIs.

This evolution in AI training has significant implications for the future of work, with nearly 40% of global jobs exposed to AI-driven change. As AI automates more tasks, demand is growing for new skills in areas such as digital health and social media marketing. This creates both a challenge and an opportunity for building a data-labeling workforce, since the required "human-in-the-loop" tasks are becoming more specialized and context-rich.
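The hybrid approach described earlier, in which synthetic data supplies volume and real-world data supplies grounding, can be sketched as a weighted sampler that draws each training example from a small real-world pool with some probability and from a larger synthetic pool otherwise. This is one simple way to mix the two, assuming per-example sampling with replacement; `mix_datasets` and the `real_fraction` knob are hypothetical illustrations, and production pipelines may instead use fixed per-batch quotas, deduplication, or quality filters.

```python
import random
from collections import Counter
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_datasets(real: Sequence[T], synthetic: Sequence[T],
                 real_fraction: float, n: int, seed: int = 0) -> List[T]:
    """Draw n examples: each comes from `real` with probability `real_fraction`,
    otherwise from `synthetic`. Sampling is with replacement, so a small real
    pool is upsampled rather than exhausted."""
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be between 0 and 1")
    if (real_fraction > 0.0 and not real) or (real_fraction < 1.0 and not synthetic):
        raise ValueError("a pool with nonzero sampling weight is empty")
    rng = random.Random(seed)  # seeded for reproducible batches
    batch: List[T] = []
    for _ in range(n):
        # rng.random() is in [0, 1), so 0.0 never picks real and 1.0 always does.
        pool = real if rng.random() < real_fraction else synthetic
        batch.append(rng.choice(pool))
    return batch


# Usage with placeholder data: a few human-validated real examples,
# many synthetic ones, and an assumed 20% real share.
real_pool = [("real", i) for i in range(50)]
synthetic_pool = [("synthetic", i) for i in range(5000)]
batch = mix_datasets(real_pool, synthetic_pool, real_fraction=0.2, n=1000)
print(Counter(source for source, _ in batch))
```

Because sampling is with replacement, a high `real_fraction` repeats the few real examples many times, which raises the risk that a model memorizes them; the ratio is a tuning choice rather than a fixed rule.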
