Anthropic Seeks Third-Party Model Evaluations
Anthropic is actively soliciting proposals from third parties to conduct model evaluations, a sign of a broader move toward external validation and transparency. This follows internal research showing that instructions alone do not prevent unsafe behavior and that structural interventions such as sandboxing and adversarial testing are more effective.
- Anthropic's call for proposals specifically targets three priority areas for new evaluations: AI Safety Level (ASL) assessments related to cybersecurity and biological risks, advanced capability and safety metrics, and the development of infrastructure and tools to make evaluation easier for subject-matter experts.
- This initiative follows the first-ever joint evaluation of an Anthropic model, Claude 3.5 Sonnet, by the U.S. and U.K. AI Safety Institutes, which tested for biological, cyber, and software risks prior to deployment.
- The push for external evaluations aligns with Anthropic's "Constitutional AI" approach, a method that trains models to critique and revise their own outputs based on a predefined set of ethical principles, reducing the reliance on human-labeled feedback for safety.
- Reinforcement Learning from Human Feedback (RLHF) is a core process where human labelers rank model outputs, creating a "reward model" that guides the AI's policy; this is the data-intensive workflow that a data labeling business would directly service.
- Evaluating emerging agentic AI systems requires specialized benchmarks like AgentBench, WebArena, and GAIA, which test multi-step reasoning, decision-making, and tool use, creating new, complex data labeling needs beyond simple text classification.
- While synthetic data can be generated much faster and cheaper than human labeling, it often lacks the nuance and real-world messiness required to train robust models, creating a strategic opening for high-quality human data providers.
- The fundraising climate for AI infrastructure is strong, with recent multi-billion dollar commitments and a trend showing AI startups raising a third of all venture capital and commanding significantly higher seed valuations than non-AI companies.
- The data labeling workforce is shifting from a low-cost gig economy model, which was prevalent for labeling simple computer vision tasks, to a demand for highly specialized domain experts like lawyers and doctors to provide context-rich annotations for frontier models.
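
To make the RLHF bullet above more concrete, the following is a minimal sketch of how one labeler's ranking might be turned into training signal for a reward model. It assumes a Bradley-Terry style pairwise loss, which is a common choice but is not stated in the source; the function names `ranking_to_pairs` and `pairwise_loss` and the example scores are hypothetical and do not describe Anthropic's actual pipeline.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Expand one labeler ranking (best first) into (preferred, rejected) pairs.

    combinations() preserves input order, so the first element of each pair
    is always the output the labeler ranked higher.
    """
    return list(combinations(ranked_outputs, 2))


def pairwise_loss(score_preferred, score_rejected):
    """Bradley-Terry style loss: -log(sigmoid(score_preferred - score_rejected)).

    Written in a numerically stable form so large score gaps do not overflow.
    The loss is small when the reward model scores the preferred output higher.
    """
    diff = score_preferred - score_rejected
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Hypothetical example: a labeler ranked three responses, best first.
pairs = ranking_to_pairs(["response_a", "response_b", "response_c"])

# Hypothetical reward-model scores for the same responses.
scores = {"response_a": 1.2, "response_b": 0.4, "response_c": -0.3}

# Average loss over all pairs implied by the single ranking.
mean_loss = sum(pairwise_loss(scores[w], scores[l]) for w, l in pairs) / len(pairs)
```

A ranking of n outputs yields n(n-1)/2 preference pairs, which is one reason ranked labels are a comparatively efficient form of human feedback: each additional ranked output adds several comparisons rather than one.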