Anthropic Seeks Third-Party Model Evaluations

Anthropic is soliciting proposals from third parties to conduct model evaluations, a sign of a broader move toward external validation and transparency. The call follows internal research suggesting that instructions alone do not prevent unsafe behavior and that structural interventions such as sandboxing and adversarial testing are more effective.

- Anthropic's call for proposals targets three priority areas for new evaluations: AI Safety Level (ASL) assessments related to cybersecurity and biological risks, advanced capability and safety metrics, and infrastructure and tools that make evaluation easier for subject-matter experts.
- A related milestone is the first joint evaluation of an Anthropic model, Claude 3.5 Sonnet, by the U.S. and U.K. AI Safety Institutes, which tested for biological, cyber, and software risks prior to deployment.
- The push for external evaluations aligns with Anthropic's "Constitutional AI" approach, which trains models to critique and revise their own outputs against a predefined set of ethical principles, reducing reliance on human-labeled feedback for safety.
- Reinforcement Learning from Human Feedback (RLHF) is a core process in which human labelers rank model outputs; those rankings train a "reward model" that guides the AI's policy. This is the data-intensive workflow that a data labeling business would directly service.
- Evaluating emerging agentic AI systems requires specialized benchmarks such as AgentBench, WebArena, and GAIA, which test multi-step reasoning, decision-making, and tool use. These create new, complex data labeling needs beyond simple text classification.
- Synthetic data can be generated much faster and more cheaply than human labels, but it often lacks the nuance and real-world messiness needed to train robust models, which creates a strategic opening for high-quality human data providers.
- The fundraising climate for AI infrastructure is strong: recent commitments have reached multiple billions of dollars, AI startups are raising a third of all venture capital, and they command significantly higher seed valuations than non-AI companies.
- The data labeling workforce is shifting from a low-cost gig-economy model, which was prevalent for simple computer vision labeling tasks, toward highly specialized domain experts such as lawyers and doctors who can provide context-rich annotations for frontier models.
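
To make the RLHF bullet above concrete, here is a minimal sketch of how one labeler's ranking could become a training signal for a reward model. It assumes a Bradley-Terry style pairwise loss, a common formulation in the RLHF literature; the briefing does not say which objective Anthropic uses. The function names and the toy reward scores are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the first item of each pair
    # is always the one the labeler ranked higher.
    return list(combinations(ranked_outputs, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """Bradley-Terry negative log-likelihood: -log(sigmoid(r_pref - r_rej))."""
    margin = reward_preferred - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_ranking_loss(ranked_outputs, reward_fn):
    """Average pairwise loss of a reward function over one labeler ranking."""
    pairs = ranking_to_pairs(ranked_outputs)
    losses = [pairwise_loss(reward_fn(a), reward_fn(b)) for a, b in pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical scores standing in for a learned reward model.
    toy_scores = {"answer_a": 2.0, "answer_b": 1.0, "answer_c": 0.0}
    labeler_ranking = ["answer_a", "answer_b", "answer_c"]  # best to worst
    print(mean_ranking_loss(labeler_ranking, toy_scores.get))
```

The loss is low when the reward model scores outputs in the same order as the labeler and grows when it disagrees. A ranking of n outputs yields n(n-1)/2 comparison pairs, so each ranked set supplies several training examples.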
