Anthropic Launches Initiative to Fund Third-Party AI Model Evaluations for Safety and Capabilities

November 05, 2025 · 3 min read

In a bid to strengthen AI safety and accountability, Anthropic has unveiled a funding initiative for developing third-party evaluations of AI models. The program targets the growing demand for high-quality assessments that can accurately measure advanced capabilities and potential risks in large language models and other AI systems. Anthropic describes the current evaluation ecosystem as limited and struggling to keep pace with rapid AI advances, and it positions the initiative as a benefit to the broader AI community by providing standardized tools for responsible development.

Anthropic, known for its AI models like Claude, is focusing on evaluations that align with its Responsible Scaling Policy, particularly those tied to AI Safety Levels (ASL-3 and ASL-4). These levels define safety requirements for models with heightened capabilities, and robust evaluations are crucial for ensuring compliance and mitigating hazards. The initiative prioritizes metrics that go beyond memorization, emphasizing generalization and expert-level performance in domains such as cybersecurity and scientific reasoning.

The funding will support three key areas: ASL assessments, advanced capability and safety evaluations, and tools to streamline evaluation infrastructure. Anthropic highlights characteristics of effective evaluations, including difficulty, data novelty, scalability, and domain expertise. For instance, evaluations should avoid training data contamination and incorporate diverse formats like task-based tests or human trials to capture real-world applicability.
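
To make the contamination concern concrete, one common way to screen an evaluation item is to measure how many of its word n-grams also appear in a reference corpus. The sketch below is illustrative only and is not taken from Anthropic's announcement; the function names, the whitespace tokenization, and the default n-gram length of 8 are all assumptions.

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text` (hypothetical helper)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(eval_item: str, corpus_docs: list[str], n: int = 8) -> float:
    """Fraction of the item's n-grams that also occur somewhere in the corpus.

    Returns 0.0 when the item is shorter than n words, since it has no n-grams.
    """
    item_grams = ngrams(eval_item, n)
    if not item_grams:
        return 0.0
    corpus_grams: set[tuple[str, ...]] = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    return len(item_grams & corpus_grams) / len(item_grams)


if __name__ == "__main__":
    corpus = ["a quick brown fox sleeps"]
    # One of the item's three 3-grams also appears in the corpus document.
    print(contamination_rate("the quick brown fox jumps", corpus, n=3))
```

A high rate suggests the item, or something close to it, may already be in circulation and should be rewritten or dropped. Exact n-gram matching misses paraphrases, so a check like this can only flag likely contamination, not rule it out.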

According to the company, developing reliable evaluations is challenging, with common pitfalls including insufficient difficulty and poor documentation. Anthropic recommends iterative development, starting with small-scale tests and scaling up, while keeping evaluations reproducible and well documented, where possible by building on established formats such as the UK AI Safety Institute's Inspect framework or METR's task standard. The aim is to produce evaluations on which a high score would genuinely lead safety experts to worry about a potential incident.

Proposals can be submitted through an application form, with Anthropic offering tailored funding and direct collaboration with its Frontier Red Team and other experts. The initiative is part of a broader effort to establish comprehensive evaluation as an industry norm, fostering transparency and trust in AI deployments. By involving third-party organizations, Anthropic hopes to catalyze progress toward a future where AI assessments are routine and rigorous.

The announcement comes amid increasing scrutiny of AI safety, with companies and regulators seeking reliable methods to evaluate model behavior. Anthropic's initiative could set a precedent for how the industry approaches risk assessment, potentially influencing global standards. For developers and researchers, this funding opportunity represents a chance to contribute to critical safety tools that could shape the trajectory of AI development.