Anthropic Activates AI Safety Level 3 Protections for Claude Opus 4 to Combat CBRN Weapon Risks
November 05, 2025 · 3 min read
Anthropic has implemented AI Safety Level 3 (ASL-3) protections for its newly launched Claude Opus 4 model, a significant step under the company's Responsible Scaling Policy (RSP). The ASL-3 standards combine security controls intended to deter theft of model weights with deployment safeguards that limit the model's ability to assist with the development of chemical, biological, radiological, and nuclear (CBRN) weapons. The move is precautionary: Anthropic has not yet determined whether Claude Opus 4 definitively crosses the capability threshold that requires ASL-3, and it describes the activation as a cautious response to frontier AI risks.
The ASL-3 Security Standard introduces more than 100 controls, including two-party authorization and egress bandwidth limits, designed to thwart sophisticated non-state actors. The bandwidth limits exploit the large size of model weights: because the weights are so large, capping the rate at which data can leave a secure environment makes an exfiltration attempt slow and easier to detect and block. The controls build on best practices from security-conscious industries, and Anthropic's aim is to make weight theft substantially harder, even when other systems have been compromised.
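The egress-limit idea can be illustrated with a minimal sketch. This is not Anthropic's implementation; the class name `EgressBudget`, the function `days_to_exfiltrate`, and every number used with them are hypothetical, chosen only to show why a cap on outbound data makes moving very large files slow.

```python
from dataclasses import dataclass


@dataclass
class EgressBudget:
    """Tracks outbound bytes from a secure environment over one window.

    Hypothetical illustration only; names and limits are assumptions.
    """
    limit_bytes: int      # maximum bytes allowed to leave per window
    sent_bytes: int = 0   # bytes already sent in the current window

    def allow(self, size_bytes: int) -> bool:
        """Permit a transfer only if it fits in the remaining budget."""
        if size_bytes < 0:
            raise ValueError("size_bytes must be non-negative")
        if self.sent_bytes + size_bytes > self.limit_bytes:
            return False  # blocked: would exceed the window's budget
        self.sent_bytes += size_bytes
        return True

    def reset(self) -> None:
        """Start a new window (for example, once per day)."""
        self.sent_bytes = 0


def days_to_exfiltrate(weights_bytes: int, limit_bytes_per_day: int) -> float:
    """Lower bound on the days needed to move the weights out under the cap."""
    if limit_bytes_per_day <= 0:
        raise ValueError("limit_bytes_per_day must be positive")
    return weights_bytes / limit_bytes_per_day
```

With an assumed cap of 10 GB per day and an assumed 1 TB of weights, `days_to_exfiltrate(10**12, 10**10)` returns 100.0, so a theft attempt would have to run for months without being noticed. Real figures for either quantity are not given in the article.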
Deployment measures under ASL-3 focus narrowly on CBRN-related queries: Claude is meant to refuse assistance with end-to-end workflows that could aid weapon acquisition. The narrow scope is intended to keep false positives low, though Anthropic acknowledges that jailbreaks remain an evolving threat. Its strategy has three parts: making the system harder to jailbreak, detecting jailbreaks when they occur, and iterating on defenses based on real-world use.
The decision to activate ASL-3 preemptively stems from the difficulty of evaluating AI risks. Improved CBRN-related knowledge in earlier models such as Claude Sonnet 3.7 suggested that the threshold might be approaching. Anthropic's RSP allows the company to apply a higher safety standard when a risk cannot be ruled out, which lets a model release proceed without delay while the protections are refined. Anthropic will continue to assess Claude Opus 4's capabilities as part of this iterative process.
In practice, the new safeguards occasionally produce false positives that affect legitimate queries, and Anthropic has established a vetting system for exemptions in dual-use scientific applications. The company plans to expand the scope to other CBRN threats beyond biological weapons, which currently dominate its risk assessments. Detailed reports on these measures are shared industry-wide to foster collaboration.
Anthropic's approach underscores a broader industry challenge: balancing innovation with safety as AI models grow more powerful. By implementing ASL-3, the company sets a precedent for proactive risk management, urging peers to adopt similar frameworks. This move highlights the critical need for robust security in an era where AI misuse could have catastrophic consequences.
Looking ahead, Anthropic will refine its protections based on operational experience, engaging with government, civil society, and users to enhance effectiveness. The company's transparency in publishing its methods aims to accelerate collective preparedness for future AI advancements, ensuring that safety evolves alongside capability.