Anthropic Raises Claude Opus 4 to Its Highest Safety Level

April 20, 2026 · 1 min read

TL;DR

New controls target misuse in chemical, biological, radiological, and nuclear (CBRN) weapons development as AI capabilities outpace risk assessments.

Anthropic has activated AI Safety Level 3 (ASL-3) protections for its latest model, Claude Opus 4. The move, made under the company's Responsible Scaling Policy, introduces stricter security and deployment measures to guard against sophisticated threats, even as the model's risk level remains under evaluation.

The ASL-3 standards include enhanced deployment controls to limit Claude's potential misuse in chemical, biological, radiological, and nuclear (CBRN) weapons workflows. These measures are narrowly targeted, designed to block extended, end-to-end assistance without broadly refusing user queries, reflecting a cautious approach to emerging AI dangers.

Security upgrades focus on protecting model weights, the core parameters that define an AI's capabilities, from theft by non-state actors. More than 100 controls are in place, including two-party authorization and binary allowlisting, which draw on established cybersecurity practices to prevent unauthorized access.
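As a rough illustration of the two-party authorization idea, the sketch below requires a second, distinct person to approve a request before access is granted. It is a minimal example and not Anthropic's implementation; the `AccessRequest` class, its fields, and the names used are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRequest:
    """Hypothetical request for a sensitive resource (illustration only)."""
    requester: str
    resource: str
    approvals: set[str] = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # The second party must be someone other than the requester.
        if approver == self.requester:
            raise PermissionError("requester cannot approve their own request")
        self.approvals.add(approver)

    def is_authorized(self) -> bool:
        # Requester plus at least one distinct approver makes two parties.
        return len(self.approvals) >= 1


request = AccessRequest(requester="alice", resource="model-weights")
print(request.is_authorized())  # no approver yet
request.approve("bob")
print(request.is_authorized())  # a second party has approved
```

The point of the pattern is that no single account, if compromised, can reach the protected resource on its own.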

A key innovation is the use of egress bandwidth controls, which restrict how quickly data can leave the secure environments where weights are stored. Because model weights are very large, capping outbound traffic rates is intended to make exfiltrating them slow enough to be detected before the transfer completes, turning the models' size into a security advantage.
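The arithmetic behind a bandwidth cap can be sketched in a few lines. The weight size and rate limits below are hypothetical round numbers chosen for illustration; Anthropic has not published these figures.

```python
def exfiltration_hours(weights_gb: float, cap_mbps: float) -> float:
    """Hours needed to move `weights_gb` gigabytes at `cap_mbps` megabits/second."""
    if weights_gb < 0 or cap_mbps <= 0:
        raise ValueError("size must be non-negative and cap must be positive")
    bits = weights_gb * 8 * 1000**3          # decimal gigabytes -> bits
    seconds = bits / (cap_mbps * 1000**2)    # megabits/second -> bits/second
    return seconds / 3600


# Hypothetical figures: 1,000 GB of weights, capped vs. uncapped links.
capped = exfiltration_hours(weights_gb=1000, cap_mbps=10)
uncapped = exfiltration_hours(weights_gb=1000, cap_mbps=10_000)
print(f"capped at 10 Mbps: {capped:.1f} hours ({capped / 24:.1f} days)")
print(f"uncapped 10 Gbps:  {uncapped * 60:.1f} minutes")
```

Under these assumed numbers, the cap stretches a transfer that would take minutes into one that takes days, which gives monitoring systems far more time to notice unusual outbound traffic.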

Anthropic activated ASL-3 proactively because of uncertainty about Claude Opus 4's CBRN capabilities. Unlike previous models, this one showed enough improvement in evaluations to warrant precautionary measures, though the company has not determined that it has crossed the threshold requiring such protections.

Anthropic acknowledges that these safeguards are a work in progress, with ongoing refinements needed to balance effectiveness and user impact. The company plans to iterate based on real-world use, collaborating with industry and government to address evolving threats.

This approach highlights broader challenges in AI safety, where rapid capability gains outpace risk assessments. As models grow more powerful, preemptive security measures become essential to mitigate catastrophic misuse without stifling innovation.

Industry observers note that Anthropic's transparency could set a precedent for other AI developers. By sharing detailed reports, the firm aims to foster collective improvements in safety standards, preparing for even more advanced AI systems on the horizon.