Anthropic Asks Governments to Block High-Risk AI Models

June 10, 2026
TL;DR

Anthropic's new policy frameworks would grant regulators power to halt catastrophic AI deployments and fine companies based on global revenue.

Anthropic wants regulators to have the power to stop its own products from reaching the market. The company published two policy frameworks this week calling on governments to establish legal authority to block AI deployments that pose catastrophic risks, backed by civil penalties tied to global annual revenue.

Scope is deliberately narrow. According to The Next Web, the safety framework applies only to models trained using more than 10²⁵ floating-point operations and built by companies earning more than $500 million in AI revenue or spending more than $1 billion on AI research and development. Only a handful of companies cross both bars: Anthropic, OpenAI, Google DeepMind, xAI, and potentially Meta.
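The two-part scope test reported above can be written as a small predicate. This is only an illustrative sketch: the three thresholds come from the figures reported here, but the `Developer` record, its field names, and the sample numbers are hypothetical, and the framework's actual legal definitions (for example, the period over which revenue is measured) are not given in this article.

```python
from dataclasses import dataclass

# Thresholds as reported in the article.
COMPUTE_THRESHOLD_FLOPS = 1e25            # training compute
AI_REVENUE_THRESHOLD_USD = 500_000_000    # AI revenue
AI_RND_THRESHOLD_USD = 1_000_000_000      # AI research and development spend


@dataclass
class Developer:
    """Hypothetical record; the field names are assumptions for illustration."""
    training_flops: float
    ai_revenue_usd: float
    ai_rnd_spend_usd: float


def in_scope(dev: Developer) -> bool:
    """Return True when both bars are crossed.

    Bar 1: the model was trained using more than 10^25 FLOPs.
    Bar 2: the company exceeds the revenue threshold OR the R&D threshold.
    """
    compute_bar = dev.training_flops > COMPUTE_THRESHOLD_FLOPS
    company_bar = (
        dev.ai_revenue_usd > AI_REVENUE_THRESHOLD_USD
        or dev.ai_rnd_spend_usd > AI_RND_THRESHOLD_USD
    )
    return compute_bar and company_bar


# Made-up figures, not data about any real company.
large_lab = Developer(training_flops=3e25, ai_revenue_usd=2e9, ai_rnd_spend_usd=4e9)
small_lab = Developer(training_flops=3e25, ai_revenue_usd=5e7, ai_rnd_spend_usd=2e8)
print(in_scope(large_lab), in_scope(small_lab))
```

Under this reading, a well-funded company training a small model and a small company training a large model both fall outside the framework; only the combination is covered.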

Four categories of risk are severe enough to warrant blocking a deployment: assistance with biological weapons development, discovery of large-scale cyber vulnerabilities, loss of control over autonomous systems, and models capable of automating their own research. Each addresses a failure mode that could, in Anthropic's framing, cause irreversible harm at civilizational scale.

Frontier developers would be required to test models against those risk categories before deployment and publish summaries of their evaluations. Civil penalties would escalate with repeated violations, a structure designed to hurt even well-capitalized companies that treat a one-time fine as a cost of doing business. The Next Web notes the safety document is the more aggressive of the two frameworks Anthropic released.

The evidence Anthropic cites for the policy is its own product. Claude Mythos Preview, as The Next Web details, discovered thousands of high-severity security vulnerabilities across every major operating system and browser during testing. The company uses that finding as proof that catastrophic risk from advanced artificial intelligence is not theoretical.

Mythos Preview has since been upgraded. According to Infosecurity Magazine, Anthropic this week released Claude Mythos 5, which the company claims carries "the strongest cybersecurity capabilities of any model in the world," initially deployed through Project Glasswing, a program run in collaboration with the US government. Releasing a model with unprecedented offensive security capabilities while simultaneously proposing government authority to block such models is a coherent position. It is also a convenient one.

Alongside Mythos 5, Anthropic launched Fable 5, built on the same underlying model with additional guardrails in sensitive domains like cybersecurity. Infosecurity Magazine reports both are priced at $10 per million input tokens and $50 per million output tokens, less than half the cost of Mythos Preview. Anthropic plans wider availability beyond the current government program.
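To make the reported per-token pricing concrete, here is a rough cost calculation. It is a sketch under stated assumptions: the two rates are the ones reported above, while the function name and the example token counts are hypothetical, and real billing may include details not covered in this article.

```python
from decimal import Decimal

# Rates as reported in the article (USD per million tokens).
INPUT_USD_PER_MILLION = Decimal("10")
OUTPUT_USD_PER_MILLION = Decimal("50")
TOKENS_PER_MILLION = Decimal("1000000")


def request_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request at the reported rates.

    Hypothetical helper: it simply multiplies each token count by its rate.
    """
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


# Example with made-up token counts: 200,000 input and 20,000 output tokens.
print(request_cost_usd(200_000, 20_000))
```

In that example the input side accounts for $2 and the output side for $1, which shows how the fivefold output rate can dominate costs for generation-heavy workloads.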

A second document, the Economic Policy Framework, addresses what happens to workers as artificial intelligence displaces labor across industries. It calls for capital redistribution mechanisms and a strengthened social safety net. Enforcement mechanisms here are far softer than on the safety side, reflecting the difficulty of designing punitive regulation for economic outcomes that unfold over years rather than at discrete deployment events.

Regulation and competition

Years of voluntary industry commitments on artificial intelligence safety have produced few binding obligations. What distinguishes Anthropic's proposal is the demand for legal authority backed by financial penalties, and the willingness to name the five or so companies that would actually be regulated. Compliance at this scale is expensive, and that cost functions as a competitive barrier against any challenger operating below the revenue threshold.

Precedent exists. The EU AI Act established tiered oversight based on model capability and risk category, though its enforcement mechanisms remain largely untested. In the United States, no agency currently holds statutory authority to block a model before deployment. Enacting what Anthropic proposes would require Congress to grant that power explicitly, or an existing regulator to assert jurisdiction in territory no legislature has yet clearly mapped.

Frontier labs are not waiting for that clarity. Mythos 5 and Fable 5 are already shipping. If governments treat Anthropic's frameworks as a starting point rather than a wishlist, the first conflict will be jurisdictional: which agency, in which country, decides when an artificial intelligence model has crossed the line.

FAQ

Q: Which AI companies would Anthropic's proposed safety framework regulate?
A: The framework targets companies that both train models using more than 10²⁵ FLOPs and either earn more than $500 million in AI revenue or spend more than $1 billion on AI R&D. That currently means Anthropic, OpenAI, Google DeepMind, xAI, and potentially Meta.

Q: What are the four catastrophic risk categories in Anthropic's safety framework?
A: The framework covers assistance with biological weapons, discovery of large-scale cyber vulnerabilities, loss of control over AI systems, and models that can automate their own research and development.

Q: What is Claude Mythos 5 and how does it relate to the policy proposal?
A: Mythos 5 is Anthropic's upgraded cybersecurity-focused model, currently deployed with the US government through Project Glasswing. Anthropic cites its predecessor's discovery of thousands of OS and browser vulnerabilities as evidence that the risks it wants regulated are already real.

Q: What is Claude Fable 5?
A: Fable 5 runs on the same underlying model as Mythos 5 but with additional guardrails in high-risk domains. It is priced at $10 per million input tokens and $50 per million output tokens, less than half the cost of Mythos Preview.