Anthropic ditches its defining safety promise to pause dangerous AI development because it's basically pointless when everybody else is 'blazing ahead'

Given the way the AI industry is going these days, the following news probably isn't a huge surprise. But it's unnerving all the same. In a new blog post, Anthropic, arguably the sole remaining major AI player that really bigs up its safety responsibilities, has announced it is ditching its core commitment to "pause" development of more powerful AI models if suitable safeguards aren't ready.

In previous versions of what Anthropic calls its Responsible Scaling Policy (RSP), the organisation said that if its AI systems approached certain dangerous capability thresholds—particularly around catastrophic misuse—it would halt further scaling or deployment until adequate safety measures were in place.

But that commitment is now gone from Anthropic's newly updated RSP. In Version 3.0 of the RSP, Anthropic has dumped explicit references to "pausing" development in favour of softer language focused on "responsible development," "risk management," and "iterative deployment."

So, why is this happening? Partly, Anthropic seems to be saying, because it's futile being the only AI outfit explicitly committed to safety. "If one AI developer paused development to implement safety measures while others moved forward training and deploying AI systems without strong mitigations, that could result in a world that is less safe—the developers with the weakest protections would set the pace, and responsible developers would lose their ability to do safety research and advance the public benefit," Anthropic's full policy document says.

"We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments…if competitors are blazing ahead," Anthropic's chief science officer Jared Kaplan also told Time magazine.

Inevitably, Anthropic is pitching the changes to its Responsible Scaling Policy as a net positive for safety. Long story short, it says the new policy adds a commitment to produce ongoing, publicly shareable roadmaps and risk reports that are intended to show how Anthropic is thinking about and managing safety issues as models become more capable.

"This third revision amplifies what worked about the previous RSP, commits us to more transparency about our plans and our risk considerations, and separates out our recommendations for the industry at large from what we can achieve as an individual company," the new policy doc says.

Which is nice. But the new policy still seems a long way short of the old commitment to essentially down tools if the bots threatened to get out of control. Oh well.

