- Anthropic has removed its promise not to train or release AI models without guaranteed safety constraints in advance
- The company will now rely on transparency reports and safety roadmaps instead of hard commitments
- Critics argue the shift shows the limits of voluntary AI security commitments without binding regulation
Anthropic has formally abandoned its key promise not to train or release frontier AI systems unless it can guarantee adequate safeguards in advance. The company behind Claude confirmed the decision in an interview with Time, marking the end of a policy that had once set it apart among AI developers. The newly revised responsible scaling policy focuses more on ensuring the company remains competitive as the AI market heats up.
For years, Anthropic framed this promise as proof that it would withstand the commercial pressures that pushed competitors to ship ever more powerful systems. The policy effectively prevented it from moving beyond certain capability levels unless predefined safeguards were already in place. Now Anthropic operates under a more flexible framework rather than categorical red lines.
The company insists the change is pragmatic rather than ideological. Executives argue that unilateral restraint no longer makes sense in a market defined by rapid iteration and geopolitical urgency. But the shift feels like a turning point in how the AI industry thinks about self-regulation.
Under the new responsible scaling policy, Anthropic commits to publishing detailed “Frontier Security Roadmaps” outlining its planned security milestones, along with regular “Risk Reports” assessing model capabilities and potential threats. The company also says it will match or exceed competitors’ security efforts, and will delay development if it both believes it is leading the field and identifies a significant catastrophic risk. What it will no longer do is promise to halt training until all safeguards are guaranteed in advance.
Everyday users may not notice any changes when interacting with Claude or other AI tools. Yet the guardrails that govern how these systems are trained affect everything from accuracy to the potential for misuse. When a company once defined by its hard commitments decides those terms are no longer useful, it signals a broader recalibration within the industry.
Claude control
When Anthropic introduced its original policy in 2023, some executives hoped it might inspire rivals or even inform future regulation. That regulatory momentum never materialized. Federal AI legislation remains stalled, and the broader political climate has tilted away from building any binding framework, leaving companies to choose between voluntary restraint and competitive survival.
Anthropic is growing fast, with revenue and its product portfolio outpacing rivals like OpenAI and Google, even poking fun at ChatGPT in a Super Bowl ad. But the company clearly saw its safety red line as an obstacle to this growth.
Anthropic maintains that its revised framework retains meaningful safeguards. The new roadmaps are meant to create internal pressure to prioritize mitigation research, while the upcoming risk reports aim to provide a clearer public account of how model capabilities can lead to misuse.
“The new policy still includes some safeguards, but the key promise that Anthropic would not release models unless it could guarantee adequate security mitigations in advance is gone,” said Nik Kairinos, CEO and co-founder of RAIDS AI, an organization focused on independent monitoring and risk detection in AI. “This is precisely why continuous, independent monitoring of AI systems matters. Voluntary commitments can be rewritten. Regulation, backed by real-time monitoring, cannot.”
Kairinos also noted the irony of Anthropic’s $20 million contribution a few weeks ago to Public First Action, a group that supports congressional candidates who promise to push for AI safety regulation. That contribution, he suggested, underscores the complexity of the present moment: companies can advocate for stronger regulation while loosening their own internal constraints.
The broader question facing the industry is whether voluntary norms can meaningfully shape the trajectory of transformative technologies. Anthropic once tried to anchor itself as a model of restraint. Its revised policy instead prioritizes keeping pace with the competition. That does not mean safety has been abandoned, but it does mean the order of operations has changed.
The average person may not read responsible scaling policies or risk reports, but they live with the downstream effects of these decisions. Anthropic argues that meaningful security research requires staying at the frontier and not stepping back from it. Whether that philosophy turns out to be reassuring or unsettling depends largely on one’s view of how fast AI needs to move and how much risk society is willing to tolerate in exchange for progress.