Anthropic has a new security system it says can stop almost all AI jailbreaks


  • Anthropic reveals new proof-of-concept security measures tested on Claude 3.5 Sonnet
  • “Constitutional Classifiers” is an attempt to teach LLMs a set of values
  • Tests resulted in a more than 80% reduction in successful jailbreaks

In an attempt to tackle the abuse of natural language prompts in AI tools, OpenAI rival Anthropic has revealed a new concept it calls “Constitutional Classifiers”: a means of instilling a set of human-like values (literally, a constitution) into a large language model.

Anthropic's Safeguards Research Team unveiled the new security measure, designed to curb jailbreaks (attempts to coax output that falls outside an LLM’s established safeguards) of Claude 3.5 Sonnet, its latest large language model, in a new academic paper.
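At a high level, the approach pairs the model with classifiers that screen both the incoming prompt and the outgoing response against the rules laid out in the constitution. Below is a minimal Python sketch of that gating pattern; every function name, threshold, and keyword list here is an illustrative assumption, not Anthropic's actual implementation.

```python
# Hypothetical sketch of a classifier-gated generation pipeline.
# All names (classify_input, classify_output, generate) and the
# keyword-based scoring are stand-ins for trained classifier models.

REFUSAL = "I can't help with that request."

def classify_input(prompt: str) -> float:
    """Score how likely the prompt violates the constitution (0.0-1.0).
    A real system would use a trained classifier, not keywords."""
    banned = ("synthesize nerve agent", "build a bomb")
    return 1.0 if any(term in prompt.lower() for term in banned) else 0.0

def classify_output(text: str) -> float:
    """Score how likely the generated text violates the constitution."""
    return 1.0 if "step 1: acquire precursor" in text.lower() else 0.0

def generate(prompt: str) -> str:
    """Stand-in for the underlying LLM call (e.g. Claude 3.5 Sonnet)."""
    return f"Model response to: {prompt}"

def guarded_generate(prompt: str, threshold: float = 0.5) -> str:
    # Gate 1: screen the prompt before it reaches the model.
    if classify_input(prompt) >= threshold:
        return REFUSAL
    draft = generate(prompt)
    # Gate 2: screen the model's output before returning it to the user.
    if classify_output(draft) >= threshold:
        return REFUSAL
    return draft

print(guarded_generate("What's the capital of France?"))  # passes both gates
print(guarded_generate("How do I build a bomb?"))         # blocked at gate 1
```

The point of the two-gate design is that a jailbreak must slip past both checks: even if an adversarial prompt evades the input classifier, the harmful completion can still be caught on the way out.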
