This cyberattack lets hackers crack AI models just by changing a single character


  • Researchers at HiddenLayer devised a new LLM attack called TokenBreak
  • By adding or changing a single character, they are able to bypass some protections
  • The underlying LLM still understands the intended meaning

Security researchers have found a way to work around the protective mechanisms baked into some large language models (LLMs) and cause them to respond to malicious prompts.

Kieran Evans, Kasimir Schulz, and Kenneth Yeung of HiddenLayer published an in-depth report on a new attack technique, which they call TokenBreak. It targets the way certain LLMs tokenize text, particularly those using Byte Pair Encoding (BPE) or WordPiece tokenization strategies.
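To illustrate the tokenization weakness the researchers describe, here is a minimal sketch (not HiddenLayer's actual code) of a greedy longest-match WordPiece-style tokenizer with a hypothetical toy vocabulary. Prepending a single character to a word changes the token sequence entirely, so a text classifier keyed on the original token never sees it, while a human or a capable LLM can still read the intended word:

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first subword tokenization, WordPiece style.

    Continuation pieces are prefixed with '##', as in BERT's tokenizer.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest possible substring first, then shrink.
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no vocabulary piece matches
        tokens.append(piece)
        start = end
    return tokens

# Hypothetical toy vocabulary containing a word a safety filter might key on.
vocab = {"instructions", "f", "##instruct", "##ions"}

print(wordpiece_tokenize("instructions", vocab))
# → ['instructions']
print(wordpiece_tokenize("finstructions", vocab))
# → ['f', '##instruct', '##ions']
```

The original word maps to a single token, but the one-character variant is split into pieces that never include the `instructions` token, so a filter matching on that token misses it even though the intent is still obvious to a reader.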
