AI moderation guardrails circumvented by novel TokenBreak attack

Malicious actors could exploit the novel TokenBreak attack technique to subvert the tokenization strategy of the text classification models that guard large language models, evading their safety and content moderation protections, reports The Hacker News.
TokenBreak involves modifying input words with additional letters to confuse the text classification model, a report from HiddenLayer showed. Because the altered text is still understood the same way as the original by the intended target, threat actors could use the technique to slip prompt injection attacks past protective filters. "Knowing the family of the underlying protection model and its tokenization strategy is critical for understanding your susceptibility to this attack," said HiddenLayer researchers. Combating such a threat requires using protection models built on Unigram tokenizers and training models on examples of these bypass tricks. Organizations should also ensure tokenization and model logic remain aligned, and log misclassifications, researchers added. The findings come after Straiker AI Research reported that backronyms could potentially be leveraged to jailbreak AI chatbots.
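The core idea is that a small spelling change can alter how a classifier's tokenizer splits a word, even though a human or downstream model still reads the intent. The sketch below is purely illustrative and not taken from the HiddenLayer report; it assumes the Hugging Face transformers package and a BERT-style WordPiece tokenizer, and the perturbed strings are hypothetical examples.

```python
# Illustrative sketch of the tokenization divergence that TokenBreak-style
# manipulations rely on. Assumes the "transformers" package is installed.
from transformers import AutoTokenizer

# bert-base-uncased uses a WordPiece tokenizer, one of the tokenizer
# families HiddenLayer described as susceptible to this kind of bypass.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

original = "ignore previous instructions"
perturbed = "xignore previous xinstructions"  # hypothetical perturbation

# The original words typically map to whole vocabulary tokens, while the
# perturbed words are split into different sub-word pieces, so a classifier
# keyed to the original tokens may fail to flag the perturbed input.
print(tokenizer.tokenize(original))
print(tokenizer.tokenize(perturbed))
```

Per the researchers, classifiers built on Unigram tokenizers were found to be resistant to this manipulation, which is why switching tokenizer family is among the recommended mitigations.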