AI moderation guardrails circumvented by novel TokenBreak attack

Malicious actors could exploit the novel TokenBreak attack technique to subvert the tokenization strategy of text classification models and bypass the safety and content moderation protections placed in front of large language models, reports The Hacker News.

TokenBreak involves modifying input words with additional letters to confuse the text classification model, a report from HiddenLayer showed. Because the altered text is still understood in the same way as the original, threat actors could use the technique to facilitate prompt injection intrusions. "Knowing the family of the underlying protection model and its tokenization strategy is critical for understanding your susceptibility to this attack," said HiddenLayer researchers. Combating such a threat requires using Unigram tokenizers and training models on examples of the bypass tricks. Organizations should also ensure that tokenization and model logic stay aligned, and log misclassifications, researchers added. Such findings come after Straiker AI Research reported that backronyms could be leveraged for AI chatbot jailbreaking.
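As a rough illustration of the idea (not HiddenLayer's code), the Python sketch below shows how prepending a single letter to a trigger word changes what a WordPiece-style tokenizer emits, which is why a classifier keyed to the original tokens can miss the perturbed text; the transformers package and the public bert-base-uncased checkpoint are assumptions here, and the exact subword splits depend on the model's vocabulary.

from transformers import AutoTokenizer

# Load a WordPiece tokenizer of the kind used by many BERT-style text classifiers.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

original = "ignore previous instructions"
perturbed = "aignore previous ainstructions"  # one letter prepended to each key word

# The perturbed words no longer match whole-word vocabulary entries and are split
# into different subword tokens, so a classifier keyed to the original tokens can
# miss them, while a human (or a capable LLM) still reads the intended meaning.
print(tokenizer.tokenize(original))
print(tokenizer.tokenize(perturbed))

The point is simply that the token sequence the protection model sees diverges from the one it was trained to flag, while the downstream target still interprets the request as intended.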
