When AI goes off-script: Understanding the rise of prompt injection attacks

Picture this: a job applicant submits a resume polished by AI. Hidden inside the file is an invisible instruction. When the hiring system’s AI scans it, the model confidently reports that the applicant is an ideal candidate, even if the resume says otherwise. No hacking. No malware. Just carefully crafted language designed to take advantage of how the AI interprets prompts.

This is prompt injection. And it’s quickly becoming one of the most important cybersecurity issues facing generative AI systems today.

Why prompt injection tops OWASP’s GenAI risk list

As generative AI (GenAI) becomes embedded in business-critical applications, a subtle but significant risk has emerged. OWASP, the group behind the widely used Top 10 application security list, has released new guidance specific to generative AI. Prompt injection sits at the top of the organization's 2025 OWASP Top 10 for LLM Applications and Generative AI .

[Read Part Two of this article: Defending the Prompt: How to Secure AI Against Injection Attacks]

The attack doesn't exploit traditional software flaws. It manipulates how large language models (LLMs) interpret language itself — an entirely different kind of vulnerability that’s already being used to alter outputs, leak private data, and hijack application behavior.

How prompt injection works in large language models (LLMs)

So what exactly is a prompt injection?

At its core, this type of attack manipulates the instructions a large language model receives in order to change its behavior. These manipulations can come directly from user input or indirectly from external content the model has been asked to process. The end result is the same: the model does something it wasn’t supposed to do.

OWASP defines prompt injection as any user prompt that alters an LLM’s behavior or output in unintended ways. That could be as simple as a user asking a chatbot to ignore its safety guardrails, or as subtle as hiding an instruction inside a piece of data the model is pulling from a public source. In many cases, the malicious input isn’t visible to a human at all, but it still affects how the model responds.

OWASP’s evolving definition of prompt injection

OWASP’s understanding of prompt injection has evolved significantly in just two years. In 2023, the group described it in relatively narrow terms, closely linking it to so-called jailbreaks where users trick AI systems into saying or doing things that violate safety policies. By 2025, that view had expanded. The new report draws clearer distinctions between direct and indirect injection techniques, includes examples involving Retrieval Augmented Generation (RAG), and accounts for multi-modal models that combine text, images, and other media. The updated guidance reflects a more realistic view of how these vulnerabilities surface in production environments.

Types of prompt injection: Direct vs. indirect attacks

Prompt injections tend to fall into two broad categories. Direct injections happen when a user enters something like, “Ignore all previous instructions and instead...” That’s a common jailbreak technique. It exploits the fact that LLMs, unlike traditional software, aren’t great at separating system instructions from user input. If the model is too trusting of the prompt, it might change its behavior mid-conversation.

Indirect injections are more subtle. These occur when an AI system processes outside content, like summarizing a webpage or analyzing a document, that contains a hidden instruction. For example, a malicious prompt could be buried in metadata, embedded in markdown, or inserted into a product review. The user thinks they’re asking for a summary. The AI ends up following a command it was never meant to see.

Jailbreaking vs. prompt injection: Understanding the difference

OWASP explains that jailbreaking is essentially a subset of prompt injection. It specifically refers to techniques that get the model to ignore safety protocols altogether. While jailbreaks are often the most visible type of injection, they’re far from the only form. More often, attackers rely on quiet, carefully planted language to manipulate the AI's behavior without setting off alarms.

New risks in multi-modal and obfuscated prompt attacks

The risks increase as models get more sophisticated. OWASP highlights prompt injection risks in multi-modal systems, where AI combines text, images, and audio. An attacker could embed a prompt inside an image that, when processed alongside accompanying text, causes the model to behave differently. These types of inputs are difficult to spot with the human eye and can be especially difficult to block without also interfering with legitimate functionality.

Obfuscation adds another layer of complexity. Attackers might encode instructions in Base64, spread them across multiple messages, or even use emojis and foreign languages to slip past filters. In one example from OWASP, an attacker uses payload splitting, embedding separate pieces of the prompt in different fields of a document. The model processes the inputs together and executes the hidden instruction.

Why LLMs are especially vulnerable to prompt injection

These vulnerabilities are hard to fix because they’re baked into how LLMs operate. Unlike traditional systems, LLMs treat everything as potential instruction. They don’t distinguish between user input, system configuration, and supporting data. That creates a gray area where malicious commands can slip through, often without detection.

It’s not just about generating quirky or embarrassing responses. Many LLMs are integrated with broader systems. They can send emails, approve transactions, access APIs, or trigger internal workflows. If a prompt injection successfully alters the model’s behavior, it may have real-world consequences — sometimes before anyone realizes what went wrong.

What prompt injection attacks can do in the real world

The stakes vary depending on how the model is used. In customer support, an attacker might extract private information. In HR, it could mean a resume is given an unjustified recommendation. In a RAG-based enterprise search tool, an attacker could poison the source data to influence how results are framed. In all cases, prompt injection leverages the AI’s trust in its input to quietly reshape its behavior.

Who should be concerned about prompt injection threats?

This is not just a developer concern. OWASP urges a range of stakeholders to understand prompt injection risks, including CISOs, application security leads, product managers, and policymakers. Security teams should treat LLMs like semi-autonomous users. These are users that can act on behalf of people but are vulnerable to manipulation. Business leaders must understand where these risks intersect with critical systems.

If your organization uses, builds, or depends on GenAI tools, even via third-party APIs, prompt injection is a threat you need to understand and address.

About this series: OWASP GenAI security top 10 explained

This article is part of a 10-part SC Media series exploring the OWASP Top 10 for LLM Applications 2025. Future stories will cover related vulnerabilities such as System Prompt Leakage, Excessive Agency, and Vector and Embedding Weaknesses. The series is part of an editorial collaboration between SC Media and the OWASP Generative AI Security Project, aimed at helping developers, engineers, and security professionals better understand and defend against the unique threats facing GenAI applications.

Coming next: How to defend against prompt injection attacks

In Part Two, we’ll walk through practical mitigation strategies, architectural changes, and design patterns that can reduce the risk. From input filtering to human-in-the-loop oversight, there’s no silver bullet — but there are effective defenses.

Up next, we’ll look at how to defend against prompt injection attacks, including architectural design tips, input handling best practices, and OWASP’s top recommendations for reducing exposure.