AI doesn’t need to be hacked to leak confidential content. It just needs to be asked the right way.

In the era of GenAI, sensitive information doesn’t just live in files or servers. It resides in vectors, embeddings and training tokens. Once an LLM has ingested something, there’s a real chance it might say it back.

Because large language models (LLM) ingest vast amounts of data via training and through user generated chats, policing the sensitive information disclosed to users becomes uniquely challenging. Quite simply, there's no guaranteed way to make LLMs keep secrets.

Sanitize early and often

The first and most critical defense, OWASP stresses, is preventing sensitive information from reaching the model in the first place. That begins with careful data hygiene. Training data, fine-tuning sets, and even user inputs during inference should be sanitized through redaction, masking or tokenization. This means actively scanning for personally identifiable information (PII), API keys, proprietary code, or business-critical terms, all before any of it touches the model.

LLM are a semi-trusted user

Security researchers SC Media spoke with recommend a bevy of third-party cybersecurity tools for scanning and flagging secrets across source repositories and documentations. In many recent LLM breaches , organizations unknowingly trained on raw logs, exposed credentials, or support transcripts, giving their models far more information than intended.

OWASP encourages organizations to treat LLMs as entities with semi-trusted access, similar to limiting network access to a junior employee. That means applying the principle of least privilege to every LLM deployment. If an AI assistant doesn’t need access to billing, HR records, or internal contracts to complete a task, it shouldn’t have it at the system level, not just in prompt design.

When the stakes are higher, so should the defenses

Developers are advised to place strong access boundaries around runtime data and to vet both inbound and outbound queries through intermediate layers. That includes auditing logs for unusual prompt patterns or unusually detailed responses, which may indicate that the model is surfacing unintended content.

In sectors like healthcare, finance, or government, where compliance rules are strict and breaches come with legal consequences, OWASP recommends deploying more advanced privacy-enhancing techniques.

Federated learning is one such method. It allows model training across decentralized systems — like individual devices or servers — without ever pooling sensitive data into a centralized location. By distributing the learning process, federated learning minimizes exposure while preserving performance.

Guard the system prompt like it’s an API key

Another option is differential privacy, which intentionally adds statistical “noise” to outputs or model weights, making it nearly impossible to reverse-engineer a single user’s data. Though these methods may reduce model accuracy at the margins, they deliver strong gains in confidentiality — particularly when dealing with high-risk workloads.

One of the most overlooked attack surfaces in LLM deployments is the system prompt, the hidden instruction set that governs model behavior. If this prompt is poorly secured or inadvertently exposed, attackers may be able to reverse-engineer roles, permissions, or embedded data schemas.

Educate the humans, not just the machines

OWASP advises hardening system prompts in the same way you would secure environment variables or root keys. This means removing access to developer modes, stripping verbose error messages, and configuring the application so that users cannot override the model’s foundational behavior. The more hidden and controlled the system layer, the harder it becomes for attackers to manipulate it or extract secrets.

While technical defenses are critical, OWASP also puts heavy emphasis on educating end users — because a large share of GenAI data leaks originate from well-meaning people pasting sensitive material into prompt boxes without fully understanding the risks.

Organizations should invest in clear training, instructing users not to include PII, financials, or proprietary content in their interactions with LLMs unless explicitly approved. At the same time, vendors and platform providers must be transparent about what data is stored, retained, or used for future training and must offer users the ability to opt out.

Even hardened systems leak: But that’s not an excuse

In practice, companies that do identify LLM-related bugs fix them faster than traditional vulnerabilities. According to Cobalt , the average time to remediate a GenAI vulnerability is just under 30 days. The challenge, however, is that most of these issues are never caught in the first place. That means prevention through both design and awareness is still the best defense.

Security researchers at Adversa AI and Northwestern University have shown that even well-trained, well-protected LLMs can be manipulated into revealing sensitive data under the right conditions. No single tool or framework will eliminate the risk entirely.

That’s why OWASP isn’t calling for zero-leakage guarantees. Instead, it promotes resilience: building GenAI systems that are more transparent, more auditable, and more realistic about their own limits. A secure LLM isn’t one that never makes mistakes. It’s one that’s built not to amplify them.