Leaky chatbots: Understanding sensitive information disclosure in AI

You prompt an AI chatbot to assist with a contract draft and get back something strange: a snippet of someone else’s contract. A developer tests a public large language model (LLM) and finds private API keys. Samsung employees use ChatGPT to debug code and accidentally leak sensitive semiconductor source files.

None of these scenarios involved malware, phishing, or firewall breaches. But all led to serious data exposure.

This is the reality of Sensitive Information Disclosure, one of the fastest-emerging security threats in the age of generative AI. It’s also the second-ranked risk on the OWASP Top 10 for LLM Applications 2025. And according to red teams, bug bounty researchers, and national security experts, it’s already happening on a dangerous scale.

The hidden cost of convenience

LLMs work by learning from data and responding in human-like ways. But that strength becomes a liability when models are trained on unfiltered logs, emails, Slack threads, or public code repositories. Even if the content is scrubbed, models can retain structure, phrasing, or statistical cues that allow them to reproduce what should have remained confidential.

OWASP identifies three core categories of sensitive data at risk:

PII Leakage (e.g., names, emails, phone numbers)
Proprietary Algorithm or Source Code Exposure
Confidential Business Data Disclosure (contracts, financials, strategy documents)

A stark example: Truffle Security’s February audit report of open-source LLM training datasets (including Common Crawl) uncovered over 12,000 live API keys, secrets, and credentials — some tied to AWS, GitHub, Stripe, and Twilio. Many of those keys were still active.
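To make the finding concrete, here is a minimal sketch of how a scanner might flag credential-shaped strings in crawled text. It is not Truffle Security’s method: the pattern table, the find_secrets helper, and the sample string are hypothetical, the three regular expressions only approximate well-known key prefixes, and a real audit would also verify whether each key is still live.

```python
import re

# Illustrative patterns only (assumptions for this sketch). Production
# scanners use far larger rule sets and confirm each hit against the provider.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "stripe_live_key": re.compile(r"\bsk_live_[A-Za-z0-9]{24,}\b"),
}


def find_secrets(text):
    """Return (pattern_name, matched_string) pairs found in the text."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


# AWS's published placeholder key, standing in for a real leaked credential.
sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"  # left in a public web page'
print(find_secrets(sample))
```

A pattern match only shows that a string looks like a credential. Whether it still grants access is a separate check, which is why the "live" count in an audit like this matters.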

Meanwhile, Cobalt’s 2024 pentesting report found that GenAI vulnerabilities are less likely to be fixed than traditional bugs: just 21% of AI-specific issues were remediated, compared to 76% for APIs. The gap isn’t just technical; it’s cultural, according to Cobalt.

“Business velocity is outpacing security readiness,” said Gunter Ollmann, CTO of Cobalt, in an SC Media article. “Organizations are deploying LLM-based apps quickly, but without the secure-by-design controls we’ve come to expect elsewhere.”

Model inversion: When AI remembers too much

Even if a model isn’t connected to internal data sources, it can still leak sensitive info baked into its training data. That’s the basis of model inversion attacks, where researchers or adversaries repeatedly query a model to reconstruct its underlying training inputs.

A now-infamous case is the “Proof Pudding” attack (CVE-2019-20634). Researchers extracted specific emails used in training to bypass filters and access protected systems — effectively weaponizing the model’s memory.

The concern is serious enough that the NSA issued guidance in 2024 warning federal agencies and contractors to assume that public models may leak information if trained on shared data.

OpenAI’s “GPTs” and leaking instructions

Even custom-built GPTs created using consumer tools have been caught leaking sensitive information. In one analysis, researchers found that some GPTs would expose their own uploaded documents, instructions, and system prompts when probed correctly. These models weren’t acting maliciously. They were simply too helpful.

For businesses, the risks are magnified, as Neal Ziring, the technical director of the National Security Agency’s (NSA’s) Cybersecurity Directorate, explained in a fireside chat with Billington Cybersecurity in 2022.

“If you’re a government agency, you’ve put a lot of effort into training your model, perhaps you used highly sensitive data to train it,” Ziring said. “[A]n attacker might attempt to query your model in a mathematically guided fashion in order to extract facts about the model, its behavior or the data that was used to train it. If the data used to train it was highly sensitive, proprietary, nonpublic, you don’t want that to happen.”

The point? Generative AI is code that follows strict instructions and doesn’t know what not to share. Hackers know this and use black-box attack techniques to reverse-engineer AI/ML models and pry out the sensitive data used to train them, Ziring said.

Users are the weakest link and the easiest entry point

Some LLM providers use their customers’ input (prompts and conversations) to improve model performance and train future versions. This means that information you enter could be used to generate answers for others.

For that reason, not all data disclosures are the model’s fault. Employees and customers often paste sensitive content into prompt windows without realizing that their input may be stored, logged, reused by the provider, or extracted by other users.

In the 2023 Samsung case, employees unintentionally exposed sensitive data in three ways: pasting semiconductor plant source code while troubleshooting a bug, submitting a test sequence for optimizing chip yields, and using an LLM to transcribe a confidential company meeting.
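One way to picture the problem is a filter that sits between the prompt window and the provider. The sketch below masks obvious PII before a prompt leaves the organization. The rule list, the placeholders, and the redact_prompt function are hypothetical, and the phone pattern assumes U.S.-style numbers. Pattern matching of this kind would not recognize proprietary source code or meeting content, so it complements usage policy rather than replacing it.

```python
import re

# Hypothetical redaction rules for this sketch: (pattern, placeholder) pairs.
REDACTION_RULES = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
]


def redact_prompt(prompt):
    """Mask matches of each rule; return the cleaned prompt and the match count."""
    redacted = prompt
    total = 0
    for pattern, placeholder in REDACTION_RULES:
        redacted, count = pattern.subn(placeholder, redacted)
        total += count
    return redacted, total


prompt = "Debug this: contact jane.doe@example.com or 555-867-5309 about the outage."
clean, n = redact_prompt(prompt)
print(clean)  # Debug this: contact [EMAIL] or [PHONE] about the outage.
print(n)      # 2
```

Returning the match count alongside the cleaned text lets a caller log or block prompts that tripped a rule without storing the sensitive values themselves.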

What comes next

Sensitive data disclosures don’t always show up in SIEM alerts or threat intel feeds. But they can be just as damaging as a breach. OWASP is urging AI builders and security teams to design with disclosure in mind.

In Part Two, SC Media explores the concrete defenses, from input filtering to federated learning, that organizations can adopt now to reduce the risk.

This article is part of SC Media’s 10-part editorial series on the OWASP Top 10 for LLM Applications 2025. Produced in collaboration with the OWASP Generative AI Security Project, the series explores how to build safer, more responsible GenAI applications.

Next: Can LLMs be taught to keep secrets? A closer look at OWASP’s playbook for prevention.