July 10, 2025

You prompt an AI chatbot to assist with a contract draft and get back something strange: a snippet of someone else’s contract. A developer tests a public large language model (LLM) and finds private API keys. Samsung employees use ChatGPT to debug code and accidentally leak sensitive semiconductor source files.

None of these scenarios involved malware, phishing, or firewall breaches. But all led to serious data exposure.

This is the reality of Sensitive Information Disclosure, one of the fastest-emerging security threats in the age of generative AI. It’s also the second-ranked risk on the OWASP Top 10 for LLM Applications 2025. And according to red teams, bug bounty researchers, and national security experts, it’s already happening on a dangerous scale.

The hidden cost of convenience

LLMs work by learning from data and responding in human-like ways. But that strength becomes a liability when models are trained on unfiltered logs, emails, Slack threads, or public code repositories. Even if the content is scrubbed, models can retain structure, phrasing, or statistical cues that allow them to reproduce what should have remained confidential.

OWASP identifies three core categories of sensitive data at risk:

- PII Leakage (e.g., names, emails, phone numbers)
- Proprietary Algorithm or Source Code Exposure
- Confidential Business Data Disclosure (contracts, financials, strategy documents)

A stark example: Truffle Security’s February audit report of open-source LLM training datasets (including Common Crawl) uncovered over 12,000 live API keys, secrets, and credentials — some tied to AWS, GitHub, Stripe, and Twilio. Many of those keys were still active.
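One way to reduce this kind of exposure is to scan text for credential-shaped strings before it enters a training set. The following is a minimal sketch, not the method used in the audit: the three patterns are illustrative approximations of common key formats, and the names `SECRET_PATTERNS`, `find_secrets`, and `filter_corpus` are hypothetical. A production scanner would use a much larger rule set and verify each candidate against the issuing service.

```python
import re

# Illustrative patterns only (assumed, approximate formats).
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "stripe_live_key": re.compile(r"\bsk_live_[A-Za-z0-9]{24,}\b"),
}


def find_secrets(text):
    """Return (label, matched_string) pairs for each candidate secret."""
    hits = []
    for label, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0)))
    return hits


def filter_corpus(documents):
    """Split documents into clean ones and ones flagged for review."""
    clean, flagged = [], []
    for doc in documents:
        if find_secrets(doc):
            flagged.append(doc)
        else:
            clean.append(doc)
    return clean, flagged
```

Flagged documents are set aside rather than silently rewritten, so a reviewer can decide whether to drop them or redact the match and rotate the key.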

Meanwhile, Cobalt’s 2024 pentesting report found that GenAI vulnerabilities are less likely to be fixed than traditional bugs: just 21% of AI-specific issues were remediated, compared to 76% for APIs. The gap isn’t just technical but cultural, according to Cobalt.

“Business velocity is outpacing security readiness,” said Gunter Ollmann, CTO of Cobalt, in an SC Media article. “Organizations are deploying LLM-based apps quickly, but without the secure-by-design controls we’ve come to expect elsewhere.”

Model inversion: When AI remembers too much

Even if a model isn’t connected to internal data sources, it can still leak sensitive info baked into its training data. That’s the basis of model inversion attacks, where researchers or adversaries repeatedly query a model to reconstruct its underlying training inputs.

A now-infamous case is the “Proof Pudding” attack (CVE-2019-20634). Researchers extracted specific emails used in training to bypass filters and access protected systems — effectively weaponizing the model’s memory.

The concern is serious enough that the NSA issued guidance in 2024 warning federal agencies and contractors to assume that public models may leak information if trained on shared data.

OpenAI’s “GPTs” and leaking instructions

Even custom-built GPTs created using consumer tools have been caught leaking sensitive information. In one analysis, researchers found that some GPTs would expose their own uploaded documents, instructions, and system prompts when probed correctly. These models weren’t acting maliciously. They were simply too helpful.
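A simple output-side check can catch the most direct form of this leak, where the model repeats its instructions verbatim. The sketch below is an assumption-laden illustration, not a description of how any vendor guards its models: `leaks_system_prompt` and `guard` are hypothetical helpers, and the eight-word window is an arbitrary choice.

```python
def leaks_system_prompt(response, system_prompt, window=8):
    """Return True if any run of `window` consecutive words from the
    system prompt appears verbatim (case-insensitive) in the response."""
    words = system_prompt.lower().split()
    if not words:
        return False
    # Normalize whitespace and pad with spaces so matches align to word edges.
    haystack = " " + " ".join(response.lower().split()) + " "
    size = min(window, len(words))
    for i in range(len(words) - size + 1):
        needle = " " + " ".join(words[i:i + size]) + " "
        if needle in haystack:
            return True
    return False


def guard(response, system_prompt, refusal="Sorry, I can't share that."):
    """Replace a response that echoes the system prompt with a refusal."""
    if leaks_system_prompt(response, system_prompt):
        return refusal
    return response
```

A verbatim check like this will miss paraphrased, translated, or encoded leaks, so it is best treated as one layer. The more durable fix is to keep secrets out of system prompts and uploaded files in the first place.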

For businesses, the risks are magnified, as Neal Ziring, the technical director of the National Security Agency’s (NSA’s) Cybersecurity Directorate, explained in a fireside chat with Billington Cybersecurity in 2022:

“If you’re a government agency, you’ve put a lot of effort into training your model, perhaps you used highly sensitive data to train it,” Ziring said. “[A]n attacker might attempt to query your model in a mathematically guided fashion in order to extract facts about the model, its behavior or the data that was used to train it. If the data used to train it was highly sensitive, proprietary, nonpublic, you don’t want that to happen.”

The point? Generative AI is code that follows strict instructions and doesn’t know what not to share. Hackers know this and use black-box attack techniques to reverse-engineer AI/ML models and pry out the sensitive data used to train them, Ziring said.

Users are the weakest link and the easiest entry point

Some LLM providers use their customers’ input (prompts and conversations) to improve model performance and train future versions. This means that information you enter could be used to generate answers for others.

For that reason, not all data disclosures are the model’s fault. Employees and customers often paste sensitive content into prompt windows without realizing that their input may be stored, logged, reused by the provider, or extracted by other users.
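One mitigation on the user side is to redact obviously sensitive strings before a prompt leaves the organization. The sketch below assumes a hypothetical `redact_prompt` helper placed in front of whatever client sends the request; the regular expressions are deliberately simple illustrations and will miss many real-world formats.

```python
import re

# Ordered (label, pattern) pairs; patterns are illustrative assumptions.
REDACTIONS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("AWS_KEY", re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact_prompt(prompt):
    """Replace sensitive-looking substrings with placeholders.

    Returns the redacted prompt and a dict counting replacements per label.
    """
    counts = {}
    for label, pattern in REDACTIONS:
        prompt, n = pattern.subn(f"[{label}]", prompt)
        if n:
            counts[label] = n
    return prompt, counts


redacted, counts = redact_prompt(
    "Email jane.doe@example.com or call +1 (555) 010-9999 about the bug"
)
print(redacted)  # Email [EMAIL] or call [PHONE] about the bug
```

Returning the counts lets a gateway log that a redaction happened without logging the sensitive value itself. Pattern matching cannot recognize proprietary source code or meeting notes, so it complements usage policies rather than replacing them.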

In the 2023 Samsung case, employees unintentionally exposed sensitive data while troubleshooting a bug in semiconductor plant source code, while optimizing a test sequence for chip yields, and while using an LLM to transcribe a confidential company meeting.

What comes next

Sensitive data disclosures don’t always show up in SIEM alerts or threat intel feeds. But they can be just as damaging as a breach. OWASP is urging AI builders and security teams to design with disclosure in mind.