COMMENTARY: Over 14,000 Ollama server instances are publicly accessible on the internet right now. According to a recent Cisco analysis, 20% of these actively host models susceptible to unauthorized access. BankInfoSecurity separately reported discovering more than 10,000 Ollama servers with no authentication layer, the result of hurried AI deployments by developers under pressure to deliver.
[SC Media Perspectives columns are written by a trusted community of SC Media cybersecurity subject matter experts.]
This is shadow IT reborn for the AI era, and it's happening faster than most security teams can track.
The visibility gap no one's talking about
When shadow IT first emerged as a concern, the threat was relatively straightforward: employees spinning up unauthorized SaaS applications or cloud instances. Security teams eventually developed tooling and processes to detect and govern these deployments. We adapted.
Shadow AI presents a more insidious challenge. Developers aren't just subscribing to third-party AI services; they're deploying inference servers locally, often on workstations or internal servers that never appear in cloud asset inventories. A developer experimenting with Ollama on their laptop might inadvertently bind it to all network interfaces. A team testing LiteLLM as a unified gateway might deploy it without authentication. A data science group might spin up vLLM instances on GPU servers to accelerate research.
None of these scenarios involve malicious intent. All of them create security blind spots.
The proliferation is staggering. Between Ollama, vLLM, LiteLLM, LocalAI, Hugging Face Text Generation Inference, LM Studio, and dozens of other platforms, the AI serving ecosystem has fragmented into a landscape that defies simple enumeration. Each platform has different API signatures, default ports, and response patterns. Each represents a potential entry point that traditional vulnerability scanners won't flag.
The question has changed
For security leaders, the strategic question has fundamentally shifted. A year ago, the relevant concern was "are we running AI?" Today, the urgent question is "where is AI running that we don't know about?"
This isn't hypothetical risk. Unsecured LLM endpoints can expose proprietary training data, enable prompt injection attacks against internal systems, or serve as pivot points for lateral movement. An attacker who discovers an unauthenticated Ollama instance on your network can enumerate deployed models, extract system prompts, and potentially access whatever data those models were fine-tuned on.
The challenge compounds because AI infrastructure doesn't behave like traditional IT assets. LLM servers often run on non-standard ports, respond to multiple API conventions simultaneously, and may proxy requests to other services. A single LiteLLM deployment might expose OpenAI-compatible, Anthropic-compatible, and custom endpoints, each requiring different detection logic.
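For defenders auditing their own hosts, the model enumeration step mentioned above is short enough to sketch. The following is a minimal illustration, assuming Ollama's `GET /api/tags` endpoint (which lists locally available models) on the default port 11434. The function names are hypothetical and chosen for this example, and the check should only be run against systems you are authorized to test.

```python
import http.client
import json
import urllib.request
from typing import List


def parse_model_names(body: bytes) -> List[str]:
    """Extract model names from an Ollama /api/tags JSON response body."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return []
    models = payload.get("models")
    if not isinstance(models, list):
        return []
    return [m["name"] for m in models if isinstance(m, dict) and "name" in m]


def list_exposed_models(host: str, port: int = 11434, timeout: float = 3.0) -> List[str]:
    """Return model names if host answers /api/tags without credentials.

    Returns an empty list when the host is unreachable, rejects the
    request, or replies with something other than the expected JSON.
    """
    url = f"http://{host}:{port}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_model_names(resp.read())
    except (OSError, ValueError, http.client.HTTPException):
        return []
```

A non-empty result from `list_exposed_models("10.0.0.5")` (a placeholder address) means the instance is disclosing its model inventory to anyone who can reach it on the network.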
Practical steps for security teams
Addressing shadow AI requires extending existing asset discovery practices while developing new capabilities specific to AI infrastructure. Here's where to start.
First, update your threat model. If your organization employs developers, and especially data scientists or ML engineers, assume local LLM deployments exist. The convenience of tools like Ollama makes experimentation trivially easy. Your security posture should account for this reality.
Second, extend port scanning to include AI-specific services. Ollama defaults to port 11434. vLLM typically runs on 8000. LM Studio uses 1234. Gradio interfaces often appear on 7860. These aren't ports that traditional vulnerability scanners prioritize.
Third, develop fingerprinting capabilities for AI services. Detecting that a port is open isn't enough; you need to identify what's running. This requires probing service-specific endpoints and matching response patterns against known signatures.
To help the community tackle this challenge, Praetorian has released Julius as open-source tooling under the Apache 2.0 license. It's a lightweight LLM service fingerprinting tool that detects 17+ AI platforms through active HTTP probing, answering the question "is this HTTP service an LLM?" during penetration tests and attack surface assessments. Julius is the first release in our "12 Caesars" initiative, a commitment to releasing one open-source security tool per week for 12 weeks.
But tooling alone won't solve the problem. Organizations need policies that acknowledge AI experimentation while channeling it into sanctioned environments. Blanket prohibitions simply drive deployments further underground.
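The second and third steps above (checking AI-specific ports, then fingerprinting what answers) can be sketched in a few lines. This is an illustrative sketch only, not how Julius works internally. It uses the default ports named above and assumes two signatures: the plain-text "Ollama is running" banner that Ollama returns at its root path, and the `{"object": "list", ...}` JSON shape that OpenAI-compatible servers return from `/v1/models`. All function names are hypothetical, and real deployments may use other ports or sit behind proxies.

```python
import http.client
import json
import socket
import urllib.request
from typing import Dict, Optional

# Default ports named in the article; deployments can and do override them.
DEFAULT_PORTS = {11434: "Ollama", 8000: "vLLM", 1234: "LM Studio", 7860: "Gradio"}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify_response(path: str, body: str) -> Optional[str]:
    """Match one probe response against a small set of assumed signatures."""
    if path == "/" and "Ollama is running" in body:
        return "Ollama"
    if path == "/v1/models":
        try:
            payload = json.loads(body)
        except ValueError:
            return None
        if isinstance(payload, dict) and payload.get("object") == "list":
            return "OpenAI-compatible API (for example vLLM or LM Studio)"
    return None


def fingerprint(host: str, port: int, timeout: float = 3.0) -> Optional[str]:
    """Probe a few service-specific paths and return the first match."""
    for path in ("/", "/v1/models"):
        url = f"http://{host}:{port}{path}"
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                body = resp.read(65536).decode("utf-8", errors="replace")
        except (OSError, http.client.HTTPException):
            continue  # closed, non-HTTP, or error status: try the next probe
        label = classify_response(path, body)
        if label:
            return label
    return None


def scan_host(host: str) -> Dict[int, str]:
    """Check each default port on host and label whatever responds."""
    findings = {}
    for port, expected in DEFAULT_PORTS.items():
        if port_is_open(host, port):
            label = fingerprint(host, port)
            findings[port] = label or f"open, unidentified (default port for {expected})"
    return findings
```

Calling `scan_host` on a placeholder address such as `"192.0.2.10"` returns a mapping of open ports to labels. An open port with no matching signature is reported as unidentified rather than assumed to be the service that normally uses it, which reflects the point above that an open port alone does not tell you what is running.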
The clock is ticking
Shadow AI isn't a future concern; it's a present reality. Every week that passes without visibility into your AI infrastructure is a week where unknown LLM endpoints might be exposing sensitive data or accepting unauthenticated requests.
The security community solved shadow IT through a combination of technology, policy, and cultural change. We'll need the same approach for shadow AI, but we need to move faster. The deployment velocity of AI infrastructure far exceeds what we saw with cloud services a decade ago.
The question isn't whether your organization has shadow AI. The question is whether you've found it yet.
Evan Leleux is a software engineer at Praetorian Security focused on building scalable, distributed systems for enterprise security operations. He loves challenging problems and is always eager to learn. Evan is a Georgia Tech alumnus.