How to create a zero-trust framework in an LLM world

COMMENTARY: Now that large language models (LLMs) are ubiquitous in the workplace, the model context protocol (MCP) has emerged as an indispensable component of these artificial intelligence (AI) assistants.

Simply put, MCP lets LLMs interact with external tools, application programming interfaces (APIs) and data sources to rapidly advance seamless, scalable integration in software development, data analysis, and additional operations.

[SC Media Perspectives columns are written by a trusted community of SC Media cybersecurity subject matter experts. Read more Perspectives here.]

MCP acts as a universal translation layer so that LLMs can interact with external systems in an open and repeatable way, with the model bypassing any attempt to understand what’s being connected. In software engineering and development, this hyper-integration helps automate tasks such as code generation/review, static analysis, and documentation lookup. In day-to-day workflows, it plays an important role in user onboarding, resource provisioning, email summarizing and event scheduling, to cite just a few examples.

However, MCP acts as a double-edged sword. It unlocks immense integration powers for LLMs, but it also quietly extends the attack surface in previously unforeseen ways by embedding trust in products, registries and servers, giving cyber criminals multiple paths to exploit the new flexibility and, frankly, gullibility. MCP products might act like eager attendees at a sales convention who buy into everything on display, regardless of whether they’re legit or not.

In the process, adversaries no longer have to compromise the model itself. Instead, they exploit the context construction layers that the MCP accumulates – the metadata, descriptions and flow of tool responses which LLMs take for granted as “truths” to be trusted.

Exploits now taking hold

With these tactics in mind, we’re seeing the following attacks taking hold, which traditional security frameworks cannot directly address:

Prompt injection via tool definitions: MCP servers offer tools to LLMs, along with descriptions of their functionalities. These descriptions get embedded within the LLM’s system prompt, so it can determine which tools to use and how to use them. Cyber criminals seize opportunity here with malicious servers that will “pre-poison” the LLM by registering a tool that secretly directs the model to leak or manipulate sensitive data.

Cross-server tool shadowing: This occurs when multiple MCP servers flow together contextually to form one big memory pool. The servers—which could allow the execution of everyday functions like company email, data storage and third-party services—do not undergo enforced isolation. Subsequently, a tool from a malicious MCP server can embed hidden, ill-intended instructions which poison all of the servers in the entire contextual memory pool.

Rug pulls: Adversaries compromise registries or update mechanisms to replace trusted MCP tools with malicious versions which alter definitions without notifying users. This will lead to the leaking of API keys or execution of harmful actions without detection.

ANSI escape code injection: ANSI escape codes allow for simplistic actions, such as text color changes or cursor movements. Adversaries leverage these codes to hide malicious instructions—such as the launching of a supply-chain attack—in MCP tool descriptions and outputs, making the instructions invisible on the screen but processable by AI models.

Actionable steps for zero-trust defense

All of this tells us that tools and servers are defining the models and their capacity for risk, regardless of user input. An effective defense strategy will incorporate zero-trust principles for tool definitions, cryptographic verification, behavioral monitoring and the strict isolation of MCP servers. Here are five ways to consider achieving these goals:

Enforce registry provenance: Only deploy tools that are published via vetted registries which require signing and prevent definition tampering. The signing ensures tool definitions are authentic by verifying the digital signature of the provider.

Implement guardrails: To reduce prompt injections which are intentionally buried under descriptions, teams need to architect MCPs to enforce guardrails at the input and output stages of LLM workflows. This includes the running of inputs through prompt-injection classifiers; the stripping or quarantining of suspicious text patterns; and deploying of machine-learning (ML) filters.

Establish isolation in layers: When teams partition tool metadata according to individual servers, they can load them only when needed instead of merging everything into one context. As part of isolation efforts, cross-server context guards will refresh context to exclude unrelated server descriptions when calling up a sensitive tool, such as those supporting HR and finance departments. For departments or workflows that require additional defense, we can run MCP servers within separated client instances/workspaces, so that “toy” servers (like weather or “joke” APIs) can’t shadow them.

Restrict permissions: As part of a comprehensive zero-trust framework, restrict tools to only the minimum necessary privileges to do their jobs by rejecting over-reaching permissions. In addition, security teams should log and periodically review every granted permission, to flag for investigation any unused or anomalous permission request. It’s best practice for de-permissioning to happen with as much regularity and review as permissioning, so that access privileges do not accumulate over time.

Apply risk scoring to MCP servers: This will let teams quickly assess and prioritize which ones pose the most potential for harm.

The road to protected MCP integrations begins with an understanding of how AI assistants work within their cyber ecosystem. An LLM doesn’t simply execute a user prompt: it will obey instructions from every input that’s injected into a context construction layer.

By investing in a multi-step, zero-trust-driven defense plan—one which includes registry provenance, guardrails, layered isolation, restricted permissions and risk-scoring—security teams will ensure that LLM hyper-integration will greatly enhance enterprise goals rather than hinder them.

Gianpietro Cutolo, cloud threat researcher, Netskope

SC Media Perspectives columns are written by a trusted community of SC Media cybersecurity subject matter experts. Each contribution has a goal of bringing a unique voice to important cybersecurity topics. Content strives to be of the highest quality, objective and non-commercial.