How humans will help manage the emergence of Guardian Agents

COMMENTARY: When it comes to AI adoption, the industry has shifted from GenAI and LLMs to the implementation of Agentic AI. It’s a shift from AI that thinks to AI that acts, either fully or semi-autonomously, and can carry out a wide range of tasks and use cases across nearly every industry sector or vertical.

We’ve even seen the emergence of the industry’s first comprehensive framework for secure Agentic AI adoption, in the OWASP Top 10 for Agentic Applications. This framework calls out various risks such as identity and privilege abuse, tool misuse and exploitation, and even Rogue Agents.


But as the industry moves from deploying individual AI agents to orchestrating hundreds or thousands of them, with agents poised to exponentially outnumber human users in their environments, it raises the question: Who watches the watchers?

The oversight arithmetic problem

Human-in-the-Loop (HITL) oversight has become an important way for teams to secure Agentic AI apps, especially for high-impact decisions or actions that involve sensitive data or critical business processes.

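While estimates vary, there is broad agreement that the number of agents will grow exponentially relative to human users. This presents a fundamental arithmetic problem: if every agent action needs human approval, review demand grows with the agent population while the number of reviewers stays roughly flat. HITL simply does not scale, and humans quickly become the bottleneck.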
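To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. Every input (agent count, actions per agent, share of actions needing review, seconds per review, reviewer hours) is a hypothetical assumption chosen for illustration, not a measurement from any real environment.

```python
import math


def reviewers_needed(agents, actions_per_agent_per_day, review_percent,
                     seconds_per_review, reviewer_hours_per_day=8):
    """Estimate how many full-time reviewers an HITL approval flow would need.

    All parameters are illustrative assumptions, not figures from the article.
    review_percent is the share (0-100) of agent actions that require approval.
    """
    reviews_per_day = agents * actions_per_agent_per_day * review_percent / 100
    review_seconds = reviews_per_day * seconds_per_review
    capacity_seconds = reviewer_hours_per_day * 3600  # one reviewer's day
    return math.ceil(review_seconds / capacity_seconds)


# Hypothetical comparison: the same policy applied to 10 agents vs. 1,000.
small = reviewers_needed(10, 200, 5, 60)
large = reviewers_needed(1000, 200, 5, 60)
print(small, large)
```

Under these assumed inputs, reviewer demand rises in step with the agent count, which is the bottleneck the paragraph above describes.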

The result: organizations have shifted away from HITL toward human-on-the-loop models, where humans monitor agent activity, define policies, and handle exceptions rather than approving every action. In some cases, organizations have moved toward human-out-of-the-loop systems that operate fully autonomously.

In either scenario, there’s still the need for oversight of agent actions in the enterprise, especially given that agents invoke tools, take direct action, and interact with sensitive organizational data, internal systems and even systems beyond the enterprise boundaries.

Consider an example: an attacker executes a prompt injection attack against an IT support agent, with the intention of exfiltrating service account credentials. The agent encodes the credentials into a diagnostic report and then sends the report to an external email address. A SOC analyst, already buried in countless other alerts and suffering from HITL alert fatigue, rubber-stamps the activity as noise. The incident goes unnoticed until the impact makes itself obvious or it’s caught in a future audit.

Attackers are already operating at machine speed. Defenders need to do the same. The battle between defenders and attackers won’t be won at analog speed.

Enter the Guardian Agent

Although the “Guardian Agent” concept may seem novel, it builds on a longstanding practice of using AI to secure AI.

In the prior wave of model-centric security, a practice known as LLM-as-a-Judge emerged, in which AI evaluated AI outputs. As the industry moves from securing model outputs to governing agent actions, this approach evolves into Agent-as-a-Judge, where agents evaluate Agentic AI workflows and actions. These systems are collectively referred to as Guardian Agents.

Gartner predicts that by 2028, the industry will mature from AI Guardrails to Guardian Agents. By 2030, these Guardian Agents will capture about 10% of the entire Agentic AI market. This evolution will likely occur through multi-phased rollouts focused on three things: ensuring agents produce the expected outputs; observing and monitoring agent processes; and functioning as protective entities that can detect and disrupt potentially harmful or even malicious agent actions before they impact the business.

Using AI to secure AI creates other issues. Guardian Agents themselves rely on LLMs and agentic architectures and therefore may inherit many of the same vulnerabilities they are designed to mitigate.

Developers will have to design trust hierarchies between the Guardian Agents and the agents they monitor. This includes defining isolation requirements and privilege models, and architecting guardians so they are securely separated from the agents they monitor. In practice, this may require separate models, different access control constructs and even different incentive structures, given that AI models often tend to please the user and push toward the desired outcome. Much like separation of duties has been a foundational concept in security programs, this principle must now extend to all agents operating within the enterprise.
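One way to picture this separation of duties is as a check that runs before a guardian is paired with an agent. The Python sketch below is a minimal illustration under assumed names: `AgentProfile`, its fields and `separation_violations` are hypothetical and do not come from any specific product or framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical description of an agent, for illustration only."""
    name: str
    model: str                            # underlying model identifier
    credentials: frozenset                # credential or role names it holds
    can_modify: frozenset = frozenset()   # names of agents it can reconfigure


def separation_violations(guardian, monitored):
    """Return the reasons this guardian/agent pairing breaks separation of duties."""
    violations = []
    if guardian.model == monitored.model:
        violations.append("shared model")
    shared = guardian.credentials & monitored.credentials
    if shared:
        violations.append("shared credentials: " + ", ".join(sorted(shared)))
    if guardian.name in monitored.can_modify:
        violations.append("monitored agent can modify guardian")
    return violations


# Hypothetical pairing that violates all three assumed rules.
guardian = AgentProfile("guardian-1", "model-a", frozenset({"svc-audit", "svc-it"}))
helpdesk = AgentProfile("helpdesk-1", "model-a", frozenset({"svc-it"}),
                        frozenset({"guardian-1"}))
for reason in separation_violations(guardian, helpdesk):
    print(reason)
```

The three rules here (separate models, no shared credentials, no write path from the monitored agent to its guardian) are only one reading of the isolation and privilege requirements described above; a real deployment would likely need more.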

The recursive problem

Guardian Agents introduce another challenge: recursion. If guardians are required to monitor agents because of the sheer volume and risks, do we need guardians to monitor the guardians?

Every additional layer can add governance and oversight, but it also adds complexity, expands the attack surface and introduces latency and cascading failure risks. That said, concentrating human oversight at the guardian layer offers a practical way to reduce HITL bottlenecks while maintaining control.

Guardian Agents themselves could now become part of the attack surface. They are vulnerable to the same types of attacks as other agentic systems, including prompt injection and collusion risks. As a result, organizations should account for Guardian Agents as part of a broader defense-in-depth approach. Organizations also need to account for failure modes, including deliberate evasion, misclassification and policy drift, to ensure that attackers or employees can’t silently bypass or manipulate the guardians.

Practical implementation

All of these concepts are interesting, but security leaders don’t deploy concepts. They deploy products, integrate platforms and look to operationalize capabilities.

Moving toward practical implementation requires several foundational elements. The first is a robust policy layer that defines what “bad” looks like, translating organizational policies, risk tolerance and compliance requirements into enforceable actions for Guardian Agents.
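As a rough illustration of what an enforceable policy might look like, the Python sketch below evaluates a single agent action and returns a verdict. It is modeled loosely on the IT support agent example earlier in this column. The names (`AgentAction`, `evaluate`, `INTERNAL_DOMAINS`), the `example.com` domain and the three verdicts are all hypothetical assumptions, and the keyword pattern is a deliberate simplification: it would not catch credentials that an agent has encoded or obfuscated.

```python
import re
from dataclasses import dataclass

# Assumption: a naive pattern for credential-like text. Real detection would
# need far more than keyword matching, especially against encoded payloads.
SECRET_PATTERN = re.compile(r"(password|api[_-]?key|secret|token)\s*[:=]",
                            re.IGNORECASE)

# Assumption: the organization's own mail domains.
INTERNAL_DOMAINS = {"example.com"}


@dataclass(frozen=True)
class AgentAction:
    """Hypothetical record of one action an agent is about to take."""
    agent_id: str
    tool: str
    target: str
    payload: str


def evaluate(action):
    """Return 'allow', 'escalate' (send to a human) or 'block'."""
    if action.tool == "send_email":
        domain = action.target.rsplit("@", 1)[-1].lower()
        external = domain not in INTERNAL_DOMAINS
        if external and SECRET_PATTERN.search(action.payload):
            return "block"      # credential-like content leaving the org
        if external:
            return "escalate"   # external mail goes to human review
    return "allow"


report = AgentAction("it-support", "send_email", "drop@attacker.test",
                     "diagnostics\npassword: hunter2")
print(evaluate(report))
```

Reserving the escalate verdict for ambiguous cases keeps humans on the loop for exceptions rather than in the loop for every action.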

Guardian Agents must also integrate with core enterprise functions, such as IAM and SecOps, and provide visibility across agent deployment models, whether SaaS-based, custom-built or endpoint-resident.

For security leaders and teams moving in this direction today, practical steps include inventorying enterprise agents, classifying agent risks, defining policies prior to enforcement, and ensuring continuous visibility. In many cases this may require evaluating purpose-built platforms capable of governing evolving Agentic AI environments as adoption accelerates.

Chris Hughes, vice president of Security Strategy, Zenity

SC Media Perspectives columns are written by a trusted community of SC Media cybersecurity subject matter experts. Each contribution has a goal of bringing a unique voice to important cybersecurity topics. Content strives to be of the highest quality, objective and non-commercial.