OpenAI battles persistent prompt injection attacks on Atlas AI browser

December 23, 2025


OpenAI acknowledges that prompt injection attacks, which trick AI agents into executing malicious commands, remain a significant security challenge for its Atlas AI browser. TechCrunch reports that the company is implementing new defenses to combat these evolving threats.

According to OpenAI, prompt injection is unlikely ever to be completely eradicated. This type of attack manipulates AI agents through instructions hidden in web pages or emails. Since its launch in October, the company's Atlas AI browser has faced scrutiny from security researchers who have demonstrated vulnerabilities in it. OpenAI is now using an LLM-based automated attacker, trained with reinforcement learning, to simulate novel attack strategies and identify them before they can be exploited in the wild. This internal "hacker bot" tests potential exploits in a simulated environment and analyzes the target AI's responses, which lets it discover flaws faster than external attackers might. The goal of this proactive approach is to keep strengthening defenses against sophisticated, long-horizon harmful workflows. In one simulated attack, for example, an AI agent was tricked into sending a resignation email instead of an out-of-office reply.

The ongoing struggle against prompt injection highlights the inherent security risks of AI agents that operate on the open web. Experts say continuous hardening and layered security are crucial, but the trade-off between an agent's autonomy and its access to sensitive data remains a critical concern. For many everyday users, the current risks of agentic browsers may outweigh their immediate value. Careful handling of user-provided instructions and confirmation steps before sensitive actions are therefore important safeguards against data breaches and unauthorized actions.

Source: TechCrunch