OpenAI lays out its plan for major advances in AI cybersecurity features

OpenAI has published its plans for a future in which AI models have advanced cybersecurity capabilities, outlining measures to prevent misuse and empower cyber defenders.

In a blog post Wednesday, OpenAI said it will now treat all of its future models as though they could reach “High” cybersecurity capabilities under the company’s Preparedness Framework.

Under the framework, AI models with “High” cybersecurity capability could either help scale cyber operations through end-to-end automation or automate the discovery and exploitation of significant cyber vulnerabilities.

OpenAI said it is taking a “defense-in-depth” approach that aims to balance the potential for misuse against the potential for AI to aid cyber defenses. Defensive and offensive cybersecurity operations are built on the same foundation, the company noted.

Rather than restrict models’ knowledge or rely on limiting general access to its models, OpenAI said it will use a combination of training, monitoring and red teaming to curb misuse of its models by threat actors.

Its frontier models are being trained to refuse or “safely respond” to requests that could enable cyber abuse while remaining helpful to researchers and defenders, according to the company. Systemwide monitoring has also been implemented throughout products that use its models to detect malicious activity.

“When activity appears unsafe, we may block output, route prompts to safer or less capable models, or escalate for enforcement,” the company stated.

With regard to enforcement, OpenAI said it uses both automated and human review, taking into account severity, repeat behavior and legal obligations. The company has previously published reports detailing misuse of its models by threat groups, including state-sponsored threat actors, noting that the associated accounts were banned from its platform.

OpenAI said it will work with red teaming organizations to help it identify gaps in its systems that could be exploited by well-resourced adversaries.

The company cited improvements in its models’ performance in capture-the-flag challenges as a sign of AI’s advancing capability, with GPT-5 achieving a 27% success rate in August 2025 and GPT-5.1-Codex-Max achieving 76% in November 2025.

However, Allan Liska, a threat intelligence analyst at Recorded Future, told SC Media that while both AI capabilities and attacks aimed at circumventing AI guardrails are increasing, it is “important to not overhype the threats” now posed by AI models.

“While we have reported an uptick in interest and capabilities of both nation-state and cyber criminal threat actors when it comes to AI usage, these threats do not exceed the ability of organizations following best security practices,” Liska said. “That may change in the future, however, at this moment it is more important than ever to understand the difference between ‘hype’ and reality when it comes to AI and other threats.”

OpenAI’s blog post also unveiled its plan to introduce a trusted access program that would offer tiered access to “enhanced capabilities” of its latest models only to qualifying cybersecurity customers, for use in cyber defense operations.

Additionally, OpenAI plans to establish an advisory Frontier Risk Council of cybersecurity professionals to collaborate with its teams. It already works with other AI providers through the nonprofit Frontier Model Forum to share knowledge about best practices and existing threats around the weaponization of AI.

OpenAI’s announcement comes after Anthropic reported last month that its Claude Code AI service was used to automate a suspected Chinese state-sponsored cyber espionage campaign targeting 30 organizations.