
Five ways to protect AI models

Today’s columnist, Stu Sjouwerman of KnowBe4, offers five tips for how to secure AI models. (Adobe Stock)

COMMENTARY: Artificial intelligence (AI) systems permeate almost every aspect of modern society. These technologies have deep integrations with business information systems that access valuable data such as customer details, financial reports, and healthcare records. AI systems can also access a variety of IT and OT systems such as cloud services, communication tools, IoT devices, and manufacturing processes.

[SC Media Perspectives columns are written by a trusted community of SC Media cybersecurity subject matter experts. Read more Perspectives here.]

Because of their widespread adoption and deep accessibility, AI systems have become prime targets for malicious actors. Adversarial attacks on AI systems are rising. Let's delve into the methods cybercriminals use to attack AI:

  • Manipulate the prompt: Large language models (LLMs) such as ChatGPT are designed to generate outputs based on the inputs they receive, which makes them highly susceptible to adversarial prompting techniques. If the model misunderstands the input or fails to detect the security risk in a request, it may generate a response that compromises an organization’s security or overrides its own safeguards. For instance, researchers recently demonstrated that GPT-4 is vulnerable to multi-modal prompt injection attacks, in which images embedded with malicious commands are uploaded to LLMs, prompting them to override their security filters and ethical protocols.
  • Target the response: LLMs are great at analyzing customer data, financial data, and application code, so it’s natural for employees to feed them raw business data sets for further training, analysis, and predictions. These data sets can include sensitive and proprietary information such as hard-coded usernames, passwords, or payment card information. Attackers can ask the chatbot targeted questions and then use the responses to manipulate its model, identify its data sources, or discover its vulnerabilities. In 2023, Samsung suffered a data leak after employees shared sensitive source code with ChatGPT.
  • Poison the data: Introducing corrupted or malicious data can compromise AI models. For instance, Microsoft’s Tay chatbot on Twitter (now X) was shut down after malicious actors manipulated it with inappropriate language, which it interpreted as standard dialogue and began echoing in racist tweets. In another scenario, threat actors strategically plant websites, backlinks, keywords, user reviews, and online conversations, then lead models to retrieve information from these tainted sources, which are designed to generate inaccurate, biased, or malicious outputs and can direct users to phishing sites and pages.
  • Confuse the model: A minor change in input can make an AI model misbehave or mispredict. The change can be as small as a typo, yet it can radically alter the model’s output. For example, using the Fast Gradient Sign Method (FGSM), adversaries can introduce small, carefully crafted noise into an image, leading to an incorrect prediction (a short sketch follows this list). Attackers have also confused autonomous vehicles with altered road markings to steer a car off the road, or with falsified speed-limit signs to cause an accident.
  • Abuse biases and hallucinations: AI systems are prone to several biases. An AI cybersecurity system might classify traffic from anti-Western countries like Iran and North Korea as malicious, while traffic from friendly regions like Japan and Brazil is treated as safe. Attackers can exploit this bias by using a VPN to disguise the origin of their traffic. Similarly, AI systems are prone to hallucinations, and attackers can exploit those fabricated outputs.
  • Disrupt the infrastructure: AI systems typically reside on cloud computing servers. Using tools like Nmap and Shodan, bad actors can discover servers with open ports, which can give them unauthorized access to AI infrastructure. Adversaries can also exploit zero-day vulnerabilities such as ShadowRay in AI software, platforms, and infrastructure providers.
  • Compromise employees: Phishing remains one of the biggest attack vectors for infiltration. Threat actors can phish employees, infiltrate the network, move laterally, and compromise a privileged user with access to the model. From there, they can exfiltrate the model by copying it to a server under their control.
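To illustrate how small a “confuse the model” perturbation can be, here is a minimal sketch of the Fast Gradient Sign Method in PyTorch. The classifier, image tensor, label, and epsilon value are illustrative assumptions, not details from any specific incident.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model: torch.nn.Module,
                 image: torch.Tensor,
                 label: torch.Tensor,
                 epsilon: float = 0.01) -> torch.Tensor:
    """Return an adversarially perturbed copy of `image` using FGSM."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)  # loss on the clean input
    loss.backward()                              # gradient w.r.t. the pixels
    # Step each pixel slightly in the direction that increases the loss.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0, 1).detach()

# Usage (hypothetical classifier and a normalized image batch):
# adv_image = fgsm_perturb(classifier, clean_image, true_label, epsilon=0.03)
```

Even with epsilon set low enough that the noise is invisible to a human, the perturbed image can flip the model’s prediction.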

How to protect AI models from LLM attacks

By taking the following five steps, organizations can prevent and mitigate LLM attacks:

  • Embrace user awareness and education: Ensure that employees understand AI risks and weaknesses. Train them so they neither fall victim to phishing attacks nor upload sensitive company data into AI models for analysis.
  • Develop an AI usage policy: Define an ethical and responsible AI usage policy for the organization. Offer clear instructions on what the company permits and does not permit. Identify and communicate the risks associated with AI, such as data privacy, bias, and misuse.
  • Leverage AI model and infrastructure security: Deploy advanced security tools to protect the AI infrastructure from DDoS and other cybersecurity threats. Use zero-trust principles and strict access controls. Limit access to AI models to specific privileged users.
  • Validate and sanitize inputs: Validate and sanitize all inputs before they are passed to the LLM for processing. This step helps protect against common prompt injection attacks and helps ensure the model is fed clean data (see the first sketch after this list).
  • Practice anonymization and minimization of data: Use masking or encryption techniques to anonymize data when training AI models, and minimize data use by collecting only the data necessary for the company’s specific application (see the second sketch after this list).
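As one possible starting point for the input-validation step, here is a minimal sketch in Python. The length limit and the suspicious patterns are illustrative assumptions; pattern matching alone cannot catch every prompt injection attempt, so treat this as one layer among several.

```python
import re

MAX_PROMPT_CHARS = 4000
SUSPICIOUS_PATTERNS = [
    r"ignore (all )?previous instructions",  # common injection phrasing
    r"system prompt",
    r"(api[_-]?key|password)\s*[:=]",        # possible embedded secrets
]

def validate_prompt(user_input: str) -> str:
    """Reject or clean user input before it reaches the LLM."""
    # Drop non-printable characters that could hide control sequences.
    cleaned = "".join(ch for ch in user_input if ch.isprintable() or ch in "\n\t")
    if len(cleaned) > MAX_PROMPT_CHARS:
        raise ValueError("Prompt exceeds allowed length")
    for pattern in SUSPICIOUS_PATTERNS:
        if re.search(pattern, cleaned, re.IGNORECASE):
            raise ValueError("Prompt flagged for manual review")
    return cleaned
```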
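And here is a minimal sketch of the masking step: pseudonymize direct identifiers and redact obvious PII before records are used for training. The field names and patterns are illustrative assumptions, not a complete anonymization scheme.

```python
import hashlib
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def pseudonymize(value: str, salt: str = "rotate-this-salt") -> str:
    """Replace an identifier with a salted, one-way hash."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]

def mask_record(record: dict) -> dict:
    """Mask direct identifiers and redact emails in free-text fields."""
    masked = dict(record)
    for field in ("customer_id", "account_number"):  # assumed field names
        if field in masked:
            masked[field] = pseudonymize(str(masked[field]))
    if "notes" in masked:
        masked["notes"] = EMAIL_RE.sub("[REDACTED_EMAIL]", masked["notes"])
    return masked
```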

AI has proven vulnerable to a range of threats that can compromise its integrity and security. Ultimately, it takes a collective commitment to security education, attentive monitoring, and vigilance to adequately harness its benefits while minimizing associated risks.

Stu Sjouwerman, founder and CEO, KnowBe4 

SC Media Perspectives columns are written by a trusted community of SC Media cybersecurity subject matter experts. Each contribution has a goal of bringing a unique voice to important cybersecurity topics. Content strives to be of the highest quality, objective and non-commercial.
