
Incorrect links output by LLMs could lead to phishing, researchers say

A third of login links for major brands output by OpenAI’s GPT-4.1 were inaccurate in a test by Netcraft researchers, highlighting phishing risks.  

Perplexity also provided a phishing link as its first source when asked for the Wells Fargo login page, Netcraft said in a blog post Tuesday.

When large language models (LLMs) output brand URLs that are not actually controlled by the brands users are asking about, threat actors could register those inactive or hallucinated domains and use them for phishing, the researchers explained.
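One way a defender might screen LLM output for this risk is to flag any cited hostname that does not resolve at all, since an unregistered, hallucinated domain is exactly what an attacker could later claim. The Python sketch below is illustrative only and is not drawn from Netcraft's tooling; the function name, regex and sample text are assumptions.

    # Hypothetical defensive check (not from the Netcraft research): flag domains
    # in an LLM response that do not resolve in DNS, since an unregistered,
    # hallucinated hostname is one an attacker could later register.
    import re
    import socket
    from urllib.parse import urlparse

    URL_PATTERN = re.compile(r"https?://\S+")

    def unresolvable_domains(llm_response: str) -> list[str]:
        """Return hostnames from the response that currently fail DNS resolution."""
        flagged = []
        for url in URL_PATTERN.findall(llm_response):
            host = urlparse(url).hostname
            if not host:
                continue
            try:
                socket.getaddrinfo(host, None)  # raises if the name does not resolve
            except socket.gaierror:
                flagged.append(host)
        return flagged

    if __name__ == "__main__":
        sample = "Log in at https://login.examplebank-secure.com/ to continue."
        print(unresolvable_domains(sample))

A check like this only catches domains that are still unregistered; a hallucinated domain that an attacker has already claimed would resolve normally and require reputation or allowlist checks instead.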

“As long as users trust AI-provided links, attackers gain a powerful vector to harvest credentials or distribute malware at scale. Without guardrails enforcing URL correctness, AI responses can mislead users,” Gal Moyal, of Noma Security’s CTO office, told SC Media in an email.

Such an attack could work similarly to search engine optimization (SEO) poisoning, where attackers use SEO techniques to push malicious content to the top of the results from search engines like Google.

“Phishers and cybercriminals are well-versed in traditional SEO techniques. But now they’re turning their attention to AI-optimized content, pages designed to rank not in Google’s algorithm, but in a chatbot’s language model,” the Netcraft researchers wrote.

Netcraft also discovered a campaign designed to poison the training data or search functions of AI coding assistants, involving dozens of fake GitHub repositories created to lend legitimacy to a fraudulent Solana blockchain API.

Multiple malicious accounts, enriched with seemingly credible biographies, social media links and coding activity, promoted a project called Moonshot-Volume-Bot, which contains the fraudulent API. This project was also lent legitimacy through tutorial blogs and Q&As on forums.

At least five victims, some of whom appear to use AI coding assistants, included the malicious code in their own public projects, apparently without recognizing its malicious nature, Netcraft reported.

This further highlights the risk of AI tools leveraging unverified but seemingly legitimate sources, essentially “falling for” social engineering scams themselves.

“Data integrity, data sourcing, cleansing, and verification are critical to ensure the safety and accuracy of LLM-generated outputs,” Darktrace Senior Vice President of Security & AI Strategy and Field CISO Nicole Carignan told SC Media in an email. “LLMs can and should have guardrails in place to mitigate the risk. One basic mitigation is to have LLMs ground or source any URL that is cited, essentially removing ‘generated’ hostnames and replacing them with grounded, accurate hostnames.”
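The grounding Carignan describes could look something like the following minimal Python sketch, which checks an LLM-cited URL against a curated allowlist of brand domains before it reaches the user. The BRAND_DOMAINS mapping and the ground_url helper are hypothetical illustrations under that assumption, not any vendor’s actual guardrail.

    # A minimal sketch of the kind of grounding guardrail described above: before
    # an LLM-cited URL is shown to the user, its hostname is checked against a
    # curated allowlist of domains the brand actually controls. The mapping and
    # helper names here are illustrative assumptions.
    from typing import Optional
    from urllib.parse import urlparse

    # Curated, human-maintained mapping of brands to domains they control.
    BRAND_DOMAINS = {
        "wellsfargo": {"wellsfargo.com"},
    }

    def ground_url(brand: str, candidate_url: str) -> Optional[str]:
        """Return the URL only if its hostname belongs to the brand's allowlist."""
        host = (urlparse(candidate_url).hostname or "").lower()
        allowed = BRAND_DOMAINS.get(brand.lower(), set())
        if any(host == d or host.endswith("." + d) for d in allowed):
            return candidate_url
        return None  # caller should fall back to a verified link, or no link at all

    print(ground_url("wellsfargo", "https://www.wellsfargo.com/login"))   # passes
    print(ground_url("wellsfargo", "https://wellsfargo-login.example"))   # rejected -> None

The design choice is simply to never surface a generated hostname directly: anything outside the allowlist is dropped or replaced with a verified link.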

Carignan said Netcraft’s research also highlights users’ misplaced trust in AI-generated outputs, noting that these outputs are not fact-based but based on semantic probabilities reliant on potentially inaccurate and untrusted sources.

Researchers have previously highlighted the risk of supply chain attacks via hallucinated package names in AI-generated code, which attackers could register to deliver malicious packages, a theoretical technique dubbed “slopsquatting” by Python Software Foundation Developer in Residence Seth Larson.
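One narrow precaution against slopsquatting is to confirm that an AI-suggested package name actually exists on the package index before installing it, as in the Python sketch below. The exists_on_pypi helper is a hypothetical example, not a tool referenced in the research, and existence alone is not proof of safety, since an attacker may already have registered a hallucinated name.

    # A sketch of one precaution against slopsquatting: before installing a
    # package an AI assistant suggested, confirm the name exists on PyPI.
    # This only catches the unregistered case; a squatted name would still pass.
    import urllib.error
    import urllib.request

    def exists_on_pypi(package_name: str) -> bool:
        """Check the public PyPI JSON API for the package name."""
        url = f"https://pypi.org/pypi/{package_name}/json"
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.status == 200
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return False
            raise

    for name in ("requests", "definitely-not-a-real-pkg-xyz"):
        print(name, "->", exists_on_pypi(name))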

And while proofs-of-concept have also shown that AI models can be tricked into delivering malicious links via indirect prompt injection, training data poisoning and SEO poisoning do not require prompt engineering or jailbreaks to promote the distribution of harmful links.
