
‘Hugging Face’ AI models, customer data at risk from cross-tenant attacks

In an eye-opening piece of threat intelligence, the cloud-focused Wiz research team partnered with fast-growing AI-as-a-service provider Hugging Face to uncover how flawed, malicious models that use the “pickle” format could put the data and artificial intelligence models of thousands of Hugging Face customers at risk.

An April 4 blog post by Wiz researchers said potential attackers can use the models developed by AI-as-a-service providers to perform cross-tenant attacks.

The Wiz researchers warned of a potentially devastating impact, as attackers could launch attacks on the millions of private AI models and apps stored by AI-as-a-service providers. Forbes reported that Hugging Face alone is used by 50,000 organizations, including Microsoft and Google, to store models and data sets.

Hugging Face has stood out as the de facto open and collaborative platform for AI builders with a mission to democratize so-called “good” machine learning, said the Wiz researchers. It offers users the necessary infrastructure to host, train, and collaborate on AI model development within their teams. Hugging Face also serves as one of the most popular hubs where users can explore and use AI models developed by the AI community, discover and employ datasets, and experiment with demos.

In partnership with Hugging Face, the Wiz researchers found two critical risks present in Hugging Face’s environment that the researchers said they could have taken advantage of:

  • Shared inference infrastructure takeover risk: AI inference is the process of using an already-trained model to generate predictions for a given input. Wiz researchers said their team found that inference infrastructure often runs untrusted, potentially malicious models that use the “pickle” format. Wiz said a malicious, pickle-serialized model could contain a remote execution payload, potentially granting an attacker escalated privileges and cross-tenant access to other models (see the sketch after this list).
  • Shared CI/CD takeover risk: Wiz researchers also pointed out that compiling malicious AI apps represents a major risk, as attackers can try to take over the CI/CD pipeline and launch a supply chain attack. The researchers said a malicious AI app could have done so after taking over a CI/CD cluster.
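
To make the pickle risk concrete, the following is a minimal, deliberately harmless sketch (illustrative only, not code from the Wiz research) of why deserializing an untrusted pickle-format model amounts to running its author's code: any pickled object can instruct the loader to call an arbitrary function the moment it is loaded.

```python
import os
import pickle

# Hypothetical attacker-controlled class: __reduce__ tells pickle to call
# os.system with a chosen command whenever the object is deserialized.
class MaliciousPayload:
    def __reduce__(self):
        # A harmless command is used here; a real payload could open a
        # reverse shell or reach other tenants' models on shared
        # inference infrastructure.
        return (os.system, ("echo 'arbitrary code ran during unpickling'",))

# The "model" an attacker uploads is just these serialized bytes...
tainted_model_bytes = pickle.dumps(MaliciousPayload())

# ...and simply loading it runs the embedded command, before any model
# weights are ever used for inference.
pickle.loads(tainted_model_bytes)
```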

“This research demonstrates that utilizing untrusted AI models (especially Pickle-based ones) could result in serious security consequences,” wrote the Wiz researchers. “Furthermore, if you intend to let users utilize untrusted AI models in your environment, it’s extremely important to ensure that they are running in a sandboxed environment — since you could unknowingly be giving them the ability to execute arbitrary code on your infrastructure.”
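
One narrow, illustrative mitigation (it is not the full sandboxing the researchers recommend, and it will not make arbitrary third-party models loadable) is the “restricting globals” pattern from the Python pickle documentation: refuse to resolve anything during unpickling except an explicit allow-list. A minimal sketch, with placeholder allow-list contents:

```python
import io
import pickle

# Placeholder allow-list: only these (module, name) pairs may be resolved
# while unpickling; everything else is rejected.
ALLOWED = {
    ("collections", "OrderedDict"),
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked unpickling of {module}.{name}")

def restricted_loads(data: bytes):
    """Unpickle untrusted bytes using the restricted class resolver."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Fed the tainted bytes from the earlier sketch, restricted_loads raises UnpicklingError instead of calling os.system; even so, defenses like this complement rather than replace the isolated, sandboxed execution environments the researchers advise for untrusted models.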

While AI presents exciting opportunities, it also introduces novel attack vectors that traditional security products have yet to catch up with, said Eric Schwake, director of cybersecurity strategy at Salt Security. Schwake said the very nature of AI models, with their complex algorithms and vast training datasets, makes them vulnerable to manipulation by attackers. Schwake added that AI is also a potential ‘black box’ that offers very little visibility into what goes on inside it.

“Malicious actors can exploit these vulnerabilities to inject bias, poison data, or even steal intellectual property,” said Schwake. “Development and security teams need to build in controls for the potential uncertainty and increased risk caused by AI. This means the entire development process for applications and APIs should be rigorously evaluated from aspects such as data collection practices, deployment, and monitoring while in production. Taking steps ahead of time will be important to not only catch vulnerabilities early but also detect potential exploitation by threat actors. Educating developers and security teams about the ever-changing risk associated with AI is also critical.”

Narayana Pappu, chief executive officer at Zendata, said the biggest dangers here are biased outputs and data leakage, both of which carry financial and brand risks for companies.

“There’s so much activity around AI that it's pretty much impossible to know – or be up-to-speed – on all of the risks,” said Pappu. “At the same time, companies can't sit on the sidelines and miss out on the benefits that AI platforms provide.”

Pappu outlined five ways companies can more effectively manage AI security issues:

  • Have a robust A/B testing process and ramp up AI systems slowly.
  • Create security zones with policies on what customer information gets exposed to AI systems.
  • Apply privacy-by-design concepts: use synthetic data instead of actual data, along with techniques such as differential privacy and data tokenization (see the sketch after this list).
  • Continuously backtest AI models for bias against the same data to monitor for differences in outputs.
  • Develop an established policy on how to remediate any issues that are identified.
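
As one hypothetical illustration of the tokenization idea in the list above (not code from Pappu or Zendata), direct customer identifiers can be replaced with keyed, irreversible tokens before records are ever exposed to an AI system, so a leaked or logged output cannot reveal the raw values. A minimal sketch using Python's standard hmac and hashlib modules:

```python
import hashlib
import hmac

# Placeholder key; in practice this would come from a secrets manager
# and be rotated on a schedule.
SECRET_KEY = b"replace-with-a-managed-secret"

def tokenize(value: str) -> str:
    """Return a deterministic, non-reversible token for a customer identifier."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return "tok_" + digest.hexdigest()[:16]

record = {"email": "customer@example.com", "plan": "enterprise"}

# Only the tokenized record crosses the boundary into the AI system.
safe_record = {"email": tokenize(record["email"]), "plan": record["plan"]}
print(safe_record)
```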
