Microsoft has introduced new tools within its Azure AI Studio aimed at strengthening AI model safety and security, according to The Register.
Prompt injection attacks can be better countered with the Prompt Shields model, formerly known as Jailbreak Risk Detection, while the Groundedness Detection system improves identification of AI hallucinations by using a custom language model to verify claims against source documents, Microsoft noted.
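Microsoft's announcement does not spell out the request format for such a groundedness check, but a minimal sketch of how this kind of claim-versus-source verification might be invoked is shown below. The endpoint path, API version, and field names (detectGroundedness, groundingSources, and so on) are assumptions for illustration, not confirmed details of the product.

```python
import os
import requests

# Hypothetical endpoint and schema for a groundedness check; the path,
# API version, and field names are assumptions for illustration only.
ENDPOINT = os.environ["CONTENT_SAFETY_ENDPOINT"]  # e.g. https://<resource>.cognitiveservices.azure.com
API_KEY = os.environ["CONTENT_SAFETY_KEY"]

def check_groundedness(claim: str, source_document: str) -> dict:
    """Ask the service whether `claim` is supported by `source_document`."""
    url = f"{ENDPOINT}/contentsafety/text:detectGroundedness?api-version=2024-02-15-preview"
    payload = {
        "domain": "Generic",
        "task": "Summarization",
        "text": claim,                           # the model output to verify
        "groundingSources": [source_document],   # reference material to check against
    }
    headers = {
        "Ocp-Apim-Subscription-Key": API_KEY,
        "Content-Type": "application/json",
    }
    response = requests.post(url, json=payload, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()  # expected to flag ungrounded (hallucinated) spans

if __name__ == "__main__":
    result = check_groundedness(
        claim="The report says revenue grew 40% in 2023.",
        source_document="The annual report states revenue grew 12% in 2023.",
    )
    print(result)
```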
Microsoft has also unveiled AI-assisted safety evaluations and risk and safety monitoring features in AI Studio. While the new tools are valuable for evaluating AI model reliability, relying on AI for such systems could itself become a liability, noted the University of Maryland's Vinu Sankar Sadasivan, who co-developed the BEAST attack against large language models.
"Though safety system messages have shown to be effective in some cases, existing attacks such as BEAST can adversarially attack AI models to jailbreak them in no time. While it is beneficial to implement defenses for AI systems, it's essential to remain cognizant of their potential drawbacks," said Sadasivan.
The development comes amid the introduction of new federal AI safeguards.