Many large language models still lack vulnerability research (VR) and exploit development capabilities, Infosecurity Magazine reports. Of the models evaluated, 48% failed the first VR task and 55% failed the second, while 66% failed the first exploit development test and 93% failed the second.