A major premise of appsec is figuring out effective ways to answer the question, "What security flaws are in this code?" The nature of the question doesn't really change depending on who or what wrote the code. In other words, LLMs writing code really just means there's more code to secure. So, what about using LLMs to find security flaws? Just how effective and efficient are they?
We talk with Adrian Sanabria and John Kinsella about the latest appsec articles, which show a range of results: from finding memory corruption bugs in open source software to spending an inordinate amount of manual effort validating persuasive, but ultimately incorrect, security findings from an LLM.
Security Weekly listeners save $100 on their RSAC 2026 All Access Pass! RSAC 2026 Conference will take place March 23rd to March 26th in San Francisco. To register using our discount code, please visit securityweekly.com/rsac26 and use the code 56U5SECWEEKLY! We hope to see you there!
Most security conferences talk about threats. Zero Trust World lets you attack them. From March 4th to 6th, 2026 in Orlando, Florida, this hands-on cybersecurity event features live hacking labs where you’ll break real environments, think like an adversary, and learn how attacks really work. You’ll also get expert sessions, real-world case studies, CPE credits, and networking with top practitioners. And yes — the Security Weekly team will be there too. Don’t miss it! Register today at securityweekly.com/ZTW.
Mike Shema
- Open Source security in spite of AI | daniel.haxx.se
- Vibe Coding Is Killing Open Source Software, Researchers Argue
Check out the research and the Tailwind thread about the impact of LLMs on that project.
This topic was also recently covered by Redmonk.
- ClawdBot Skills Just Ganked Your Crypto | OpenSource Malware Blog
There were almost as many articles about this as there were malicious skills. See also:
- ClawHavoc: 341 Malicious Clawed Skills Found by the Bot They Were Targeting
- Helpful Skills or Hidden Payloads? Bitdefender Labs Dives Deep into the OpenClaw Malicious Skill Trap
- Snyk Finds Prompt Injection in 36%, 1467 Malicious Payloads in a ToxicSkills Study of Agent Skills Supply Chain Compromise
- Evaluating and mitigating the growing risk of LLM-discovered 0-days
The premise of this blog post is the promise of LLM-driven security analysis of code. It shares three examples that are more impressive than just the results of an LLM running grep. What will be interesting is seeing how this repeats for vulns that aren't memory corruption. It would also be informative to know how much human time was spent on validating the findings, creating patches, and editing the write-ups.
Regardless of the desire for more details, this effort improved security for 500 open source projects with an efficient use of the project owners' time -- I'd consider that a success.
- Auditing Outline. Firsthand lessons from comparing manual testing and AI security platforms · Doyensec’s Blog
- Tales from the Trace: How XBOW reasons its way into finding IDORs
We spoke with Nico Waisman about XBOW's approach to agent-driven pentesting back in episode 351.
- Detecting backdoored language models at scale | Microsoft Security Blog
Read the research here.