A roadmap for today’s vibe coding

COMMENTARY: I’ve spent the last few years building security tools for developers, so people are often surprised when I say this: I’m not against AI coding, even on vibes.

I use it all the time.

It works — often spectacularly well — because it lets me move quickly, explore, and ship useful improvements. The problem isn’t the technique. It’s the mental model engineers carry into it, typically some version of “AI wrote it, I skimmed it, it looks reasonable.”

That’s fine right up until the code we’re skimming has a much larger blast radius than we may have accounted for.

We can’t treat “AI helped me write this” as a substitute for understanding what we’re shipping, because the failure mode usually isn’t that the code doesn’t work — it’s that it works just enough to get merged while quietly expanding the blast radius to something we can’t reason about when it matters.

It’s a simple distinction: AI coding isn’t just a technique, it’s a workflow, and workflows are only as safe as the verification wrapped around them.

If we ask an AI agent to build something from scratch, we aren’t reviewing “the code.” We’re reviewing an entire universe it just created – dependencies, framework choices, scaffolding, configuration defaults, implied architecture, boundary conditions, and the subtle assumptions that shape runtime behavior. Even when there’s clean output, we don’t really know what got pulled in, what got turned on by default, or what trade-offs the model quietly made.

That’s the difference between a controlled burn and a wildfire: one gets managed by constraints and verification, the other is powered by assumptions we haven’t named yet.

The workflow that works for me

Here’s what my development loop has looked like recently, and it’s different from the caricature of “prompt, paste, ship.”

First, I’ll use AI to build a quick prototype, something borderline disposable, to prove the idea works and expose the unknowns. The point isn’t quality. It’s to answer the question: is this even the right shape of the solution? In practice that might mean a hacky endpoint, a sketchy UI, a crude integration, and a pile of assumptions, and I’m fine with that because I’m looking for clarity, not building a product.

Then I scrap the prototype.

Not because it’s "bad," but because prototypes are full of hidden decisions: random library choices, convenient defaults, inconsistent naming, odd architecture, and the kind of glue code that becomes permanent purely because nobody wants to touch it again. Keeping the prototype is how we end up with a system that looks functional while being fundamentally unreviewable.

But scrapping it doesn’t mean starting from nothing. I’ll still get AI to do a good chunk of the implementation — generate the boring scaffolding, write the repetitive parts, fill in the obvious plumbing — except now I’m doing it inside constraints I’ve chosen deliberately: the frameworks we already use, the patterns we already understand, and a structure that fits the real system instead of the model’s idea of a nice demo.

After that comes the part most people skip: I review and iterate like I would with a human teammate.

I add logging where future-me will need it, usually when something’s going wrong in production and I’m trying to find out what happened without guessing. I make failures legible, add the debugging hooks I wish every system had by default, and I put in metrics that answer the only questions that matter: how often, how slow, how expensive, and what changed.
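As a minimal sketch of that kind of instrumentation, assuming Python and only the standard library: the decorator below logs failures with their arguments and traceback, and keeps simple counters for how often and how slow. The names `instrumented`, `METRICS`, and `parse_order` are hypothetical, and the in-memory dictionary is a stand-in for whatever metrics backend a team already runs.

```python
import functools
import logging
import time
from collections import defaultdict

logger = logging.getLogger("app")

# Hypothetical in-memory store; a real system would use its metrics backend.
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def instrumented(func):
    """Record call count, error count, and latency; log failures with context."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        stats = METRICS[func.__name__]
        stats["calls"] += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            # logger.exception attaches the traceback, so the failure is legible later.
            logger.exception("%s failed (args=%r, kwargs=%r)", func.__name__, args, kwargs)
            raise
        finally:
            # Runs on success and failure, so latency covers both paths.
            stats["total_seconds"] += time.perf_counter() - start
    return wrapper


@instrumented
def parse_order(raw):
    """Hypothetical example function: parse 'item:quantity' into a tuple."""
    item, quantity = raw.split(":")
    return item, int(quantity)
```

Calling `parse_order("widget:3")` returns normally and bumps the call counter, while `parse_order("broken")` raises `ValueError`, bumps the error counter, and leaves a log line with the offending input. Logging raw arguments is a trade-off: it is convenient for debugging, but anything sensitive should be redacted before it reaches the log.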

Dropping in extensive debug logs is something AI does well. It is weaker at optimizing performance, which still needs human review, with the AI suggesting where to improve things.

And then, crucially, I step through the code line-by-line until I can explain it.

As I do that, I add comments — real ones, the kind that explain intent. I’ll often have AI comment with me, but I treat it like pair programming because I’m writing the narrative of what this code is supposed to do. The output isn’t “documentation generated by AI.” It’s my understanding, made explicit, with AI helping me express it clearly.

The greenfield trap

Developers love the feeling of the greenfield. No legacy. No weird constraints. No "why is this like this?" moments. It's the natural habitat of vibe coding.

But greenfield generation is where blast radius becomes invisible.

When an AI generates a new service or app from scratch, the output often winds up being a bundle of decisions that we inherit, such as:

Which frameworks and libraries it chose.
How it handles auth, sessions, and state.
What defaults it accepts, and whether those defaults are safe.
How it logs, retries, and fails.
What it exposes over the network.
How it stores secrets and manages configurations.
What it does when inputs are malformed or adversarial.

Even if we read the code, we may not truly understand the behavior because it’s not just what the code says — it's in the dependencies, the runtime configuration, the transitive packages we didn't ask for, the parts we didn't notice were introduced.
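One rough way to surface the packages nobody asked for, sketched here under the assumption of a Python project: compare what is installed in the environment against the short list of dependencies the team chose on purpose. The function names and the idea of a hand-maintained `declared` list are illustrative assumptions, not a standard tool, and a real project would more likely lean on its lockfile or a dependency scanner.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a package name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unexpected_packages(installed, declared):
    """Return installed package names that nobody declared on purpose."""
    declared_names = {normalize(name) for name in declared}
    installed_names = {normalize(name) for name in installed}
    return sorted(installed_names - declared_names)


def installed_package_names():
    """List the distributions visible in the current environment."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs with no readable name
            names.append(name)
    return names
```

Passing `installed_package_names()` and the team's declared list to `unexpected_packages` yields everything that arrived transitively or was added along the way. The output is not a verdict on any package. It is a list of things to look at before they become part of the system by default.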

That’s why "looks reasonable" is such a dangerous standard. In security-critical paths, reasonable-looking code can still be wrong in ways that are hard to spot and easy to exploit.

So the standard in security isn’t “seems fine.” The standard is verification.

AI works great for wiring up the boring parts, but it’s not something we should trust to invent the security mechanism.
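To picture that split, here is a minimal sketch in Python, assuming a hypothetical webhook whose sender signs the raw request body with hex-encoded HMAC-SHA256. The wiring is the boring part. The security mechanism itself comes from the standard library's `hmac` and `hashlib` modules instead of being invented on the spot. The function names and the signature format are assumptions for illustration; real providers document their own schemes.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature using the standard library."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Check a received signature against the expected one."""
    expected = sign(secret, body)
    # hmac.compare_digest is a constant-time comparison; a plain == can leak
    # timing information. Encoding to bytes lets it accept any input string.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The code is small enough to review line by line, and it is testable: a correct signature passes, while a tampered body or a wrong signature fails. That is the property to look for before letting generated code near a security boundary.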

So, where's the line in February 2026?

My personal rubric looks like this:

Scaffolding around validated components. Yes.
Small, reviewable changes to existing code. Yes.
Implementation within constraints (tests + known SDKs). Yes.
Novel security implementations we can't verify. No.
Entire codebases from scratch. Only if we’re treating them as disposable.

This list may look very different later this year, but right now we’ll run into trouble when we treat high-stakes generation from scratch the same way we treat a small refactor. The workflow feels similar (prompt, paste, run) but the risk profile isn't comparable.

In 2026, more AI-coded applications are going to hit production, vibe coded or not. That's good for velocity because teams will build apps they wouldn't have attempted before.

But velocity doesn't reduce risk, it just changes where risk shows up.

If we normalize shipping large, security-sensitive codepaths that were generated from scratch, we will see some very public failures. Not because AI is "bad," but because teams will apply it with the wrong mental model.

The mental model we want: blast radius.

Here's how to proceed: Use AI aggressively where the team can verify it. Use it to glue together validated components. Use it to speed up the work the team already understands.

But be careful when the team asks AI to invent an entire new world, especially when that world includes security decisions we can't test.

AI coding isn't the problem. It’s our mindset.

Just stop treating a wildfire like a controlled burn. We can do this.

David Mytton, chief executive officer, Arcjet

SC Media Perspectives columns are written by a trusted community of SC Media cybersecurity subject matter experts. Each contribution has a goal of bringing a unique voice to important cybersecurity topics. Content strives to be of the highest quality, objective and non-commercial.