How the principles of disaster recovery can guide us with cyber resilience

COMMENTARY: Cyber resilience often gets treated as a new challenge driven by modern threats. It isn’t. We solved a version of this problem decades ago in disaster recovery. The lessons still apply.

Early in my career, I worked with IBM mainframe customers running high-availability systems. Downtime wasn’t acceptable. Failure was assumed. So they practiced for it.

They intentionally failed over from one data center to another every month or every quarter. Not because something was broken, but because they needed proof that recovery worked. Repetition built confidence. Teams didn’t debate during an outage. They executed and they trained like they fought.

That discipline created muscle memory. Every failure mode. Every dependency. Every time. Resilience wasn’t a policy document. It was an operational habit built through repetition and accountability.

How cybersecurity drifted away from that DR model

Today, cyber resilience often gets framed as a tooling, compliance, or reporting problem. New frameworks emerge. Dashboards improve. Metrics multiply. But none of that guarantees performance when systems fail under pressure.

In reality, cyber resilience represents the same operational challenge disaster recovery faced years ago, just triggered differently. Systems fail. Attackers exploit weaknesses. The business still needs to recover. Customers still expect availability. Regulators still expect accountability.

The most resilient organizations accept that reality. They assume something will go wrong. They look for weak points before attackers do. They rehearse response and recovery until execution becomes routine.

Others rely more heavily on assumptions.

I’ve seen the fragility of that approach. During my time at GE Capital, a major outage exposed recovery plans that looked solid on paper, but failed in practice. Backups existed. Processes existed. Documentation existed. What hadn’t happened was regular end-to-end testing.

When systems went down, teams learned too late that backups were corrupted and dependencies weren’t fully understood. The failure wasn’t caused by a lack of technology. It was caused by a lack of rehearsal.

Cyber incidents expose the same weakness, often faster and with greater consequences.

When a server crashes, the cause isn’t immediately clear. It could be an operational issue, or malicious activity. Until the team knows for sure, they have to treat it as both. Service restoration can’t wait for perfect attribution.

In those moments, disaster recovery and cybersecurity converge. Restore operations first. Investigate second. That requires teams that have rehearsed together, not siloed plans that have never been exercised under pressure.

It’s rarely a tooling problem. Rather, it’s a process problem, and a leadership problem.

Disaster recovery has a simple rule: a backup isn’t a backup until it’s been restored. Cybersecurity often ignores the equivalent principle.
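To make that rule concrete, here is a minimal sketch of a restore check. It assumes the backup is a zip archive and that a manifest of expected SHA-256 checksums was recorded when the backup was taken. The names `verify_restore` and `sha256_of`, and the manifest layout, are hypothetical and chosen only for illustration. A real restore test would also exercise the applications and dependencies that rely on the data.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(archive: Path, manifest: dict[str, str]) -> list[str]:
    """Restore the archive into a scratch directory and compare checksums.

    `manifest` maps each file's path inside the archive to the checksum
    recorded at backup time (an assumed format, not a standard one).
    Returns the names of files that are missing or do not match.
    """
    failures = []
    with tempfile.TemporaryDirectory() as scratch:
        # Actually perform the restore rather than trusting the archive exists.
        with zipfile.ZipFile(archive) as backup:
            backup.extractall(scratch)
        for relative_name, expected in manifest.items():
            restored = Path(scratch) / relative_name
            if not restored.is_file() or sha256_of(restored) != expected:
                failures.append(relative_name)
    return failures
```

An empty list means every file in the manifest came back intact. Anything else is a finding to fix before an outage forces the question.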

An annual penetration test offers limited assurance in an environment that changes constantly. Patches land weekly. Configurations drift. New services appear. Cloud and identity layers evolve. Risk changes faster than annual cycles can capture.

Without regular validation, leaders make decisions with incomplete information. They may believe risk is being managed when it has quietly shifted.

When I was accountable for security outcomes, the expectation was consistent testing tied directly to change. Patch cycles were followed by validation. Results were reviewed regularly. Remediation progress was tracked over time. The goal wasn’t perfection. It was visibility, learning, and improvement.

That cadence matters because attackers don’t operate on annual schedules. They adapt continuously. Defenders need feedback loops that move at a comparable pace.

That’s what an "assume-breach" mindset looks like in practice. It isn’t pessimistic. It’s realistic. Incidents happen. What differentiates resilient organizations is how prepared they are to respond when they do.

That preparation doesn’t happen during a crisis. It’s built beforehand through repetition and leadership commitment.

Leadership sets the tone

Resilience reflects what organizations choose to rehearse and measure. Regular drills. Clear ownership. A bias toward verification over assumption. It’s not easy, especially in large, complex environments. But complexity doesn’t remove the obligation to test.

From my time in the military, one lesson carries over directly: Under pressure, teams don’t rise to expectations. They fall back on training. Preparation determines performance.

Cybersecurity has entered a period in which speed will matter even more. AI-driven attacks will compress timelines. Humans will increasingly manage by exception. Organizations without muscle memory will have little margin for error.

The path forward isn’t theoretical. Disaster recovery already showed us what works:

Intentional testing. Frequent rehearsal. Verification, instead of hope.

Cyber resilience doesn’t require reinvention. It requires discipline.

Snehal Antani, co-founder and CEO, Horizon3.ai

SC Media Perspectives columns are written by a trusted community of SC Media cybersecurity subject matter experts. Each contribution has a goal of bringing a unique voice to important cybersecurity topics. Content strives to be of the highest quality, objective and non-commercial.
