Building a Ransomware Resilient Architecture

December 2, 2022

A user calls into the helpdesk reporting that their system is down. Upon investigation, you discover it’s ransomware. Servers are encrypted with “.locked” file extensions on files. Ransom notes are on the desktops.

No problem, just restore, right? You have the disaster recovery (DR) site, backups, and storage area network (SAN) snapshots. You should have at least three ways to recover.

As you try each one, that pit in your stomach grows as you experience the worst feeling in IT: the realization you have no backup for recovery. You thought you had SAN snapshots, your fastest recovery, but snapshotting and SAN replication have been turned off. You look for your cold replica in your DR site, but like your production servers, it has also been encrypted by ransomware. Your backups, the backup server, and all the backup storage — all encrypted by ransomware.

How could this have been prevented?

While security teams layer essential preventative measures, resilience measures also need to be implemented in an architecture to reduce the impact of ransomware attacks on your backups. Immutable backups, segmentation, protection of credentials, and protecting the administrator’s access to systems are critical for recovery.

Lastly, it is vital to safeguard your recovery environment. Chances are good that the attacker is still in your environment and can take down your recovery attempts.

So here’s what you need to be truly ready to recover from a ransomware attack — coming from a guy who’s had to confront these challenges.

1. Air-Gapped, Immutable Backups

As attackers grow more sophisticated, immutable backups become crucial. Immutable backups cannot be changed or overwritten. This is a new concept. Historically, most backups have overwritten the oldest backups in a rotation. Immutable backups allow appending, but each bit of data written to the backup is frozen, similar to write protection on an old cassette tape. Remember those?
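
To make the append-only idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not how any backup product implements immutability; the class name, the method names, and the 30-day retention period are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


class ImmutableBackupStore:
    """Toy write-once store: new objects can be added, existing ones are frozen."""

    def __init__(self, retention: timedelta):
        self.retention = retention  # hypothetical retention period
        self._objects = {}  # name -> (data, written_at)

    def append(self, name: str, data: bytes, now: datetime) -> None:
        # Appending a new object is allowed; overwriting an existing one is not.
        if name in self._objects:
            raise PermissionError(f"{name} already exists and is write-protected")
        self._objects[name] = (data, now)

    def delete(self, name: str, now: datetime) -> None:
        # Deletion is refused until the retention period has fully elapsed.
        _, written_at = self._objects[name]
        unlock_at = written_at + self.retention
        if now < unlock_at:
            raise PermissionError(f"{name} is locked until {unlock_at}")
        del self._objects[name]

    def read(self, name: str) -> bytes:
        return self._objects[name][0]


store = ImmutableBackupStore(retention=timedelta(days=30))
t0 = datetime(2022, 12, 2, tzinfo=timezone.utc)
store.append("monday-full", b"backup bytes", now=t0)
try:
    # An attacker (or a rotation job) trying to overwrite the backup is refused.
    store.append("monday-full", b"encrypted garbage", now=t0)
except PermissionError as err:
    print("blocked:", err)
```

The point of the sketch is the asymmetry: writes of new data always succeed, while changes to existing data fail no matter who asks, until the retention clock runs out.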

Air-gapping means the backup is not live on your network. A network-attached storage (NAS) device or a SAN snapshot on your network is not air-gapped. Air-gapping prevents the attacker from deleting or corrupting your backups.

In every ransomware incident I have seen in the last three years, the attacker corrupted backups on the network. It’s important to note that DR sites are usually not air-gapped due to a live VPN between production and the DR site. Over the years, I have seen many ransomware incidents where the DR site is also encrypted. Frequently, the DR site is the attacker’s entry point.

Enterprises have millions of dollars invested in DR sites. DR sites are effective in natural disasters but not for ransomware attacks unless other steps, such as segmentation, are in place to protect the disaster recovery environment.

Most Backup Solutions Are Not Immutable Out of the Box

There are a lot of solutions now that claim to be immutable. Some are, but most solutions are not immutable out of the box. To ensure immutability, you must take additional steps to configure and architect your storage. Airiam’s AirGapd and Rubrik’s Data Protection backup solutions are well known for their immutability. Most other solutions claim to be immutable but depend on the target storage, like an Amazon S3 bucket, for immutability.

2. Segmentation

When an attacker gains access to your network, they first do reconnaissance to discover their next targets. Threat actors cannot hack what they cannot see.

To accomplish that, your IT team must implement segmentation between servers, storage, and backup environments using virtual local area networks (VLANs) and inspecting inter-VLAN traffic, treating that traffic as untrusted. All inter-VLAN traffic should go through a firewall. These steps minimize your attack surface by reducing what is visible to the threat actor.

This approach runs counter to how most network administrators design their networks, with firewalls at the network’s edge (Figure 1) and a fast switch on the LAN to route inter-VLAN traffic. That configuration lets the switches, which can move traffic much faster, handle it. The downside is that inter-VLAN traffic never passes through a firewall and so is never inspected, which means that an attacker on the network has free rein to move around.

The ideal situation, shown in simplified form in Figure 2, is for all traffic flowing between VLANs to travel through a firewall. Does this add latency? As long as the firewall is right-sized for your network traffic and tuned for the types of traffic, the effect should be negligible. The problem is cost: firewalls big enough to handle LAN traffic at that volume and speed are expensive. Because a single firewall has limited throughput, larger organizations may need a fabric of firewalls rather than one firewall.
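
As a rough illustration of what right-sizing means, the sketch below estimates how many firewalls a fabric would need. The function name, the 70% headroom default, and the example traffic figures are assumptions for illustration, not vendor numbers; real sizing should use measured peak traffic and the throughput a vendor rates with inspection features turned on.

```python
import math


def firewalls_needed(peak_gbps: float,
                     inspected_gbps_per_firewall: float,
                     headroom: float = 0.7) -> int:
    """Estimate the firewall count for inspecting peak inter-VLAN traffic.

    headroom is the fraction of rated throughput you are willing to use,
    leaving the rest as a safety margin (hypothetical default).
    """
    if peak_gbps < 0 or inspected_gbps_per_firewall <= 0 or not 0 < headroom <= 1:
        raise ValueError("traffic and throughput must be positive, headroom in (0, 1]")
    usable_gbps = inspected_gbps_per_firewall * headroom
    return max(1, math.ceil(peak_gbps / usable_gbps))


# Hypothetical example: 40 Gbps of peak inter-VLAN traffic and firewalls
# rated at 10 Gbps with inspection enabled.
print(firewalls_needed(peak_gbps=40, inspected_gbps_per_firewall=10))
```

With those made-up numbers, each firewall contributes 7 Gbps of usable capacity, so the estimate comes out to six devices rather than the four that raw rated throughput would suggest.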

Network administrators should implement the zero-trust idea of a segmentation gateway, which performs firewall inspection and adds Layer 7 policy-based segmentation and control based on user, device, and application access. For stability reasons and to save the firewall from all of the SAN storage traffic, the network administrator may still choose to keep storage traffic on a fast storage switch. Note that Figure 2 is very oversimplified. Most network administrators will follow the Purdue Model for ICS Security within the OT/IoT LAN. Thinking in terms of this segmentation gateway moves your organization toward a zero-trust architecture.

NDR and IDS vs. Firewalls and Segmentation Gateways

A middle ground for companies that cannot afford a fabric of fast, expensive firewalls and segmentation gateways is to implement network detection & response (NDR) and intrusion detection system (IDS) solutions.

By sending traffic from a network TAP (test access point) or SPAN (mirrored) port to an IDS or NDR device, suspicious traffic can be identified, triggering a response on your network. That response may be disabling a port, dropping a session, isolating a host, or blocking a user. These days, that automation is usually performed by an extended detection and response (XDR) or security orchestration, automation, and response (SOAR) system.
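
To give a feel for the kind of signal these tools look for, here is a deliberately simple sketch that flags a host touching an unusual number of distinct internal targets, which is what reconnaissance tends to look like in mirrored traffic. The `Flow` record, the function name, and the threshold of 20 are assumptions; commercial NDR products use far richer detection than this.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """One observed connection from mirrored traffic (hypothetical record shape)."""
    src: str
    dst: str
    dst_port: int


def find_scanners(flows: Iterable[Flow], max_targets: int = 20) -> dict[str, int]:
    """Return sources that touched more than max_targets distinct (dst, port) pairs."""
    targets = defaultdict(set)
    for flow in flows:
        targets[flow.src].add((flow.dst, flow.dst_port))
    return {src: len(seen) for src, seen in targets.items() if len(seen) > max_targets}


# Hypothetical capture: one workstation sweeping SMB across a subnet,
# one workstation behaving normally.
observed = [Flow("10.0.0.5", f"10.0.1.{i}", 445) for i in range(1, 31)]
observed += [Flow("10.0.0.9", "10.0.1.10", 443), Flow("10.0.0.9", "10.0.1.11", 443)]
print(find_scanners(observed))
```

In practice, the output of a check like this would be handed to the XDR or SOAR system, which decides whether to isolate the host or disable its port.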

In addition to creating the segmentation into VLANs and adding security policies for the segmentation gateway, it is critical to lock down the access control lists (ACLs) between networks. No VLAN should have carte blanche access to another VLAN. In the case of Figure 2, I would prevent access to the Backup VLAN 204 from all other VLANs except the Server Hosts VLAN 206. In the instance of Veeam, I would allow only traffic between the Veeam backup proxies, Veeam backup server, VMware vCenter, and the backup storage. All other traffic to the backup network should be blocked.
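
The default-deny logic behind that ACL can be sketched in a few lines of Python. Only the VLAN numbers come from Figure 2 as described above; the subnets and the single allowed port are placeholders I made up, and a real rule set would list the specific ports your backup product documents.

```python
from ipaddress import ip_address, ip_network

# Hypothetical subnets; only the VLAN numbers come from the text.
BACKUP_VLAN_204 = ip_network("10.20.4.0/24")
SERVER_HOSTS_VLAN_206 = ip_network("10.20.6.0/24")

# Explicit allow rules as (source network, destination network, destination port).
# Port 443 is a placeholder, not a statement about any product's requirements.
ALLOW_RULES = [
    (SERVER_HOSTS_VLAN_206, BACKUP_VLAN_204, 443),
]


def is_allowed(src: str, dst: str, port: int) -> bool:
    """Default deny: traffic passes only if an explicit allow rule matches."""
    src_ip, dst_ip = ip_address(src), ip_address(dst)
    return any(
        src_ip in src_net and dst_ip in dst_net and port == allowed_port
        for src_net, dst_net, allowed_port in ALLOW_RULES
    )


print(is_allowed("10.20.6.10", "10.20.4.5", 443))  # server host to backup VLAN
print(is_allowed("10.20.9.77", "10.20.4.5", 443))  # any other VLAN to backup VLAN
```

Note that there is no deny list to maintain. Anything not named in `ALLOW_RULES` is blocked, which is the behavior you want for the backup network.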

Even if you have an old network architecture without next-gen anything, you should be able to implement basic segmentation with VLANs and access control lists (ACLs).

3. Protecting Authentication

When backup servers are joined to the domain, vCenter is set up for single sign-on (SSO) against the domain, and the SAN storage is set for LDAP authentication against the domain, the backups, hosts, and SAN are compromised every single time. Although central authentication for all IT resources makes IT management very simple, it also makes the attacker’s job easier. Compromising the domain allows the attacker to destroy every other IT resource, including the servers, storage, and backups.

Keep all infrastructure off Active Directory or other central authentication. Protect the credentials to these systems in a password manager or credential vault (such as Azure Key Vault or AWS Secrets Manager). This is equally important for storage keys and certificates. Amazon S3 credentials and Azure Blob access keys need to be protected in a credential vault so the attacker can’t easily capture them and ruin your day.

If you are using encryption within your backup solution or your storage environment, storing a copy of your encryption key in your key vault will be your saving grace when access to all your other systems is down. Use vaults for all of your secrets. I have seen people unable to restore their backups because they didn’t store their backup encryption key in a secure cloud location. Likewise, keep your credentials for your cloud backup solutions in your key vault, not in some password file on your IT share. If the attacker captures the password for your cloud backup solution, they can disrupt or destroy your backups and even cancel your account. Consider your usernames and passwords as valuable as any other crown jewels on your network.
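
One way to check that secrets have not leaked out of the vault and into files on a share is to scan for them before the attacker does. The sketch below is a minimal example with three illustrative patterns; the pattern names and the function are my own, the `AKIA` prefix reflects the commonly documented format of long-term AWS access key IDs, and a dedicated secret-scanning tool will catch much more than this.

```python
import re

# Illustrative patterns only; a real scanner needs a much larger rule set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
    "password_assignment": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
}


def find_plaintext_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# Hypothetical contents of a notes file found on an IT share.
sample = "\n".join([
    "backup portal login: svc-backup",
    "password = hunter2",
    "s3 key AKIAIOSFODNN7EXAMPLE",
    "-----BEGIN OPENSSH PRIVATE KEY-----",
])
for lineno, kind in find_plaintext_secrets(sample):
    print(f"line {lineno}: possible {kind}")
```

Anything a scan like this turns up should be rotated and moved into the vault, not just deleted from the file.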

4. Administrative Workstations

Your administrators, whether your cloud admin, database administrator (DBA), or IT generalist, usually keep secrets on their workstations. I have often seen this in SSH profiles holding keys for accessing VMs in AWS. The SSH keys on a developer’s or cloud admin’s workstation act as credentials.

Additionally, access to environments is often locked down to the administrator’s laptops or desktops. Everything may be firewalled, but access to the VMware environment may be locked down to the administrator workstation’s IP address. Attackers doing internal reconnaissance can identify the administrators on a network and confirm this with the administrators’ LinkedIn profiles. They know that if they hack Bob’s box, they’ll have access to the server and backup environments — and they do. This is how threat actors often gain access to your SAN and even your endpoint detection and response (EDR) or antivirus console. Your administrators have active, logged-in sessions. All the attacker has to do is open the browser and make whatever changes they choose.

The moral of the story is that your administrators’ resources need to be protected just as thoroughly as your key executives’ resources. At a bare minimum, continuous logging and EDR must be in place on those administrative machines, and multifactor authentication (MFA) must be required to access them. Ideally, there are zero-trust network access (ZTNA) solutions or security service edge (SSE) solutions forcing the administrator to re-authenticate anytime they attempt to access highly protected administrative resources. These resources are the weak link the threat actor looks for, and access to that administrative computer is often their goldmine.
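
The re-authentication rule can be expressed as a small policy check. This is a sketch of the decision only, not of any ZTNA or SSE product; the resource names and the 15-minute window are assumptions, and passing `max_age=timedelta(0)` makes it prompt on effectively every access.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical names for the highly protected administrative resources.
PROTECTED_RESOURCES = {"vcenter", "san-console", "backup-server", "edr-console"}


def needs_reauth(resource: str,
                 last_mfa_at: Optional[datetime],
                 now: datetime,
                 max_age: timedelta = timedelta(minutes=15)) -> bool:
    """Decide whether to challenge the administrator with MFA again."""
    if resource not in PROTECTED_RESOURCES:
        return False  # ordinary resources follow the normal session policy
    if last_mfa_at is None:
        return True  # never completed MFA in this session
    return now - last_mfa_at > max_age  # stale MFA means a fresh challenge


now = datetime(2022, 12, 2, 12, 0)
print(needs_reauth("vcenter", now - timedelta(minutes=40), now))
```

The value of a check like this is that a stolen, already logged-in browser session stops being enough: the attacker on Bob’s box hits an MFA prompt instead of the console.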

5. It’s Not Over When It’s Over

The next incident happens, and you are well prepared to recover. Your backups have been protected, and you can restore. You bounce back quickly, getting your organization online in four hours, patting yourself on the back, and crowning yourself hero of the day.

But by evening, your machines are attacked again. What happened?

There are two steps you failed to do. First, you must protect your recovery. Second, you must understand how the initial attack happened so that you can learn from it and close the gaps and entry points. Even though you recovered, the attacker probably still has access to your network.

Here’s a checklist of containment steps to evict the attacker effectively.

Implement EDR and active monitoring to identify command-and-control (C2) communications. EDR should also help catch any backdoors and remote access trojans (RATs) that the threat actor left behind.
Ensure your edge firewalls are the next-generation variety, with intrusion prevention subscriptions actively looking for and stopping inbound network attacks. These are two things anyone can double down on without understanding the nature of the attack.
Understand the vector of attack for crucial next steps. Do you need to shore up email security? Turn off VPN? Close open remote desktop (RDP) services? Turn off the website until you fix a critical vulnerability? Give yourself some assurance that the attacker’s door is not wide open.
If you’ve been breached, assume your credentials have been compromised. As a standard, our team resets all user, administrator, and system account passwords and resets the krbtgt account password twice to invalidate any Active Directory “golden ticket.”
Force MFA on all interactive accounts. If a user or administrator logs into anything, an MFA prompt should challenge them. Hardening Active Directory using Center for Internet Security (CIS) controls is a good last step. Tighten up password policies, SMB signing, certificate requirements, etc.
Recover to a quarantine environment. The attacker or the attacker’s automation can reinfect your recovered systems while you are restoring from backups. Restore to an isolated quarantine VLAN. You can change VLANs over to production once you are fully restored. Restoring into a dirty environment endangers your recovery and could lead to a vicious cycle of attack and recovery.
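
The checklist above can be turned into a simple gate that refuses to move restored systems out of quarantine until every step is done. This is a sketch under my own naming; the fields of `RecoveryState` are hypothetical and would be filled in by whoever is running the incident.

```python
from dataclasses import dataclass


@dataclass
class RecoveryState:
    """Hypothetical record of which containment steps are complete."""
    edr_monitoring_active: bool
    edge_ips_enabled: bool
    attack_vector_closed: bool
    all_passwords_reset: bool
    golden_ticket_cycles: int
    mfa_enforced: bool
    restored_in_quarantine: bool


def blockers_before_production(state: RecoveryState) -> list[str]:
    """List the checklist items still open; an empty list means go."""
    checks = [
        (state.edr_monitoring_active, "deploy EDR and active monitoring"),
        (state.edge_ips_enabled, "enable intrusion prevention on edge firewalls"),
        (state.attack_vector_closed, "identify and close the attack vector"),
        (state.all_passwords_reset, "reset user, admin, and system passwords"),
        (state.golden_ticket_cycles >= 2, "cycle the Active Directory golden ticket twice"),
        (state.mfa_enforced, "force MFA on all interactive accounts"),
        (state.restored_in_quarantine, "restore into an isolated quarantine VLAN"),
    ]
    return [task for done, task in checks if not done]


state = RecoveryState(True, True, True, True, 1, True, True)
print(blockers_before_production(state))
```

In the example, everything is done except the second credential cycle, so the gate reports one open item and the quarantine VLAN stays isolated.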

I have dozens of war stories to back up each concept discussed here and could probably write a book at some point. For now, it is important for me to help security and IT professionals everywhere level up their resilience to these attacks. You will suffer a ransomware attack in the near future if you haven’t already. The key will be your ability to bounce back.

Editor’s note: The author was a speaker at last month’s MITRE ResilienCyCon (see You Will Be Breached So Be Ready).

Art Ocain

Art Ocain, CISM, MCSE, VCP, CCNA, Airiam’s Field CIO / Field CISO, is a visionary leader and IT business strategist. He specializes in resilience engineering, cloud architecture, incident response, cloud strategy, virtualization, server and network administration and security, business continuity planning, disaster recovery, designing storage solutions, network design, web server management, email server management, web application development, database management, and project management. Prior to his current role, Art was President and COO of MePush, a cybersecurity and managed IT company acquired by Airiam in 2021. He holds an MBA from University of the People. He can be found at Airiam.com