Information in the form of data represents a significant portion of valuable corporate assets. Because data ultimately resides on disk arrays and tape, the safeguarding of storage is a key component of business continuity and disaster recovery. As with tape backup projects, though, working on a disaster recovery strategy is a kind of purgatory — not quite hell, but a far cry from heaven.
Aside from the technical challenges, no one likes to dwell on all the bad things that can happen to human and IT corporate assets, or imagine all the steps required to recover essential data and resume operations during a disruption. Most of all, people do not want to consider what would happen if enterprises simply collapsed due to nonexistent, inadequate, or untested DR solutions.
Despite decades of procrastination on disaster recovery projects, companies are finally beginning to recognize the exposure they bear if they do not have a well-conceived and implemented disaster recovery plan for storage. Fortunately, recent technical developments on the DR front have resulted in more flexible and economical solutions that enable customers to implement high availability storage strategies even under the constraints of tight budgets. Compared to legacy DR technologies, these new solutions provide significant savings in both hardware and communications costs, and offer more flexibility to provide different levels of data redundancy that meet the requirements of diverse business applications.
Sizing the DR Tactic to Business Requirements
While all corporate data has some value, not all of this information is essential to the immediate resumption of business in case of disaster or outage. One of the first steps in implementing an effective DR strategy, then, is to prioritize data and match data types to levels of recovery.
Online transaction processing, for example, may need a current and full copy of data available in the event of disruption. This requirement is generally met through synchronous disk-to-disk data replication over a suitably safe distance. In contrast, for application development code, it may be sufficient to have tape backups available, with restoration to disk within two to three days. Sizing the DR tactic to business requirements helps keep DR costs under control while streamlining the recovery process.
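The matching of data types to recovery levels can be sketched in a few lines of Python. This is only an illustration: the tier names, the hour thresholds, and the `choose_tier` helper are hypothetical, and a real classification would come from a business impact analysis rather than fixed numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTier:
    name: str
    method: str
    max_data_loss_hours: float   # assumed worst-case data loss for this method
    max_downtime_hours: float    # assumed worst-case time to resume operations


# Hypothetical tiers, ordered from most protective (and costly) to least.
TIERS = [
    RecoveryTier("tier-1", "synchronous disk-to-disk replication", 0.0, 1.0),
    RecoveryTier("tier-2", "asynchronous disk-to-disk replication", 1.0, 8.0),
    RecoveryTier("tier-3", "tape backup and restore", 24.0, 72.0),
]


def choose_tier(tolerable_loss_hours, tolerable_downtime_hours):
    """Return the least protective tier that still meets both tolerances."""
    for tier in reversed(TIERS):
        if (tier.max_data_loss_hours <= tolerable_loss_hours
                and tier.max_downtime_hours <= tolerable_downtime_hours):
            return tier
    raise ValueError("no tier meets these requirements")


# Illustrative applications with assumed tolerances (loss hours, downtime hours).
applications = {
    "online transaction processing": (0.0, 1.0),
    "application development code": (24.0, 72.0),
}

for app, (loss, downtime) in applications.items():
    tier = choose_tier(loss, downtime)
    print(f"{app}: {tier.name} ({tier.method})")
```

Searching from the least protective tier upward reflects the cost argument above: each data type gets the cheapest method that still satisfies its recovery requirements.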
The hierarchy of data availability begins with solutions based on server clustering and synchronous disk data replication. Geoclustering, or the ability to have servers in multiple locations assume the tasks of failed servers, ensures non-disruptive access between network clients and server assets.
Synchronous disk-to-disk data replication ensures that the backup copy of data is current to the most recent transaction (write to disk). The combination of clustering and synchronous replication enables transparent failover from a production site to a backup site, regardless of whether individual server or storage devices fail.
Synchronous vs. Asynchronous Data Replication
Synchronous data replication, however, is sensitive to latency: each write must be acknowledged by the secondary array before it completes, so round-trip delay adds directly to application response time. As a result, it is sometimes difficult to extend the distance between primary and DR sites beyond approximately 100 miles. To span regional, national, or international distances, asynchronous data replication is more suitable. Asynchronous data replication accumulates multiple transactions before processing an update to the secondary disk array, and so is less sensitive to latency.
This enables the distance between primary and DR sites to stretch over hundreds or thousands of miles, well beyond the radius of any likely disruption. The tradeoff for distance is that, in the event of disaster, the backup site may not be fully synchronized with the production site, and a few business transactions may be lost.
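A rough calculation shows why distance matters to the two replication modes. The sketch below assumes light travels through optical fiber at roughly 200,000 km/s and ignores switch, protocol, and array processing delay, so the figures are a lower bound on latency. The function names and the sample workload are hypothetical.

```python
# Assumption: about 200,000 km/s in optical fiber, or roughly 5 microseconds
# per kilometer one way. Real links add equipment and protocol delay.
FIBER_KM_PER_SECOND = 200_000.0
KM_PER_MILE = 1.609344


def round_trip_ms(distance_miles):
    """Minimum propagation delay for one write and its acknowledgment."""
    distance_km = distance_miles * KM_PER_MILE
    return 2 * distance_km / FIBER_KM_PER_SECOND * 1000.0


def max_serial_writes_per_second(distance_miles):
    """Upper bound on strictly serialized synchronous writes over the link."""
    return 1000.0 / round_trip_ms(distance_miles)


def writes_at_risk(writes_per_second, replication_lag_seconds):
    """Writes not yet on the secondary array if the primary site is lost."""
    return writes_per_second * replication_lag_seconds


for miles in (10, 100, 1000):
    print(f"{miles:>5} miles: {round_trip_ms(miles):.2f} ms round trip, "
          f"at most {max_serial_writes_per_second(miles):.0f} serialized writes/s")

# Asynchronous exposure for an assumed workload: 500 writes/s, 2 s of lag.
print("writes at risk:", writes_at_risk(500, 2))
```

Under these assumptions, the propagation floor at 100 miles is about 1.6 ms for every acknowledged write, and it grows linearly with distance. Asynchronous replication removes that delay from the write path, at the price of the exposure estimated by `writes_at_risk`.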
Synchronous and asynchronous disk-to-disk data copying offers the most efficient means to implement high availability replication. Typically, the disk-based synchronous and asynchronous solutions are closely integrated into the disk vendor’s architecture and so can leverage other advanced features offered by the vendor’s array.
There is also less data handling compared to other methods such as host- or file-based data copying or replication appliances. All major platform providers offer their own flavors of disk-based replication – e.g. Symmetrix Remote Data Facility (SRDF) for EMC Symmetrix; MirrorView or SAN Copy for EMC CLARiiON; TrueCopy for HDS; Data Replication Manager (DRM) for HP; Remote Volume Mirror (RVM) for LSI Logic, IBM, and StorageTek; and REDI SANlinks for XIOtech.
For heterogeneous storage environments, host- or appliance-based data replication provides data movement between the primary storage of one vendor and the secondary storage of another. The host- and appliance-based replication solutions vary widely in implementation, but typically involve multiple data copy processes and may convert block data to files for transport over IP networks. While not as straightforward as disk-to-disk data replication, these solutions do provide a means to accommodate mixed storage environments or replication between enterprise-class storage and lower cost JBODs.
For less mission-critical data, traditional tape backup at least provides a means to restore data if the primary array is lost. Restoring from tape, however, is time consuming, and without periodic testing it may not be possible to verify the integrity of tape cartridges. A bare metal restore to a new disk array is not an optimal choice for most administrators, but it is a means to leverage existing tape archiving routines for data recovery.
Some customers combine disk-to-disk replication and tape backup methods, using the secondary disk array as a spooler for tape backup. That makes it possible to move aged data to tape without impacting production storage.
Data Replication via Storage over IP
Both disk-to-disk and disk-to-tape data copying may be performed over longer distances using new storage over IP technologies. The Fibre Channel over IP (FCIP) protocol tunnels data replication traffic within a wide area network link, while the Internet Fibre Channel Protocol (iFCP) provides a means to route storage data in native IP format. The iFCP protocol also provides fault isolation between connected SANs, further reinforcing the stability customers require for DR strategies.
Optimized buffering, data compression, support for jumbo frames, and techniques such as Nishan Systems’ Fast Write algorithm enable full utilization of available WAN bandwidth and allow for the extension of disaster recovery across hundreds or thousands of miles. The bottom line is that the 10-kilometer limit previously imposed by native Fibre Channel extension is no longer a restriction for today’s disaster recovery planning.
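The role of buffering on long links can be illustrated with the bandwidth-delay product, a general networking rule of thumb. It is not a description of Nishan's Fast Write algorithm or of any vendor's implementation. The function names, link speed, round-trip time, and buffer sizes below are assumptions chosen for illustration.

```python
def bandwidth_delay_product_bytes(link_mbps, round_trip_ms):
    """Bytes that must be in flight to keep the link continuously busy."""
    bits_in_flight = link_mbps * 1_000_000 * (round_trip_ms / 1000.0)
    return bits_in_flight / 8


def effective_throughput_mbps(link_mbps, round_trip_ms, buffer_bytes):
    """Throughput achievable when at most buffer_bytes can be unacknowledged."""
    window_limited_mbps = buffer_bytes * 8 / (round_trip_ms / 1000.0) / 1_000_000
    return min(link_mbps, window_limited_mbps)


# Assumed example: a 100 Mbps WAN link with a 20 ms round-trip time.
link_mbps, rtt_ms = 100, 20
print("bytes needed in flight:", bandwidth_delay_product_bytes(link_mbps, rtt_ms))

for buffer_bytes in (64 * 1024, 1024 * 1024):
    mbps = effective_throughput_mbps(link_mbps, rtt_ms, buffer_bytes)
    print(f"buffer of {buffer_bytes} bytes: about {mbps:.1f} Mbps usable")
```

With a small buffer, the sender spends most of each round trip waiting for acknowledgments, so only a fraction of the link is used. Once the buffer covers the bandwidth-delay product, the link itself becomes the limit.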
In addition, the use of more affordable and available IP network services for DR brings business continuity within the reach of medium and small businesses. Converting Fibre Channel traffic into IP enables customers to leverage a wider variety of IP solutions for DR strategies.
Steinbach Credit Union in Canada, for example, is using wireless LAN technology to replicate storage data between its primary production facility in Steinbach, Manitoba, and a secondary facility in Winnipeg. Using a combination of Proxim wireless bridges, XIOtech storage, and Nishan IP storage switches, this innovative solution avoids the monthly recurring costs of leased fiber optic services and still accomplishes the primary aim of high availability data access. Other customers are leveraging shared IP network links to support both disaster recovery and messaging traffic between multiple sites.
Currently available disk-to-disk data replication, remote server clustering, and IP storage switch products are making it easier for customers to design and implement DR strategies. As with 12-step programs, however, the key to disaster recovery is to take the first step. With management recognition of the necessity of DR support and funding, and due diligence in selecting robust products from current offerings, SAN architects and administrators can create recovery strategies that are tightly integrated with day-to-day storage operations and that can ensure business continuance in spite of unexpected disruptions.