Fires, floods and other natural and human-made disasters, as well as tightening regulatory compliance requirements, have caused storage professionals to seriously rethink the way that data is protected in the enterprise. Backups are routinely replicated to remote locations across the Internet and other dedicated WAN connections. Remote backups offer the speed and integrity of disk-based backup while providing physical safety at facilities across town or across the continent. Many organizations even employ the services of third-party storage providers rather than support and maintain their own offsite facilities.
However, remote backups rely on WAN connectivity to ensure adequate speed and reliability, and WAN disruptions can easily render remote data unavailable. Compression, data deduplication, and delta differencing allow for smaller, faster remote backups within your available bandwidth. This chapter of the guide offers a series of best practices that can help avoid common implementation mistakes and get the most from remote backup technology.
Implement data reduction technologies
To achieve optimum WAN efficiency, it's important to implement the broadest suite of available data reduction technologies. Use conventional compression and data deduplication to eliminate redundant files, blocks or bytes from the backup set. Conventional compression typically provides 2-to-1 data reduction, while data deduplication can achieve anywhere from 30-to-1 to 50-to-1. Delta differencing sends only the data that has changed since the last backup: if only 10 GB have changed, only those 10 GB need to be sent. Finally, consider your file sets carefully and eliminate any unnecessary or extraneous files, such as user MP3 files, from the backup set.
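As a back-of-the-envelope sketch of how these ratios combine, the helper below estimates the data actually sent across the WAN. The function name, the ratios and the assumption that deduplication applies uniformly to changed data are all illustrative, not taken from any particular product:

```python
def reduced_backup_size_gb(raw_gb, dedupe_ratio=30.0, changed_gb=None):
    """Estimate the data sent over the WAN after data reduction.

    dedupe_ratio: e.g. 30.0 means a 30-to-1 reduction (the article
    cites 30-to-1 to 50-to-1 for deduplication).
    changed_gb: with delta differencing, only changed data is sent.
    """
    base = changed_gb if changed_gb is not None else raw_gb
    return base / dedupe_ratio

# A full 300 GB backup set at 30-to-1 deduplication -> 10 GB on the wire.
print(reduced_backup_size_gb(300))                            # 10.0

# Delta differencing: only 10 GB changed, then 30-to-1 -> about 0.33 GB.
print(round(reduced_backup_size_gb(300, changed_gb=10), 2))   # 0.33
```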
Implement encryption technologies for data security
The most effective means of protecting backup data is through encryption. Encryption should be implemented on the server side prior to transmission across the WAN (encrypting data in flight). However, encrypted data cannot be effectively compressed or deduplicated, so it's important to apply any compression or deduplication first and then encrypt the backup data before transmitting it to its remote location, where it remains encrypted at rest. Encryption also obligates storage administrators to maintain and protect the keys. If keys are lost or forgotten, any data encrypted with those keys becomes inaccessible. This can present problems when changing keys, because any backups encrypted with a previous key become inaccessible unless that key is also retained -- possibly compromising retention needs.
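Why the ordering matters can be demonstrated with a short experiment. The sketch below uses a toy SHA-256 keystream cipher purely to produce encryption-like (high-entropy) output; it is not secure and stands in for real encryption such as AES only for illustration:

```python
import hashlib
import zlib

def keystream_encrypt(data: bytes, key: bytes) -> bytes:
    """Toy stream cipher (SHA-256 counter keystream) -- illustrative
    only, NOT a substitute for real encryption such as AES."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))

payload = b"backup record " * 4096        # highly redundant backup data
key = b"example-key"

# Correct order: compress/deduplicate first, then encrypt.
right_order = keystream_encrypt(zlib.compress(payload), key)
# Wrong order: encrypted data looks random, so compression gains nothing.
wrong_order = zlib.compress(keystream_encrypt(payload, key))

print(len(payload), len(right_order), len(wrong_order))
```

Compress-then-encrypt shrinks the redundant payload dramatically, while encrypt-then-compress leaves it essentially full size.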
Determine a suitable bandwidth
Bandwidth can be an expensive commodity, so most organizations will select a bandwidth level that will meet the available backup window or recovery time objective (RTO). Bandwidth needs are roughly determined by dividing the total size of the compressed (processed with conventional compression and data deduplication) and encrypted backup set by the maximum allowable backup window or RTO. For example, a deduplicated and encrypted backup set of 100 GB can be sent to a remote location in about 4 hours with a bandwidth of approximately 56 Mbps -- slightly faster than the rated speed of an 802.11g wireless LAN. Of course, that assumes 100% bandwidth utilization without overhead, so always allow for adequate bandwidth headroom to support other communication and eventual growth of the backup set over time. Speed estimates also assume that the backup server can process the backup at a suitable speed. Large companies with significant bandwidth requirements will opt for carrier-grade connectivity such as T1 and OC3 or higher.
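The arithmetic above can be captured in a small helper. The function name, the decimal GB-to-megabit conversion, and the headroom parameter are assumptions for illustration:

```python
def required_bandwidth_mbps(backup_set_gb: float, window_hours: float,
                            headroom: float = 0.0) -> float:
    """Minimum link speed (Mbps) to move a backup set within a window.

    headroom: fraction of the link to keep free for other traffic,
    e.g. 0.25 reserves 25% of capacity.
    """
    megabits = backup_set_gb * 1000 * 8           # GB -> megabits (decimal)
    mbps = megabits / (window_hours * 3600)       # hours -> seconds
    return mbps / (1 - headroom)

# The example above: 100 GB in 4 hours -> roughly 56 Mbps at 100% utilization.
print(round(required_bandwidth_mbps(100, 4)))     # 56

# Reserving 25% headroom for other traffic raises the requirement.
print(round(required_bandwidth_mbps(100, 4, headroom=0.25), 1))
```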
Optimize the use of available bandwidth
Most organizations do not buy extra bandwidth for backup purposes, so remote backups often occur along with other communication tasks that can potentially extend the backup window. One solution to limited bandwidth is scheduling -- holding the backup until off-peak hours when the backup process can monopolize the available bandwidth. Another technique to consider is bandwidth throttling through quality-of-service (QoS) controls, giving bandwidth priority to backup jobs when they occur, and leaving some portion of bandwidth available for other tasks.
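Bandwidth throttling of the kind QoS controls perform is commonly modeled as a token bucket. The sketch below (a simulated clock and hypothetical names, not any vendor's API) shows the core idea: traffic may burst up to a cap, then is paced at a steady refill rate:

```python
class TokenBucket:
    """Token-bucket throttle: tokens are bytes, refilled at a fixed rate."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes        # start with a full burst allowance
        self.last = 0.0

    def allow(self, nbytes: int, now: float) -> bool:
        # Refill tokens for the elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False

# Throttle backup traffic to 1 MB/s with a 1 MB burst (simulated clock).
bucket = TokenBucket(rate_bytes_per_s=1_000_000, burst_bytes=1_000_000)
print(bucket.allow(1_000_000, now=0.0))   # True  -- the burst is available
print(bucket.allow(500_000, now=0.0))     # False -- the bucket is drained
print(bucket.allow(500_000, now=0.5))     # True  -- 0.5 s refills 500 KB
```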
Plan for WAN interruptions
WANs are often a complex hodgepodge of routers, switches, and other sophisticated networking equipment deployed by ISPs and backbone providers. Periodic equipment faults, downtime for upgrades and other uncontrollable issues may cause disruptions in your WAN bandwidth or cut off your connectivity entirely. When disruptions occur, it may be impossible to create a remote backup or recover a remote backup set. When planning remote backups, be sure to consider the impact of WAN disruptions. In many cases, an organization will keep a copy of the latest backup set on the local backup server. If data loss occurs during a WAN outage, data can still be recovered from the local copy, and the local copy can be transmitted to remote storage once normal WAN service is restored. Also ensure that your remote backup scheme can pick up where it left off after a WAN disruption and complete the backup cycle properly.
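The "pick up where it left off" behavior amounts to checkpointing transfer progress. A minimal sketch, with hypothetical names and an in-memory stand-in for the WAN link:

```python
import io

def transfer_with_resume(source: io.BytesIO, send, checkpoint: dict,
                         chunk_size: int = 4):
    """Send `source` in chunks, recording progress in `checkpoint` so a
    retry after a WAN drop resumes where it left off."""
    source.seek(checkpoint.get("offset", 0))
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return
        send(chunk)                        # may raise on a WAN failure
        checkpoint["offset"] = source.tell()

data = io.BytesIO(b"ABCDEFGHIJ")
sent = []
calls = {"n": 0}

def flaky_send(chunk):
    calls["n"] += 1
    if calls["n"] == 2:                    # simulate a WAN drop mid-backup
        raise ConnectionError("link down")
    sent.append(chunk)

ckpt = {}
try:
    transfer_with_resume(data, flaky_send, ckpt)
except ConnectionError:
    pass                                   # WAN outage; checkpoint preserved
transfer_with_resume(data, flaky_send, ckpt)   # resumes; chunk 1 not re-sent
print(b"".join(sent))                          # b'ABCDEFGHIJ'
```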
Pay attention to data protection on the remote side
Don't forget that remote backups must also be protected. In some cases, disk-based backups are periodically transferred to a tape library in the same remote facility (D2D2T). Tape copies can usually be made without any impact on the production data at all. In other cases, users rely on disk-oriented protection schemes like RAID 5 or RAID 6; RAID 6, with its second parity set, is increasingly prominent in high-capacity SATA disk systems, where long rebuild times raise the risk of a second disk failure. Be sure that the virtual tape library (VTL) or other disk storage system is using an appropriate RAID level, and try to implement time-saving RAID features like pre-emptive disk rebuilds.
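The single-parity protection behind RAID 5 reduces to XOR arithmetic: parity is the XOR of the data blocks, and any one lost block can be rebuilt by XORing the survivors with the parity. A minimal sketch (block sizes and values are arbitrary):

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR same-length blocks together, as RAID 5 parity does per stripe."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Three data disks plus one parity disk (RAID 5-style single parity).
d0, d1, d2 = b"\x10\x20", b"\x03\x04", b"\xff\x00"
parity = xor_blocks(d0, d1, d2)

# "Lose" d1, then rebuild it from the surviving disks and the parity.
rebuilt = xor_blocks(d0, d2, parity)
print(rebuilt == d1)   # True
```

RAID 6 extends this with a second, independently computed parity so that two simultaneous disk failures remain recoverable.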
Determine the appropriate retention and deletion policies
Understand the retention requirements for your remote backups and ensure that there is sufficient storage available to retain your backup sets for the requisite period. For example, if you create weekly backups, and each backup must be retained for two years, you'll need enough remote storage to hold at least 104 backup sets. Next, have a procedure in place to delete the old backups as they age out of the storage system. Deletion must also meet the standards set forth by regulatory guidelines and industry standards. Tools like policy managers may be able to integrate with the backup platform and track retention for you. Many organizations eventually offload their oldest backup sets to tape or another electronic vault, reducing the amount of disk storage needed for long-term retention.
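An age-out procedure is essentially a cutoff-date calculation. The sketch below is illustrative (the dates, the 730-day approximation of two years, and the function names are all assumptions); with weekly backups it leaves the expected 104 sets under retention:

```python
from datetime import date, timedelta

def expired_backups(backup_dates, today, retention_days):
    """Return the backups older than the retention period, due for deletion."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in backup_dates if d < cutoff]

# Weekly backups kept two years (approximated here as 730 days).
weekly = [date(2006, 1, 1) + timedelta(weeks=w) for w in range(160)]
today = date(2009, 1, 25)

old = expired_backups(weekly, today, retention_days=730)
kept = len(weekly) - len(old)
print(kept)   # 104 sets still under retention
```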
Implement periodic recovery drills
While backups can be automated, restorations cannot, and administrators should schedule periodic recovery drills to keep IT staff properly trained. In many cases, recovery drills are conducted on a test server so that production storage is not affected.
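A drill is only meaningful if the restored data is verified against what was backed up, typically by comparing checksums. A minimal sketch of that verification step, restoring to a scratch area rather than production (all names and data hypothetical):

```python
import hashlib
import tempfile
from pathlib import Path

def restore_and_verify(backup: dict, scratch: Path) -> bool:
    """Drill: restore each file to a scratch area (not production),
    then confirm its checksum matches what was backed up."""
    ok = True
    for name, (data, digest) in backup.items():
        target = scratch / name
        target.write_bytes(data)                     # the "restore" step
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        ok = ok and (actual == digest)
    return ok

# Hypothetical backup set: file contents stored with their checksums.
files = {"payroll.db": b"rows...", "mail.pst": b"messages..."}
backup = {n: (d, hashlib.sha256(d).hexdigest()) for n, d in files.items()}

with tempfile.TemporaryDirectory() as tmp:
    result = restore_and_verify(backup, Path(tmp))
print(result)   # True
```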