Historically, a small tape library or even a single tape drive was placed in these offices and someone was trained to do the backups. These backup tapes were then put in a FedEx box and sent to the corporate headquarters (but were rarely integrated into the main data set). The hardware, the people and the process all proved to be weak links in an already fragile chain. The wrong tapes would be used, or the same tape would be used over and over again. Staff would change and retraining would be needed. Often the backup itself wasn't run, or the right tapes didn't make it into the FedEx box and to the main data center on time. Even when all the data made it to the main data center, it typically wasn't reintroduced to the primary backup set, so
Store a recovery set locally and manage it centrally
Replication of data out of remote sites isn't a new capability. The problem with most replication solutions is that they replicate all the data out, leaving no local copy of data available for recovery. In an environment where the remote office is connected via a WAN, recovery of even a moderately sized data set across that segment can be slow and problematic. Most customers consider the implementation of an edge protection strategy to be an upgrade to the backup process. If that upgrade actually delivers a lower quality of service, in this case slower recovery of the remote site's data, it isn't going to be well received. This is particularly true when an entire server goes down and all of its data needs to be recovered.
With the typical WAN segment available to an edge location, recovering the entire server data set across that WAN is often impractical. As a result, the server must be recovered at the main data center and then shipped to the remote office. To prevent these situations from occurring, it's ideal to have a local backup target that's managed from the central office. Disk is an ideal candidate for this because it doesn't require local handling the way tape libraries do, and it can still be cost efficient.
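To put rough numbers on the WAN recovery problem, the following minimal sketch estimates how long a full server restore would take over a slow link compared with a local network. The 500 GB server size, the T1-class WAN link and the gigabit local network are illustrative assumptions, not figures from any particular site, and the calculation ignores protocol overhead.

```python
def transfer_days(data_bytes: float, link_bits_per_sec: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move data_bytes over a link.

    efficiency is the usable fraction of the nominal line rate (0 < e <= 1).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = (data_bytes * 8) / (link_bits_per_sec * efficiency)
    return seconds / 86_400  # seconds per day


# Assumed values for illustration only.
SERVER_BYTES = 500e9   # hypothetical 500 GB server data set
T1_BPS = 1.544e6       # nominal T1 line rate
GIGE_BPS = 1e9         # local gigabit Ethernet to an on-site disk target

days_over_wan = transfer_days(SERVER_BYTES, T1_BPS)
days_over_lan = transfer_days(SERVER_BYTES, GIGE_BPS)

print(f"Restore over WAN: about {days_over_wan:.0f} days")
print(f"Restore from local disk: about {days_over_lan * 24:.1f} hours")
```

Under these assumptions the WAN restore is measured in weeks while the local restore is measured in hours, which is why shipping a rebuilt server can end up being the faster option when no local copy exists.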
Store data efficiently
Storing this data locally on disk while maintaining the cost effectiveness of tape requires that the disk backup target employ data deduplication. Data deduplication is the process of comparing incoming data segments to segments already stored on the disk and storing each redundant segment only once. Deduplication can be implemented using purpose-built appliances or with software. A reduction ratio of 20:1 isn't uncommon in backup operations, which allows the local disk to hold at least a few weeks' worth of backups for local recovery efforts. With the ability to store several weeks' worth of backups on a small disk array, tape-handling issues can be eliminated while increasing backup and recovery performance.
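The idea of storing each redundant segment only once can be sketched in a few lines. This is a simplified illustration, not how any particular appliance works: it assumes fixed-size 4 KB segments and SHA-256 fingerprints, and the `DedupStore` class and its method names are hypothetical. Commercial products often use variable-length segments and more elaborate indexes.

```python
import hashlib
from typing import Dict, List

SEGMENT_SIZE = 4096  # assumed fixed segment size in bytes


class DedupStore:
    """Toy deduplicating backup target (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.segments: Dict[str, bytes] = {}  # fingerprint -> segment data
        self.logical_bytes = 0                # bytes written by backup jobs

    def write(self, data: bytes) -> List[str]:
        """Store a backup; return the list of fingerprints that rebuilds it."""
        recipe = []
        for offset in range(0, len(data), SEGMENT_SIZE):
            segment = data[offset:offset + SEGMENT_SIZE]
            fp = hashlib.sha256(segment).hexdigest()
            self.segments.setdefault(fp, segment)  # keep only the first copy
            recipe.append(fp)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: List[str]) -> bytes:
        """Rebuild a backup from its fingerprint list."""
        return b"".join(self.segments[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        return sum(len(s) for s in self.segments.values())


# Two nightly backups that differ in a single segment.
night1 = b"".join(bytes([i]) * SEGMENT_SIZE for i in range(10))
night2 = night1[:-SEGMENT_SIZE] + bytes([99]) * SEGMENT_SIZE

store = DedupStore()
recipe1 = store.write(night1)
recipe2 = store.write(night2)

print(store.logical_bytes, "bytes backed up,", store.stored_bytes(), "bytes on disk")
```

The second night adds only one new segment to the disk, even though a full backup was written. Repeating this over weeks of mostly unchanged data is what produces the high reduction ratios seen in backup workloads.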
Optimize the use of WAN bandwidth
Completing the protection of the edge locations requires replicating the data from the local deduplicated disk array across the WAN segment to the main data center. This replication leverages the local deduplication appliances to replicate only new segments across the WAN. The effect is a very efficient use of the WAN, reducing or eliminating the need to upgrade to more bandwidth.
The deduplication capability should also extend globally across sites. This leverages redundant segments throughout the enterprise. For example, if site A has been backed up and then replicated to the main data center, when site B backs itself up locally and is ready to replicate, only data that's unique to the entire enterprise should be replicated to the data center. If a similar file existed in the data replicated from site A, site B's data deduplication system should be aware of this and send only the unique segments of that file. A 5 MB PowerPoint presentation that was created in site A, backed up and replicated to the main data center, but that also exists in site B, should only be backed up to the local data deduplication system in site B, not replicated a second time across the WAN.
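The site A and site B example can be sketched as a fingerprint exchange: a site asks the data center which segments it lacks and sends only those. This is a minimal illustration under stated assumptions. The `DataCenter` class, the `replicate` function and the SHA-256 fingerprints are hypothetical, and the sketch ignores the bandwidth used by the fingerprint exchange itself.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def fingerprint(segment: bytes) -> str:
    return hashlib.sha256(segment).hexdigest()


class DataCenter:
    """Toy central store shared by all sites (hypothetical)."""

    def __init__(self) -> None:
        self.segments: Dict[str, bytes] = {}

    def missing(self, fingerprints: Iterable[str]) -> Set[str]:
        """Return the fingerprints the data center does not hold yet."""
        return {fp for fp in fingerprints if fp not in self.segments}

    def receive(self, segments_by_fp: Dict[str, bytes]) -> None:
        self.segments.update(segments_by_fp)


def replicate(site_segments: List[bytes], data_center: DataCenter) -> int:
    """Send only segments new to the enterprise; return bytes sent over the WAN."""
    local = {fingerprint(s): s for s in site_segments}
    needed = data_center.missing(local.keys())
    payload = {fp: local[fp] for fp in needed}
    data_center.receive(payload)
    return sum(len(s) for s in payload.values())


# A presentation that exists at both sites, plus one unique segment per site.
presentation = [bytes([i]) * 1024 for i in range(5)]
site_a = presentation + [b"A-only" * 100]
site_b = presentation + [b"B-only" * 100]

dc = DataCenter()
sent_a = replicate(site_a, dc)  # first site sends everything
sent_b = replicate(site_b, dc)  # second site sends only its unique segment

print("site A sent", sent_a, "bytes; site B sent", sent_b, "bytes")
```

Site B still keeps the whole presentation in its own local store for fast recovery. Only the WAN transfer is skipped for segments the data center already received from site A.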
Part of optimal WAN bandwidth utilization for what might be a limited remote office connection is the ability to "throttle" the amount of data being sent. If a large amount of net new data is introduced into the storage at one of the remote locations during the day, that night's replication process will take significantly longer and may intrude on daytime use of the WAN, such as VoIP traffic. While this can sometimes be addressed via QoS parameters on the WAN router, that can lead to contention over use of the WAN and over which data has the higher priority. A simpler solution is to throttle down the deduplication system's use of the available network bandwidth during known busy times.
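A time-of-day throttle of this kind can be as simple as a schedule lookup. The sketch below is one possible shape for it, not a description of any product's feature: the business-hours window and both rate limits are assumed values, and the function names are hypothetical.

```python
from datetime import time

# Assumed schedule and limits, for illustration only.
BUSY_START = time(8, 0)    # start of known busy period
BUSY_END = time(18, 0)     # end of known busy period
BUSY_LIMIT_KBPS = 256      # replication cap while the office is using the WAN
OFF_PEAK_LIMIT_KBPS = 1536 # replication cap overnight


def bandwidth_limit_kbps(now: time) -> int:
    """Return the replication bandwidth cap for the given time of day."""
    if BUSY_START <= now < BUSY_END:
        return BUSY_LIMIT_KBPS
    return OFF_PEAK_LIMIT_KBPS


def seconds_to_send(num_bytes: int, now: time) -> float:
    """Time a replication chunk should take so the cap is respected."""
    return (num_bytes * 8) / (bandwidth_limit_kbps(now) * 1000)


# A replication job that overruns into the morning is slowed down
# instead of competing with daytime traffic.
print(bandwidth_limit_kbps(time(9, 30)), "kbit/s during the day")
print(bandwidth_limit_kbps(time(23, 0)), "kbit/s overnight")
```

A sender would pace itself by waiting until `seconds_to_send` has elapsed for each chunk before sending the next one, so an overnight job that spills into business hours slows down rather than stopping outright.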
Integrate with the data center backup
Protecting the edge should not be a standalone process that sits outside the standard backup process. IT personnel don't need additional processes to monitor and manage. Not only should the process be integrated, but the actual data should be as well. In the classic tape-based approach, the tape backup media most often wasn't reintroduced into the tape library when it arrived at the main data center. When the need to recover arose, it was a cumbersome, time-consuming process. With the deduplicated and replicated strategy described above, re-ingesting the data into the main backup process is straightforward and takes only a few extra steps.
Successful protection of the edge can best be achieved by having an integrated process that's managed at the primary data center. Recovery can be optimized by using a solution that keeps a backup set both in the remote office and centrally. Cost can be optimized by using technology that provides data reduction to make efficient use of storage and WAN bandwidth. Integrating remote-office protection into the overall backup process reduces the time required by data center personnel to manage the solution and can eliminate vulnerabilities associated with managing separate solutions.
About the author: George Crump, founder of Storage Switzerland, is an independent storage analyst with over 25 years of experience in the storage industry.
This was first published in April 2008