Published: 11 Mar 2011
The problem of properly backing up remote site servers and mobile computing devices has been with us a long time. But with a workforce that's getting more mobile, it's time to get a handle on remote backups.
Remote data centers and mobile users represent the last frontier of backup and recovery. And that frontier spirit is often reflected in the way many companies handle backup and recovery of remote and mobile data. Remote data centers, as well as users of laptops or other mobile devices, are often left on their own to make do with inferior methods (or none at all), while the "big" data center enjoys a modern-day backup and recovery environment. But with so much data being created and carried around outside the main data center, it's time for a change.
The root of the problem
Remote data centers often use standalone backup systems with limited connections to the corporate backup system. And because they typically deal with smaller data sets, remote centers often use less-expensive software and hardware. So, while the central data center may be running an enterprise-class backup product backing up to a large target data deduplication system or tape library, remote data centers often have workgroup-class backup products feeding backups to small autoloaders or even individual tape drives.
Likewise, the corporate data center is likely to have a contract with a media vaulting company to ensure that backups are taken off-site every day. Even better, the data center may be using a deduplication system that replicates backups off-site immediately. Remote data centers, on the other hand, often have backup systems that may go unmonitored, with backups that may end up in the backseat of someone's car if they leave the premises at all.
Mobile data backup is in even worse shape. Many companies don't have a policy for backing up mobile data at all other than instructing mobile users to copy important data to a file server. That's more about ignoring the problem than having a viable backup policy in place.
The typical mobile computer user simply doesn't think about backing up their data on a regular basis. And requiring mobile users to synchronize their important data to a file server also ignores one basic fact -- they're mobile and there's a good chance they don't have the bandwidth to synchronize large files or lots of smaller files.
Given the increased mobility of today's workforce, a significant amount of what your company considers its intellectual property may reside solely on unprotected remote devices.
Planting the backup seed
The first backup from a remote computer, referred to as the “seed,” must be taken into consideration when designing your backup plan for remote data. Unless you’re backing up extremely small amounts of data (a few gigabytes), you need to figure out a way to transfer the seed to your central site. Typically, this is done by backing up to a portable device of some sort that’s then physically transferred to the central site and copied to the backup server. Make sure to discuss the options your backup vendor can offer in this area.
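To see why the seed usually can't travel over the wire, it helps to run the numbers. The sketch below is a rough estimate only: the function name, the 500 GB data set and the 5 Mbit/sec uplink are illustrative assumptions, and it ignores protocol overhead and competing traffic.

```python
def seed_transfer_days(size_gb, uplink_mbps):
    """Estimate the days needed to push a first full backup over a WAN link.

    Assumes decimal units (1 GB = 10**9 bytes, 1 Mbit = 10**6 bits) and a
    link that is fully available to the backup, which is optimistic.
    """
    bits_to_send = size_gb * 10**9 * 8
    seconds = bits_to_send / (uplink_mbps * 10**6)
    return seconds / 86400  # seconds per day


# Hypothetical remote site: 500 GB of data behind a 5 Mbit/sec uplink.
print(round(seed_transfer_days(500, 5), 1))
```

Under those assumptions the seed would tie up the link for roughly nine days, which is why shipping a portable device is often the practical choice.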
Why mobile backup is so hard
Unfortunately, there are reasons why remote and mobile backup data sets have typically been handled so haphazardly. It's important to understand these reasons before attempting to fix the problem.
The main reason why both remote and mobile data sets aren't treated the same way as data in the corporate data center is the most obvious one: because they're not in the corporate data center. Slow connections between remote sites or users and the central data facility mean the remote systems can't use the same backup software used in the data center. Those backup applications expect quick connections to servers in the data center and tend to perform very poorly when trying to speak to remote servers. Bandwidth limitations prevent the software from transferring large amounts of data, and latency adds a delay to every one of the many roundtrips that chatty backup apps make between the backup server and client.
Another challenge is that the computers being backed up can't be counted on to be powered on at all times the way servers are in a data center. Most laptop users (and users of other types of remote devices) power down their devices or put them to sleep when they're not in use. Less obvious, perhaps, is that users in remote data centers often do the same thing with their servers and desktop PCs. Not a monumental issue, but one that must be addressed.
The next challenge is at the other end of the spectrum: some users leave their computers on -- and apps open -- 24 hours a day. So any viable remote backup system must address the issue of open (and possibly changing) files.
Finally, there's the requirement for bare-metal recovery. In the corporate data center, there are plenty of alternatives when a piece of hardware fails, such as a quick swap of an already-imaged drive. The best alternative a remote user may have is a WAN connection with a decent download speed and the hope that someone from corporate IT is available. If your remote servers or laptops have on-site service, the vendor can replace the hard drive or any other broken components. But then you'll need some type of automatic recovery that requires only the most basic steps (e.g., inserting a CD and rebooting).
Remote and mobile backup solved
The typical way the remote bandwidth challenge is solved today is by using a block-level incremental-forever backup technology. The key to backing up over slow links is to never again transfer data that has already been transferred. Full backups are no more and even traditional incremental backups transfer too much data. You must back up only new, unique blocks.
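The idea of sending only new blocks can be illustrated with a minimal sketch. It assumes fixed-size 4 KB blocks compared by SHA-256 hash; the function names are hypothetical, and real products differ in block size, change tracking and how they handle files that shrink.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(previous_hashes, data, block_size=BLOCK_SIZE):
    """Return {block_index: block_bytes} for blocks that are new or differ
    from the hashes recorded at the previous backup."""
    changed = {}
    for index, offset in enumerate(range(0, len(data), block_size)):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            changed[index] = block
    return changed


# The first backup records the hashes; later backups send only what changed.
original = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
recorded = block_hashes(original)
modified = b"A" * 4096 + b"X" * 4096 + b"C" * 4096
delta = changed_blocks(recorded, modified)
print(sorted(delta))  # prints [1]
```

Only the middle block crosses the link in this example; the other two thirds of the file are never sent again.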
Latency is a separate issue. Just because a product does block-level incremental backups doesn't mean it was designed for use as a remote application. You need to ensure that the backup software understands it's communicating over a remote connection and avoids "roundtrips" whenever possible. Even if you have a remote connection with enough bandwidth, the latency of the connection can severely hamper your backup performance if your backup software isn't prepared for it.
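A back-of-the-envelope model shows how roundtrips can dominate. The sketch below simply adds raw transfer time to the time spent waiting on roundtrips; the function name and every figure in the example (100 MB of changed data, a 10 Mbit/sec link, 80 ms of latency) are assumptions chosen for illustration.

```python
def backup_seconds(payload_bytes, bandwidth_bps, rtt_seconds, round_trips):
    """Rough backup duration: transfer time plus time spent on roundtrips.

    Ignores protocol overhead, pipelining and compression.
    """
    transfer = payload_bytes * 8 / bandwidth_bps
    waiting = round_trips * rtt_seconds
    return transfer + waiting


# Same data, same link; only the number of roundtrips differs.
chatty = backup_seconds(100_000_000, 10_000_000, 0.08, 50_000)
batched = backup_seconds(100_000_000, 10_000_000, 0.08, 50)
print(f"{chatty:.0f} seconds vs. {batched:.0f} seconds")
```

In this model the data itself takes 80 seconds to move either way. The chatty exchange spends more than an hour on top of that just waiting for replies, while the batched one adds only a few seconds.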
Devices getting more mobile
iPad users come in two varieties: those who use the iPad to view data and those who use it to create or modify data. You don’t have to worry about those in the first category. But the second group—those who are actually creating or altering information on the go—needs to be instructed on how to back up their devices. The easiest way to do this is to make sure users sync their iPad with their laptop or desktop PC and then ensure that device gets backed up. It’s not a perfect solution, but it’s probably the best we have right now given the architecture of the iPad. The main challenge is that each application is given its own file space. Even if there’s an application that can back up data remotely over the Internet, it wouldn’t necessarily have access to the file spaces where data is being created or modified.
Dedupe does it all
The technology that most people have adopted to solve many of these problems is data deduplication, which significantly reduces the number of bytes that must be transferred. A dedupe system that's aware of multiple locations will only back up bytes that are new to the entire system, not just bytes that are new to a particular remote or mobile location. So if a file has already been backed up from one laptop and the same file resides on another laptop, the second instance of the file won't be backed up.
There are two basic types of deduplication: target deduplication (appliance) and source deduplication (software). Target deduplication appliances are designed to replace the tape or standard disk drives in your existing backup system so your backup software sends backup data to the appliance that dedupes the backups and stores only the new, unique blocks. Using a dedupe appliance has an added benefit, as switching from tape to disk as your initial backup target will likely increase the reliability of remote site backups.
To use target deduplication, you'll have to install an appliance at each remote site and direct backups to the appliance. After the appliance dedupes the remote site's backup, the deduped backup can be replicated back to a central site. Because it requires an appliance of some sort, target deduplication isn't appropriate for mobile data.
Source deduplication is backup software that dedupes the data at the very beginning of the backup process. The server or mobile device being backed up communicates with the source deduplication server and "describes" the segments of data it has found that need to be backed up. If the source deduplication server sees that a segment has already been backed up, the segment isn't transferred across the network. This saves disk space on the server and reduces the amount of bandwidth the backup process uses.
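The exchange between client and server can be sketched in a few lines. This is a simplified model, not any vendor's protocol: the class and function names are hypothetical, segments are fixed at 4 KB, and a SHA-256 hash stands in for the "description" of each segment.

```python
import hashlib

SEGMENT_SIZE = 4096  # assumed segment size, for illustration only


class DedupeServer:
    """Toy source-dedupe server: stores each unique segment once, by hash."""

    def __init__(self):
        self.segments = {}

    def missing(self, digests):
        """Tell the client which described segments the server lacks."""
        return {d for d in digests if d not in self.segments}

    def store(self, digest, segment):
        self.segments[digest] = segment


def backup(data, server, segment_size=SEGMENT_SIZE):
    """Describe the data to the server, then send only what it asks for.

    Returns the list of digests needed to rebuild the data and the
    number of segment bytes actually transferred.
    """
    segments = [data[i:i + segment_size]
                for i in range(0, len(data), segment_size)]
    digests = [hashlib.sha256(s).hexdigest() for s in segments]
    needed = server.missing(digests)
    sent = 0
    for digest, segment in zip(digests, segments):
        if digest in needed:
            server.store(digest, segment)
            needed.discard(digest)  # send a repeated segment only once
            sent += len(segment)
    return digests, sent


# Two laptops hold the same file; only the first one transfers any data.
server = DedupeServer()
shared_file = b"A" * 4096 + b"B" * 4096
_, sent_first = backup(shared_file, server)
_, sent_second = backup(shared_file, server)
print(sent_first, sent_second)  # prints 8192 0
```

The second laptop still exchanges hashes with the server, but no segment data crosses the network, which is where the bandwidth saving comes from.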
Source deduplication can be used to back up both remote sites and mobile users. All you need to do is install source deduplication software on the computer to be backed up and initiate the backup. (This is a bit of an oversimplification, of course, and ignores the challenge of completing the initial full backup.)
Continuous backup of remote data
Another technology that should be considered for remote site and mobile user backup is continuous data protection (CDP). Think of CDP as replication with a "back" button. Like replication, it's a continuous process that runs throughout the day, incrementally transferring new blocks to a remote backup server. But unlike standard replication products, CDP systems also store a log of changes so that a protected system can be restored to any point in time within its retention period in a few seconds or less. While a traditional backup system (including one using deduplication) can restore a client only to the last time a backup ran, a CDP system can restore a client to its state of only seconds ago, since the backup is continuously occurring. A CDP product can be used to back up both remote sites and mobile users because it's also a block-level incremental-forever technology.
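The "back" button comes from the change log. The sketch below is a toy model under stated assumptions: the class name is hypothetical, writes arrive in timestamp order, the volume starts out zero-filled, and retention limits are ignored.

```python
class ChangeJournal:
    """Toy CDP journal: a base image plus a time-ordered log of block writes."""

    def __init__(self, block_count, block_size=4):
        self.base = [bytes(block_size)] * block_count  # zero-filled volume
        self.log = []  # (timestamp, block_index, data), appended in time order

    def record_write(self, timestamp, block_index, data):
        """Log one block write as it is replicated from the client."""
        self.log.append((timestamp, block_index, data))

    def restore(self, point_in_time):
        """Rebuild the volume as it looked at point_in_time by replaying
        every logged write made at or before that moment."""
        image = list(self.base)
        for timestamp, block_index, data in self.log:
            if timestamp > point_in_time:
                break
            image[block_index] = data
        return image


journal = ChangeJournal(block_count=2)
journal.record_write(10, 0, b"v1")
journal.record_write(20, 0, b"v2")
journal.record_write(30, 1, b"zz")
print(journal.restore(15)[0])  # prints b'v1'
```

Plain replication would hold only the latest contents of each block. Keeping the log is what lets a restore land on any moment between writes.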
Integrated data protection
Remote sites may have another option, using what's sometimes referred to as self-healing storage. This broad term refers to storage that has backup and recovery integrated as core features. Typically, it's used to describe storage arrays that use redirect-on-write snapshot technology to provide historical versions of blocks and files within the volume being protected. The snapshots are then replicated to another volume (typically located in an alternate location), providing both history and relocation of data without using traditional backup methodologies. To use one of these products to back up a remote site would, of course, require installing a storage array at each remote site that would replicate to another larger array in a central site.
What about the cloud?
A cloud backup service is simply another method of delivering one of the above options. Some cloud backup services use source dedupe, while others use CDP. And some services provide an on-site target appliance that then replicates to the cloud or acts as a target for the replicated backups from your deduplication appliance. Some self-healing storage arrays know how to replicate to the cloud as well.
The bare-metal recovery issue is one that can only be addressed with a backup software product or service that has the feature built into the product. Give careful consideration to the importance of this feature for your environment. And like everything else in IT, don't just believe what the vendors say; test the product or service to see if it does exactly what you need it to do.
You should also ask how a vendor's products handle backing up systems that aren't always turned on or connected to the WAN. While most products and services can accommodate these occurrences, the way they do it can significantly impact the user experience. Suppose, for example, that a laptop hadn't been connected to the Internet for a long time and when it finally did connect, the backup product started the long-overdue backup. That might seem like a good idea, but it may also consume all of the laptop's available resources. That could prompt a help desk call or cause a user to stop the backup process when it interferes with other work. Make sure you understand the load the backup application places on the system it's backing up under various conditions.
BIO: W. Curtis Preston is an independent consultant, writer and speaker. He is the webmaster at BackupCentral.com and the founder of Truth in IT Inc.