Remote offices have always been the weak link in the data backup chain. Tape cartridges have to be changed according to a rotation schedule, sent off site for archival storage, and occasionally tested to make sure the information is being backed up properly. Because remote sites seldom have storage professionals on hand, these chores fall to non-storage staff as a secondary or tertiary duty.
One solution is to do away with tape storage entirely and replace it with data deduplication software. Once the backup data has been stripped down to unique, new or modified data, it can be transmitted over the Internet to the central office, where the IT staff can handle the backups.
"It is an appealing solution because remote tape is a nightmare," said Peter Eicher, product marketing manager for FalconStor Software Inc., a maker of deduplication software.
Deduplication is automatic once it is set up. It doesn't require intervention at the remote office to make it work. The deduplication system accepts the data from the server, removes redundancies at the block or file level, depending on the system, and forwards it to the data center (or local backup disk) for storage.
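As a rough illustration of what removing redundancies at the block level means, the sketch below splits a byte stream into fixed-size blocks and stores each distinct block only once, keyed by its SHA-256 hash. The 4 KB block size, the in-memory dictionary and the function names are assumptions made for illustration; commercial products use their own chunking and indexing schemes.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


def deduplicate(data: bytes, store: dict, block_size: int = BLOCK_SIZE) -> list:
    """Store each unique block of `data` once in `store`.

    Returns the "recipe": the ordered list of block hashes needed
    to rebuild the original data.
    """
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:       # only new, unique blocks are kept
            store[digest] = block
        recipe.append(digest)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from its recipe and the block store."""
    return b"".join(store[digest] for digest in recipe)
```

Backing up a second, slightly changed copy of the same data into the same store adds only the blocks that differ, which is why repeated backups shrink so much.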
If you're backing up to a central office over your network, you need to make sure that you have enough bandwidth to do the job. Although deduplication enormously reduces the amount of data, it still needs a healthy channel to handle the load. The load that backup to a central office puts on the network depends on how much data you transmit, and has to be determined from the actual size of the deduplicated files; typically, a DSL connection is the absolute minimum. If you're backing up to a local disk, however, network bandwidth isn't a concern.
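One way to turn the measured size of the deduplicated data into a bandwidth check is to estimate how long the transfer would take and compare that with the backup window. The following is a minimal sketch; the function names, the 80% link-efficiency allowance and the sample figures (2 GB per night, a 1.5 Mbit/s uplink, an eight-hour window) are hypothetical and should be replaced with your own measurements.

```python
def transfer_hours(deduped_bytes: float, link_mbps: float,
                   efficiency: float = 0.8) -> float:
    """Estimate hours needed to send `deduped_bytes` over a link.

    `link_mbps` is the uplink speed in megabits per second;
    `efficiency` is an assumed allowance for protocol overhead
    and other traffic sharing the line.
    """
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return (deduped_bytes * 8) / usable_bits_per_second / 3600


def fits_window(deduped_bytes: float, link_mbps: float,
                window_hours: float, efficiency: float = 0.8) -> bool:
    """True if the estimated transfer finishes inside the backup window."""
    return transfer_hours(deduped_bytes, link_mbps, efficiency) <= window_hours


# Hypothetical nightly job: 2 GB after deduplication, 1.5 Mbit/s uplink.
hours = transfer_hours(2e9, 1.5)
print(f"Estimated transfer time: {hours:.1f} hours")  # roughly 3.7 hours
print("Fits an 8-hour window:", fits_window(2e9, 1.5, 8))
```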
Software-based data deduplication
Deduplication can be done either by software (from CommVault, FalconStor or Symantec Corp., for example) or by hardware, with an appliance such as those from Data Domain. The software can be built into a storage management or backup application (CommVault Simpana and Symantec Veritas NetBackup) or be a standalone product (FalconStor File-interface Deduplication System).
Generally, deduplication by software is cheaper, but it puts a bigger load on the server. Going through the data and combing out the duplicates takes a fair amount of computing power.
When is dedupe software not cheaper? When supporting it requires a major hardware upgrade on the server. Software is also more work for administrators because it must be kept up to date. The changeover means modifying the backup software configuration or, in some cases, replacing the backup application completely if you choose an integrated product such as CommVault Simpana or Veritas NetBackup.
Data deduplicating appliances
By contrast, deduplication appliances are typically plug and play. The appliance sits in the remote office and processes the data stream as it is sent from the server before sending it to disk or over the network to the central office.
According to Eicher, a typical deduplication system reduces the data by 15:1 or 20:1, depending on the nature of the data and how you back it up. This makes it practical to transmit the data to be backed up over the network to the data center.
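To put those ratios in concrete terms, the short calculation below converts a reduction ratio into the amount of data left to send and the percentage saved. The 300 GB backup set is a made-up figure, and the function names are illustrative only.

```python
def deduped_size(raw_gb: float, ratio: float) -> float:
    """Size in GB left after deduplication at `ratio`:1."""
    return raw_gb / ratio


def reduction_percent(ratio: float) -> float:
    """Percentage of the original data eliminated at `ratio`:1."""
    return (1 - 1 / ratio) * 100


# Hypothetical 300 GB backup set at the two ratios quoted above.
for ratio in (15, 20):
    print(f"{ratio}:1 -> {deduped_size(300, ratio):.1f} GB to send, "
          f"{reduction_percent(ratio):.1f}% less data")
```

At 15:1 the 300 GB set shrinks to 20 GB, and at 20:1 to 15 GB, a reduction of roughly 93% and 95% respectively.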
Once the data has been deduplicated, it can be encrypted before being transmitted. Eicher said that compressed or encrypted data doesn't deduplicate well because there's little redundancy left in it. The data should be deduplicated first and then compressed and/or encrypted.
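The ordering matters because compression scrambles the byte patterns that deduplication looks for. The sketch below illustrates this with two made-up "nightly backups" that differ only in their first eight bytes: it counts how many fixed-size blocks the two versions have in common, first on the raw data and then after each has been compressed with zlib. The block size, the sample data and the function names are assumptions for illustration, and zlib simply stands in for whatever compression a real product applies.

```python
import hashlib
import random
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> set:
    """Return the set of SHA-256 hashes of the fixed-size blocks in `data`."""
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def shared_blocks(a: bytes, b: bytes) -> int:
    """Count distinct blocks that appear in both `a` and `b`."""
    return len(block_hashes(a) & block_hashes(b))


def make_backups() -> tuple:
    """Build two synthetic backups that differ only in the first 8 bytes."""
    rng = random.Random(0)  # fixed seed so the example is repeatable
    words = [b"backup", b"tape", b"remote", b"office",
             b"dedupe", b"disk", b"server", b"archive"]
    monday = b" ".join(rng.choice(words) for _ in range(50_000))
    tuesday = b"CHANGED!" + monday[8:]
    return monday, tuesday


monday, tuesday = make_backups()
raw_shared = shared_blocks(monday, tuesday)
compressed_shared = shared_blocks(zlib.compress(monday), zlib.compress(tuesday))
print("Blocks shared before compression:", raw_shared)
print("Blocks shared after compression: ", compressed_shared)
```

On the raw data almost every block of the second backup is already in the first, so a deduplicator would have very little new to store. After compression the two streams share far fewer blocks, leaving the deduplicator with much less redundancy to remove.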
Some remote offices don't bother to transfer the data to a central location. They dedupe it and store it locally in a disk-to-disk scheme. Eicher said that between 40% and 50% of remote sites don't bother to send the data off site. Here the advantage of deduplication is that you can store a lot more data in the same amount of disk space. This allows the remote office to keep more backups and hence a longer history. "Even with a small deduplication system, it's easy to keep weeks or even months of data," Eicher said.
About this author: Rick Cook specializes in writing about issues related to storage and storage management.