Target-based data deduplication technology product considerations

Data backup expert W. Curtis Preston explains what to look for when evaluating target-based data deduplication products.

There are three things to verify when considering a target-based data deduplication product: cost, capacity and throughput. When weighing the cost of deduplication systems (or any system, for that matter), remember to include both capital expenditures (CAPEX) and operational expenditures (OPEX). Look at what hardware and software you'll need to acquire for a particular appliance to meet a given throughput and capacity requirement.

Some dedupe vendors make it very easy to arrive at a CAPEX number: for example, you need to store 30 TB of data and you back up 5 TB/day, so you need model X, which includes all the computing and storage capacity required to meet your requirements. Other vendors provide a gateway that you connect to your own storage. Finally, some vendors provide just the software, leaving the purchase of all hardware up to you; remember to include the cost of the server hardware in that configuration, and make sure you're specifying a server configuration the vendor has approved. In both the gateway and software-only pricing models, include the cost of the disk in your comparison even if it's "free." Dedupe pricing is unusual enough that there are scenarios where you can actually save money by not reusing disk you already own.

One final cost element: Remember to add in (if necessary) any "extra" disks, such as a "landing zone" (found in post-process systems), a "cache" where data is kept in its original format for faster restores, or any other disks not used to store deduplicated data. All of those disks count toward the total cost of purchasing the system.
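To make the comparison concrete, here is a minimal sketch of the CAPEX arithmetic for the three pricing models described above. Every price, capacity and model number in it is a made-up placeholder rather than vendor data; substitute your own quotes. Note how the gateway and software-only models carry the disk cost, and any "extra" non-dedupe disk, as explicit line items.

```python
# Hypothetical CAPEX comparison across the three dedupe pricing models.
# Every number below is a placeholder, not a real quote.

def appliance_capex(appliance_price):
    # All-in-one appliance: one line item covers compute and storage.
    return appliance_price

def gateway_capex(gateway_price, disk_price_per_tb, raw_tb, extra_tb=0.0):
    # Gateway: add your own disk, plus any "landing zone" or cache
    # disk that never holds deduplicated data (extra_tb).
    return gateway_price + disk_price_per_tb * (raw_tb + extra_tb)

def software_capex(license_price, server_price, disk_price_per_tb,
                   raw_tb, extra_tb=0.0):
    # Software-only: license plus a vendor-approved server plus disk,
    # again including any extra non-dedupe disk.
    return license_price + server_price + disk_price_per_tb * (raw_tb + extra_tb)

print(appliance_capex(90_000))                                       # 90000
print(gateway_capex(40_000, 1_500, raw_tb=10, extra_tb=5))           # 62500.0
print(software_capex(25_000, 12_000, 1_500, raw_tb=10, extra_tb=5))  # 59500.0
```

With these particular placeholder prices the software-only model comes out cheapest, but the ranking flips easily as the quotes change, which is exactly why the disk and "extra" disk terms belong in the comparison.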

Target and source data dedupe defined
Target deduplication: Deduplication is done by the backup target itself, typically an appliance that receives the full backup stream from the backup server and dedupes the data either as it arrives (inline) or after it lands on disk (post-process).

Source deduplication: Backup software performs the deduplication at the backup client, in cooperation with the backup server, before sending data to the backup target. Because only new, unique data crosses the wire, this approach has far less impact on available bandwidth.

You then need to consider OPEX. As you evaluate each vendor, make note of how you'll need to maintain its system and how it will integrate with your backup software. Is there a custom interface between the two (e.g., Veritas NetBackup's OpenStorage (OST) API), or will the system simply emulate a tape library or present a file system? How will that affect your OPEX? What's involved in replacing disk drives, disk arrays or other components of the system?

There are two ways to test capacity. The first is to send a significant amount of backups to the device and compare the size of those backups with the amount of storage they consume on the target system; that comparison yields your dedupe ratio. Multiply that ratio by the disk capacity used to store deduped data and you get your effective capacity. The second method is to send backups to the device until it fills up and record how many backups it took. The latter method takes longer, but it's the only way to know how the system will behave over the long term. (The performance of some systems degrades as they near capacity.)
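The arithmetic of the first method is simple enough to sketch. The test numbers below are hypothetical; the physical-used figure has to come from the target system's own reporting tool.

```python
# Method 1: derive the dedupe ratio and effective capacity from a test.
# The numbers here are hypothetical; read physical_used_tb from the
# target system's own reporting tool.

def effective_capacity(logical_backup_tb, physical_used_tb, usable_disk_tb):
    dedupe_ratio = logical_backup_tb / physical_used_tb
    return dedupe_ratio, dedupe_ratio * usable_disk_tb

ratio, eff_tb = effective_capacity(logical_backup_tb=50.0,
                                   physical_used_tb=5.0,
                                   usable_disk_tb=20.0)
print(f"{ratio:.1f}:1 ratio -> ~{eff_tb:.0f} TB effective on 20 TB of disk")
# 10.0:1 ratio -> ~200 TB effective on 20 TB of disk
```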

Performance considerations for dedupe

Finally, there are several things you should test for performance.

Ingest/Write. The first measure of a disk system (dedupe or not) is its ability to ingest (i.e., write) backups. (While restore performance is technically more important, you can't restore what you didn't back up.) Remember to test both aggregate and single-stream backup performance.
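Your real backup software should be the final word on ingest numbers, but a rough harness like the following can show the gap between single-stream and aggregate write rates before you get that far. It assumes the dedupe target is mounted as a file system; the mount point, stream count and sizes are all assumptions to tune for your environment. Note the caveat in the comments: repeating one chunk makes the stream trivially dedupable, so feed the box copies of real backup data for numbers you intend to trust.

```python
# Rough ingest-throughput harness, assuming the dedupe target is
# mounted as a file system (the mount point, stream count and sizes
# are all assumptions; tune them to your environment). Caveat:
# repeating the same chunk makes the stream trivially dedupable, so
# for trustworthy numbers, send copies of real backup data through
# your actual backup software instead.
import os
import time
import threading

MOUNT = "/mnt/dedupe_target"             # hypothetical mount point
CHUNK = os.urandom(4 * 1024 * 1024)      # one 4 MiB chunk, reused
CHUNKS_PER_STREAM = 2560                 # 2560 x 4 MiB = 10 GiB/stream

def write_stream(stream_id):
    path = os.path.join(MOUNT, f"ingest_test_{stream_id}.bin")
    with open(path, "wb") as f:
        for _ in range(CHUNKS_PER_STREAM):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())             # make sure the data really landed

def run(n_streams):
    start = time.time()
    threads = [threading.Thread(target=write_stream, args=(i,))
               for i in range(n_streams)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    total_mb = n_streams * CHUNKS_PER_STREAM * 4
    print(f"{n_streams} stream(s): {total_mb / elapsed:.0f} MB/s aggregate")

run(1)   # single-stream write performance
run(8)   # aggregate write performance
```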

Restore/Copy/Read speed. The second measure of a disk system (dedupe or not) is its ability to restore or copy (i.e., read) backups. I like to point out that the whole reason we started doing disk-to-disk-to-tape (D2D2T) backups was to use disk as a buffer in front of tape; if a disk system (dedupe or not) can't stream a modern tape drive when copying backups to tape, it misses the point. Test the tape copy wherever you actually plan to make it; for example, if you plan to replicate to another system and cut the tape there, test exactly that. Finally, don't assume restore speeds will be fine: test both single-stream and aggregate restore performance.
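A quick sanity check on the read side is to compare the measured restore or copy rate against the native speed of the tape drive you plan to feed. The drive speeds below are approximate native rates for current LTO generations (an assumption; check your drive's spec sheet).

```python
# Will the measured restore/copy rate keep a tape drive streaming?
# Native rates below are approximate for current LTO generations
# (an assumption; check your drive's spec sheet).

TAPE_NATIVE_MB_S = {"LTO-3": 80, "LTO-4": 120}

def can_stream(measured_read_mb_s, drive="LTO-4"):
    needed = TAPE_NATIVE_MB_S[drive]
    verdict = "will stream" if measured_read_mb_s >= needed else "will shoe-shine"
    print(f"{drive} wants ~{needed} MB/s; measured "
          f"{measured_read_mb_s} MB/s -> {verdict}")
    return measured_read_mb_s >= needed

can_stream(95)   # e.g., a measured single-stream restore rate
# LTO-4 wants ~120 MB/s; measured 95 MB/s -> will shoe-shine
```

A drive that can't be kept streaming falls into start-stop ("shoe-shining") mode, which hurts both throughput and drive longevity, so a dedupe system that reads slower than the drive's native rate defeats the purpose of the disk buffer.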

This article originally appeared in Storage magazine.

W. Curtis Preston, executive editor and independent data backup expert
W. Curtis Preston (a.k.a. "Mr. Backup"), executive editor and independent backup expert, has been singularly focused on data backup and recovery for more than 15 years. From starting as a backup admin at a $35 billion credit card company to being one of the most sought-after consultants, writers and speakers in this space, it's hard to find someone more focused on recovering lost data. He is the webmaster of BackupCentral.com, the author of hundreds of articles, and the books "Backup and Recovery" and "Using SANs and NAS."

This was first published in June 2009
