Tip

In-band vs. out-of-band deduplication

What you will learn from this tip: This tip compares and contrasts in-band and out-of-band deduplication and offers some pros and cons of each approach.

An important differentiator among deduplication products is whether they work in-band or out-of-band. That is, do they deduplicate the data as they write it to the array or virtual tape library (VTL), which is the in-band approach, or is deduplication a secondary process that may run asynchronously, which is the out-of-band approach? There are advantages and disadvantages to each method.

The advantage of the in-band method is that it handles the data only once. The drawback is that, depending on the implementation, it could slow down the incoming backup. The in-band (or inline) camp argues that although its products will probably slow down the backup somewhat, when the backup is done, the work is done. The out-of-band camp still has important work to do at that point: deduplicating the data it has just written.
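To make the in-band idea concrete, here is a minimal sketch in Python. It is a toy model, not any vendor's implementation: the class name `InBandStore`, the tiny fixed chunk size, and the use of SHA-256 fingerprints are all assumptions chosen for illustration. The point it shows is that the duplicate check sits in the write path, so each chunk is handled once, and the time spent fingerprinting is time the incoming backup has to wait.

```python
import hashlib

CHUNK_SIZE = 4  # assumption: tiny fixed-size chunks so the example is easy to follow


class InBandStore:
    """Toy in-band deduplicating store (hypothetical, for illustration only)."""

    def __init__(self):
        self.chunks = {}   # fingerprint -> chunk bytes, stored once
        self.backups = {}  # backup name -> ordered list of fingerprints

    def write_backup(self, name, data):
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Deduplication happens here, inside the write path: a chunk
            # that is already known is never written a second time.
            if fingerprint not in self.chunks:
                self.chunks[fingerprint] = chunk
            recipe.append(fingerprint)
        self.backups[name] = recipe

    def read_backup(self, name):
        # Rebuild the original data by following the stored fingerprints.
        return b"".join(self.chunks[fp] for fp in self.backups[name])


store = InBandStore()
store.write_backup("monday", b"AAAABBBBCCCC")
store.write_backup("tuesday", b"AAAABBBBDDDD")
print("unique chunks stored:", len(store.chunks))
print("tuesday restores correctly:", store.read_backup("tuesday") == b"AAAABBBBDDDD")
```

When `write_backup` returns, nothing is left to do for that backup, which is the "when they're done, they're done" argument in miniature.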

The out-of-band method has to write the original data, read it back, identify its redundancies, and then replace each redundant piece with one or more pointers. The advantage is that you can apply more parallel processes (and processors) to the problem, whereas the in-band method can apply only one process per backup stream. The disadvantage is that the data is written and read more than once, and the extra reads and writes could cause contention for disk. In addition, an out-of-band system requires somewhat more disk than an in-band one because it must have enough disk to hold the latest set of backups before they're deduplicated. The out-of-band camp counters that slowing down the original backup is unacceptable, and that its products will be able to deduplicate the data in time for tomorrow's backup.
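The write, read, identify, and point sequence described above can be sketched as follows. Again this is a toy model under stated assumptions: `OutOfBandStore`, its `landing` area, and the fixed-size SHA-256 chunking are hypothetical names and choices made for illustration, not a description of any real product. The sketch shows the two costs the paragraph mentions: raw backups occupy extra disk until the secondary pass runs, and every byte is read again during that pass.

```python
import hashlib

CHUNK_SIZE = 4  # assumption: tiny fixed-size chunks so the example is easy to follow


class OutOfBandStore:
    """Toy out-of-band (post-process) deduplicating store, for illustration only."""

    def __init__(self):
        self.landing = {}  # backup name -> raw bytes waiting to be deduplicated
        self.chunks = {}   # fingerprint -> chunk bytes, stored once
        self.backups = {}  # backup name -> ordered list of fingerprints (pointers)

    def write_backup(self, name, data):
        # The backup lands at full speed: no fingerprinting in the write path.
        self.landing[name] = data

    def landing_bytes(self):
        # Extra disk held by backups that have not been deduplicated yet.
        return sum(len(data) for data in self.landing.values())

    def deduplicate(self):
        # Secondary pass, run later: read each raw backup back,
        # find redundant chunks, and keep only pointers to them.
        for name in list(self.landing):
            data = self.landing.pop(name)
            recipe = []
            for i in range(0, len(data), CHUNK_SIZE):
                chunk = data[i:i + CHUNK_SIZE]
                fingerprint = hashlib.sha256(chunk).hexdigest()
                if fingerprint not in self.chunks:
                    self.chunks[fingerprint] = chunk
                recipe.append(fingerprint)
            self.backups[name] = recipe


store = OutOfBandStore()
store.write_backup("monday", b"AAAABBBBCCCC")
store.write_backup("tuesday", b"AAAABBBBDDDD")
print("bytes held before dedup:", store.landing_bytes())
store.deduplicate()
print("bytes held after dedup:", store.landing_bytes())
print("unique chunks stored:", len(store.chunks))
```

Because `deduplicate` is decoupled from `write_backup`, a real system could spread that pass across several workers, which is the parallelism advantage claimed for this approach. The sketch keeps it single-threaded for clarity.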

You probably shouldn't dismiss a vendor simply because it uses in-band or out-of-band methods, but definitely test the different deduplication methods to determine how fast they work in your environment. Remember to test the product both against many slower backup streams and against a smaller number of streams where speed matters. Some systems perform well for single streams, but don't scale to many streams. Others work well only when you send them many streams, and don't perform well with a very fast single stream. Finally, test the deduplication product with enough data to see whether it will handle the amount of data you back up every day. If it doesn't get the deduplication job done every day in time for the next night's backup, you're going to be in trouble.
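The "in time for the next night's backup" check is simple arithmetic once you have measured a deduplication rate in your own environment. The sketch below shows one way to frame it. The function names are hypothetical, and the figures (20 TB per day, a 14-hour window, rates of 2 and 1 TB per hour) are made-up placeholders, not measurements of any product.

```python
def hours_to_deduplicate(daily_backup_tb, dedup_rate_tb_per_hour):
    """Hours the deduplication pass needs for one day's backups."""
    return daily_backup_tb / dedup_rate_tb_per_hour


def finishes_in_time(daily_backup_tb, dedup_rate_tb_per_hour, hours_until_next_backup):
    """True if deduplication completes before the next backup window opens."""
    needed = hours_to_deduplicate(daily_backup_tb, dedup_rate_tb_per_hour)
    return needed <= hours_until_next_backup


# Placeholder inputs: substitute the values you measure during testing.
DAILY_BACKUP_TB = 20.0
HOURS_UNTIL_NEXT_BACKUP = 14.0

for rate in (2.0, 1.0):  # hypothetical measured rates in TB per hour
    needed = hours_to_deduplicate(DAILY_BACKUP_TB, rate)
    ok = finishes_in_time(DAILY_BACKUP_TB, rate, HOURS_UNTIL_NEXT_BACKUP)
    print(f"rate {rate} TB/h: needs {needed:.1f} h, finishes in time: {ok}")
```

The rate you plug in should come from a test at your real daily volume and stream count, since throughput measured on a small sample may not hold at scale.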

This piece originally appeared in Storage magazine. Check out the complete article: The skinny on data deduplication.

This was first published in February 2007
