This is the third part of a nine-part series on deduplication. For easy access to all nine parts, check out our quick overview of Deduplication 2013.
Choosing between a hardware- and a software-based deduplication system can be tough, and vendors don't make the choice any easier because they are obviously biased. Ask a hardware vendor whether you should use a hardware- or a software-based deduplication solution, and you will be told that hardware is best. Likewise, a software vendor will probably tell you that a software-based solution is the way to go. So how do you decide which type of solution to use?
The best approach is usually to look at your own backup infrastructure and decide what is really important to you. Hardware- and software-based solutions both have their strengths and weaknesses. Part 3 of our series on deduplication in 2013 compares hardware and software deduplication and discusses using them together.
Advantages of hardware deduplication
Perhaps the biggest advantage of hardware-based deduplication systems is that there is nothing extra to buy. If you purchase a virtual tape library or a NAS appliance, then the hardware will likely contain everything you need for deduplication.
Another major advantage of hardware-based deduplication is that you will generally get better deduplication ratios from a hardware appliance than from a software-based solution. The reason is simple. Software-based deduplication products place a workload on either the source server or your backup target server. The more aggressive the deduplication algorithm, the heavier the load on that server. That being the case, software vendors tend to strike a balance between deduplication efficiency and performance. This isn't as big an issue with a hardware-based appliance, because the appliance handles the deduplication process internally without placing an additional load on your production servers.
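The tradeoff between deduplication efficiency and server load can be illustrated with a small Python sketch. It is not how any particular product works: it assumes simple fixed-size chunking with SHA-256 fingerprints and uses chunk size as a stand-in for how aggressive the algorithm is. The function names make_sample and dedupe_stats, and the sample data, are made up for this illustration.

```python
import hashlib


def make_sample() -> bytes:
    """Build 64 KiB of non-repeating bytes, then append a copy with one byte changed."""
    base = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(2048))
    edited = bytearray(base)
    edited[1000] ^= 0xFF  # a one-byte change in the second "backup"
    return base + bytes(edited)


def dedupe_stats(data: bytes, chunk_size: int):
    """Return (deduplication ratio, number of fingerprints computed).

    Assumption: fixed-size chunks and SHA-256 fingerprints, for illustration only.
    """
    seen = set()
    unique_bytes = 0
    hashes = 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fingerprint = hashlib.sha256(chunk).digest()
        hashes += 1  # every chunk costs one fingerprint computation
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique_bytes += len(chunk)
    return len(data) / unique_bytes, hashes
```

On this sample, dedupe_stats(make_sample(), 16384) gives a ratio of 1.6 from 8 fingerprints, while dedupe_stats(make_sample(), 1024) gives a ratio of roughly 1.97 but computes 128 fingerprints. Smaller chunks find more duplicate data, but some server has to do the extra work.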
With the exception of lower-end NAS devices, hardware-based deduplication solutions are generally most suitable for large organizations. Not only do devices such as virtual tape libraries tend to be cost-prohibitive for smaller organizations, but such devices are also generally equipped with enterprise-class features that provide scalability and reliability.
Benefits of software deduplication
One of the main benefits of using a software-based deduplication solution is price. Hardware-based appliances often come with a high price tag, because you are purchasing physical hardware. While there is no denying that some software-based solutions can be expensive, they tend to be cheaper than hardware-based solutions, and there are lower-end software packages that fit the budgetary needs of even the smallest organizations.
In some ways, software-based solutions can be easier to implement than hardware-based solutions. When you install a hardware appliance, you must connect the appliance to your physical network. This may mean making architectural changes in order to accommodate the new appliance.
Although software-based deduplication solutions do not require you to make changes to your physical network, software products sometimes require a tedious initial configuration process. Furthermore, you may find that you have to install an agent onto your production servers before you can deduplicate them or back them up. There is also the issue of ongoing maintenance (such as patch management) for software products.
One advantage that software-based solutions sometimes have over hardware appliances is that they may make more efficient use of network bandwidth. Hardware appliances perform deduplication on the backup target (the appliance itself). Some software products perform source-side deduplication. If deduplication is performed at the source, then less data will need to be sent across the network as a part of the backup process, because the data will have been deduplicated prior to transmission.
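The bandwidth saving from source-side deduplication can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's protocol: it uses fixed 4 KB chunks and SHA-256 fingerprints, and the BackupTarget class and source_side_backup function are hypothetical names invented for the example. The small fingerprint lookups that would also cross the network are not counted.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


def split_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BackupTarget:
    """Hypothetical backup target that stores each unique chunk once."""

    def __init__(self):
        self.store = {}          # fingerprint -> chunk bytes
        self.bytes_received = 0  # chunk payload bytes that crossed the network

    def has(self, fingerprint: str) -> bool:
        return fingerprint in self.store

    def put(self, fingerprint: str, chunk: bytes) -> None:
        self.store[fingerprint] = chunk
        self.bytes_received += len(chunk)


def source_side_backup(data: bytes, target: BackupTarget):
    """Send only chunks the target lacks; return the ordered fingerprint list."""
    recipe = []
    for chunk in split_chunks(data):
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if not target.has(fingerprint):  # ask before sending the payload
            target.put(fingerprint, chunk)
        recipe.append(fingerprint)
    return recipe


def restore(recipe, target: BackupTarget) -> bytes:
    """Rebuild the original data from its fingerprint list."""
    return b"".join(target.store[fingerprint] for fingerprint in recipe)
```

For example, backing up b"A" * 8192 + b"B" * 4096 and then b"A" * 8192 + b"C" * 4096 to the same target represents 24,576 bytes of source data, but only 12,288 bytes of chunk payload are transmitted, because the repeated chunks are detected on the source before they are sent.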
Using hardware and software dedupe together
It is easy to spend a lot of time debating whether it is better to deduplicate data through the use of backup software or to perform deduplication through a hardware appliance. As the previous section explained, there are distinct advantages to each approach. In some situations, however, it might be better to take a hybrid approach to deduplication by using both hardware and software.
Taking a multi-faceted approach to deduplication is not appropriate for every organization. Even so, hardware appliances and software applications both have their strengths and weaknesses, and using both hardware and software may be the only way to create an optimal deduplication solution.
Suppose, for instance, that an organization has a main office and several small branch offices, all of which need to be backed up. Because hardware appliances such as virtual tape libraries are often designed for large-scale environments, it might be overkill (especially in terms of cost) to place such an appliance in each of the branch offices. Instead, organizations might be better off backing up the branch office servers across the WAN.
When it comes to backing up servers across the WAN, backup software that performs source deduplication is typically the best choice, because the data can be deduplicated before it has to traverse the WAN, thereby decreasing the bandwidth consumption of the backup process.
In the main office, it probably makes more sense to perform target deduplication on a backup appliance than to perform source deduplication on production servers, because doing so will allow the organization to avoid burdening the production servers with extra disk I/O and CPU cycles.
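The branch office and main office example can be reduced to a simple rule of thumb, sketched here in Python. The Site class and choose_dedupe_mode function are hypothetical and exist only to restate the example; a real decision would weigh cost, server load and other factors as well.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    backs_up_over_wan: bool  # True when backup traffic must cross a WAN link


def choose_dedupe_mode(site: Site) -> str:
    """Rule of thumb from the example: dedupe at the source when backups
    cross the WAN, otherwise dedupe at the target appliance."""
    return "source" if site.backs_up_over_wan else "target"
```

With this rule, a branch office that backs up across the WAN gets "source" and the main office with a local appliance gets "target".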
Of course, this is only one example of a situation in which it might be advantageous to perform deduplication on two different levels. Every organization's situation is different. As such, the best approach to deduplication planning is to carefully consider the organization's needs and then match those needs to the most appropriate deduplication method or combination of methods.
Part 4 of this series explores target deduplication.
Brien M. Posey, MCSE, has received Microsoft's MVP award for Exchange Server, Windows Server and Internet Information Server (IIS). Brien has served as CIO for a nationwide chain of hospitals and has been responsible for the department of information management at Fort Knox. You can visit Brien's personal website at www.brienposey.com.