Now that deduplication has become a mainstream technology, IT pros are forced to choose between competing deduplication solutions. One question that I have been asked several times lately is whether it is better to use Windows Server 2012's native file system deduplication or to leave the deduplication process to your backup software.
As with so many other things in IT, there is not a clear-cut answer that holds true 100% of the time. There are, however, a number of factors that can help you determine which deduplication solution is better for your organization.
What are your deduplication goals?
The first consideration is what you want deduplication to accomplish. Backup software products that perform source deduplication do so to reduce the volume of data that must be transmitted to the backup server. This type of deduplication typically does not reduce the server's use of primary storage; it reduces the volume of data that is transmitted to, and stored on, the backup target.
Windows Server 2012's native deduplication is also source-based, but its primary goal is to decrease space consumption on the server's primary storage. This type of deduplication does not do anything to reduce the volume of data that is being backed up (at least not by itself).
Of course, this raises the question of whether it is possible to use Windows Server 2012's native file system deduplication in conjunction with backup software to get both deduplicated primary storage and a deduplicated backup. Generally speaking, the answer is yes. You just have to make sure that the backup software is fully compatible with Windows Server 2012. In some cases, backup software might even be able to take advantage of the fact that Windows has already deduplicated the data, so that the backup software does not have to perform its own source-side deduplication. To do so, the backup agent would have to be Windows-deduplication-aware, which means that it would have to perform a block-level backup. It would also have to know to back up both the changed data and the changed containers in the Windows chunk store.
Another major consideration is scalability. Enterprise-class backup software is usually designed with the assumption that vast amounts of data will need to be deduplicated. Conversely, Windows Server 2012's native file system deduplication is better suited to small and medium-sized organizations.
A Windows Server 2012 deduplication job requires one CPU core and about 250 MB of memory. Given these resources, Windows can deduplicate a single volume at a time and can process roughly 100 GB of data per hour, or about 2 terabytes (TB) per day.
Needless to say, this capacity will likely prove to be too small for larger organizations. Windows servers that contain additional CPU and memory resources can deduplicate multiple volumes simultaneously, but running parallel deduplication jobs does nothing to increase the per-volume deduplication throughput.
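The capacity figures above lend themselves to a quick back-of-the-envelope estimate. The following Python sketch is only a planning aid built on the numbers quoted in this article; the function names and the greedy volume-to-job assignment are illustrative assumptions, not a description of how Windows actually schedules its jobs.

```python
# Rough planning figures quoted in the article; real throughput will vary.
GB_PER_HOUR_PER_JOB = 100   # approximate per-volume throughput
CORES_PER_JOB = 1           # one CPU core per deduplication job
MEMORY_MB_PER_JOB = 250     # approximate memory per deduplication job


def max_parallel_jobs(free_cores: int, free_memory_mb: int) -> int:
    """Number of volumes that could be processed at once with the spare resources."""
    return min(free_cores // CORES_PER_JOB, free_memory_mb // MEMORY_MB_PER_JOB)


def hours_to_process(volume_sizes_gb: list[float], free_cores: int, free_memory_mb: int) -> float:
    """Estimate wall-clock hours, assuming one job per volume at a fixed rate.

    Assumption: volumes are handed out largest-first to the least-loaded job
    slot. Extra slots never speed up a single volume, which mirrors the
    per-volume throughput limit described above.
    """
    if not volume_sizes_gb:
        return 0.0
    slots = max_parallel_jobs(free_cores, free_memory_mb)
    if slots < 1:
        raise ValueError("not enough spare CPU or memory for a single job")
    slots = min(slots, len(volume_sizes_gb))  # a volume cannot be split across jobs
    load_hours = [0.0] * slots
    for size_gb in sorted(volume_sizes_gb, reverse=True):
        least_loaded = load_hours.index(min(load_hours))
        load_hours[least_loaded] += size_gb / GB_PER_HOUR_PER_JOB
    return max(load_hours)
```

For example, a single 2,000 GB volume is estimated at 20 hours no matter how many cores are free, whereas volumes of 1,000, 500 and 500 GB on a server with two spare cores and 512 MB of spare memory come out at roughly 10 hours.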
What type of data is being deduplicated?
The type of data that is being deduplicated is a major factor in whether it will be better to use native deduplication or backup software deduplication. Regardless of which approach you use, some data will inevitably deduplicate better than other data. Unique data, of course, cannot be deduplicated.
Source-side deduplication that is baked into a backup agent does not typically place restrictions on the types of data that you can and cannot deduplicate. The reason for this is that backup software does not usually alter the original data. Instead, it focuses on removing redundancy before the data is sent to the backup server.
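To make the idea concrete, here is a minimal Python sketch of source-side deduplication. It is not modeled on any particular backup product: the `BackupAgent` class, the `server_index` set and the fixed 4 KB chunk size are all hypothetical, and real products may use variable-size chunking and a network protocol to query the backup server. The point is simply that the agent hashes each chunk, sends only the chunks the server does not already hold, and leaves the original data untouched.

```python
import hashlib


def split_into_chunks(data: bytes, chunk_size: int = 4096) -> list[bytes]:
    """Split data into fixed-size chunks (a simplification for illustration)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class BackupAgent:
    """Hypothetical agent; server_index stands in for the chunk hashes
    that the backup server already stores."""

    def __init__(self, server_index: set[str]):
        self.server_index = server_index

    def backup(self, data: bytes, chunk_size: int = 4096) -> tuple[list[str], dict[str, bytes]]:
        """Return (recipe, chunks_to_send) without modifying the source data.

        recipe lists every chunk hash in order, so the server can rebuild
        the file; chunks_to_send holds only chunks the server has not seen.
        """
        recipe: list[str] = []
        chunks_to_send: dict[str, bytes] = {}
        for chunk in split_into_chunks(data, chunk_size):
            digest = hashlib.sha256(chunk).hexdigest()
            recipe.append(digest)
            if digest not in self.server_index and digest not in chunks_to_send:
                chunks_to_send[digest] = chunk
        # Simulate the server recording the newly received chunks.
        self.server_index.update(chunks_to_send)
        return recipe, chunks_to_send
```

Backing up the same data a second time produces the same recipe but an empty set of chunks to send, which is where the bandwidth and backup storage savings come from.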
Windows Server 2012's deduplication does alter the original data. That being the case, there are some types of data that are poor candidates for deduplication. Specifically, Microsoft recommends that you do not deduplicate Hyper-V hosts, Exchange Servers, SQL Servers, WSUS servers or volumes containing files that are 1 TB in size or larger. The essence of this recommendation is that volumes containing large amounts of locked data (such as a database) or rapidly changing data tend not to be good candidates for deduplication.
In some cases, using Windows Server 2012's native deduplication feature might not even be an option. If you have servers that are running older versions of Windows, you will have to rely on your backup software to perform deduplication. Even if your servers are running Windows Server 2012, Windows is not capable of deduplicating boot volumes or system volumes. Many servers contain only a single volume, which means that you probably won't be able to use Windows' native deduplication capabilities on some servers, even if they are running Windows Server 2012.
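The restrictions described in the last two paragraphs can be collected into a simple checklist. The Python sketch below encodes only the rules stated in this article; the `Volume` record, its field names and the workload labels are hypothetical, and treating 1 TB as 1024^4 bytes is an assumption. It does not query Windows, so the inventory data has to come from somewhere else.

```python
from dataclasses import dataclass, field

TB = 1024 ** 4  # assumption: binary terabyte
# Workloads the article lists as poor candidates for native deduplication.
EXCLUDED_WORKLOADS = {"hyper-v", "exchange", "sql", "wsus"}


@dataclass
class Volume:
    """Hypothetical inventory record for one server volume."""
    name: str
    os_supports_dedup: bool      # True for Windows Server 2012
    is_boot_or_system: bool
    workloads: set[str] = field(default_factory=set)
    largest_file_bytes: int = 0


def native_dedup_blockers(volume: Volume) -> list[str]:
    """Return the reasons a volume is a poor (or impossible) candidate.

    An empty list means none of the article's restrictions apply.
    """
    reasons = []
    if not volume.os_supports_dedup:
        reasons.append("operating system predates Windows Server 2012")
    if volume.is_boot_or_system:
        reasons.append("boot and system volumes cannot be deduplicated")
    flagged = {w.lower() for w in volume.workloads} & EXCLUDED_WORKLOADS
    if flagged:
        reasons.append("workloads not recommended: " + ", ".join(sorted(flagged)))
    if volume.largest_file_bytes >= TB:
        reasons.append("contains files of 1 TB or larger")
    return reasons
```

A data volume on a Windows Server 2012 file server returns an empty list, while a single-volume server whose only volume is the system volume is flagged, and that server would have to rely on the backup software for deduplication.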
So, is it better to use Windows Server 2012's native file system deduplication, or should you use your backup software's deduplication capabilities instead? If your main goal is to optimize the backup process, then you will usually be better off using the backup software's deduplication feature. The backup agent's deduplication engine is dedicated to optimizing the backup process, whereas the native Windows deduplication engine is more concerned with storage optimization. However, in some cases, they can be used together.
About the author
Brien M. Posey, MCSE, has received Microsoft's MVP award for Exchange Server, Windows Server and Internet Information Server. Brien has served as CIO for a nationwide chain of hospitals and has been responsible for the department of information management at Fort Knox. You can visit Brien's personal website at www.brienposey.com.