

How does deduplication in cloud computing work and is it beneficial?

The deduplication process reduces the amount of data in a storage system, but dedupe in the cloud may be more valuable to the cloud provider than the customer.

Deduplication in cloud and other storage platforms is a process by which repeated or duplicate data is removed from a data stream to reduce the amount of physical data stored in an appliance or system.

In primary storage, deduplication reduces the physical space consumed by keeping a single copy of each identical block of data and using metadata to map every logical copy to that one physical block. In the public cloud, the deduplication capabilities of the storage platform aren't exposed to the user.
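The block-and-metadata mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular array or cloud platform implements it: the `DedupStore` class, the fixed 4 KB block size and the use of SHA-256 fingerprints are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real systems vary


class DedupStore:
    """Hypothetical in-memory store that keeps one copy of each unique block."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # fingerprint -> block bytes (the physical data)
        self.files = {}   # name -> ordered fingerprints (the metadata)

    def write(self, name, data):
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            # Store the block only if this fingerprint has not been seen before.
            self.blocks.setdefault(fingerprint, block)
            recipe.append(fingerprint)
        self.files[name] = recipe

    def read(self, name):
        # Rebuild the logical data by following the metadata to the blocks.
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def logical_bytes(self):
        return sum(len(self.blocks[fp])
                   for recipe in self.files.values() for fp in recipe)

    def physical_bytes(self):
        return sum(len(block) for block in self.blocks.values())


store = DedupStore()
image = b"A" * 4096 + b"B" * 4096
store.write("vm1", image)
store.write("vm2", image)  # identical content, so no new blocks are stored
```

After the two writes, `store.logical_bytes()` is 16,384 while `store.physical_bytes()` is 8,192: two logical copies share one set of physical blocks, and `store.read("vm2")` still returns the full image.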

If the provider chooses to implement deduplication in its cloud storage, the benefit stays with the provider. This is because storage space is billed based on logical capacity used -- rather than the physical capacity -- and any savings from the reduction are used by the service provider to offer a cheaper service or to reduce its own costs.
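A small worked example makes the billing point concrete. The price per gigabyte and the provider-side deduplication ratio below are hypothetical figures chosen for illustration, not rates from any real service.

```python
def monthly_bill(logical_gb, price_per_gb):
    """The customer is charged on logical capacity, whatever is stored physically."""
    return logical_gb * price_per_gb


def physical_gb_stored(logical_gb, dedupe_ratio):
    """Physical capacity the provider actually consumes at a given ratio."""
    return logical_gb / dedupe_ratio


logical_gb = 10_000      # what the customer wrote
price_per_gb = 0.02      # hypothetical price per GB per month
provider_ratio = 4.0     # hypothetical 4:1 deduplication on the provider side

bill = monthly_bill(logical_gb, price_per_gb)
physical = physical_gb_stored(logical_gb, provider_ratio)
saved_by_provider = logical_gb - physical
```

Under these assumptions the provider stores 2,500 GB instead of 10,000 GB, but the bill is calculated from the 10,000 GB figure. The deduplication ratio never appears in `monthly_bill`, which is the point the paragraph above makes.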

But for anyone using cloud storage for backup, there's an issue. Copying multiple backup images to the cloud will consume large volumes of billable storage, much more than if a deduplicating platform, such as a deduplicating disk system, were used as the storage target.

There are a number of solutions to this problem. Many backup software platforms will dedupe at the source, so that only the deduplicated data is sent to and held on the cloud storage. The backup software owns and manages the metadata that does the logical-to-physical translation.
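Source-side deduplication can be sketched as follows: the backup client fingerprints each block and uploads only the blocks the target has not already seen. The `plan_upload` function, the fixed block size and the SHA-256 fingerprints are assumptions for illustration and do not describe any specific backup product.

```python
import hashlib


def plan_upload(data, known_fingerprints, block_size=4096):
    """Return (recipe, blocks_to_send) for one backup image.

    recipe         -- ordered fingerprints needed to rebuild the image (metadata)
    blocks_to_send -- only the blocks the target does not already hold
    """
    recipe = []
    blocks_to_send = {}
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        recipe.append(fingerprint)
        if fingerprint not in known_fingerprints and fingerprint not in blocks_to_send:
            blocks_to_send[fingerprint] = block
    return recipe, blocks_to_send


# First backup: nothing is known yet, so both blocks are uploaded.
first_image = b"A" * 4096 + b"B" * 4096
recipe1, sent1 = plan_upload(first_image, set())
known = set(sent1)

# Second backup: one block changed, so only that block is uploaded.
second_image = b"A" * 4096 + b"C" * 4096
recipe2, sent2 = plan_upload(second_image, known)
```

Here the second backup uploads one block rather than two. The recipes are the metadata the backup software owns; without them the stored blocks cannot be reassembled into backup images.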

An alternative is to look for a storage gateway that can offer a storage interface and do the deduplication. In this instance, the administrator isn't dependent on the backup software, and data can be more easily imported into other platforms.

The most obvious issue with source-side deduplication is that whichever backup software is used will own the metadata, so, ideally, a deduplicating storage gateway is the better option. This ensures that the data in the backup environment is portable outside of the backup software, without having to rehydrate the data to move it to another platform.

Beyond the cloud, deduplication works well on groups of virtual machines, where the base operating system is similar or identical across multiple VMs.

In the backup world, deduplication is used to reduce the volume of physical data stored when doing repeated backups of the same data set, such as a VM. When only a small percentage (say 5% to 10%) of the actual data changes between backups, deduplication keeps the physical space consumed close to the size of one full copy plus the changes. Backup systems can see deduplication rates of 20:1 and higher.
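The relationship between change rate and deduplication ratio can be estimated with simple arithmetic. The model below is a deliberate simplification and an assumption of this example: every backup is a full copy of a fixed-size data set, a constant fraction of it changes between backups, and duplicates within a single backup and compression are ignored.

```python
def dedupe_ratio(num_backups, change_rate):
    """Logical-to-physical ratio for repeated full backups of one data set.

    Logical data  = num_backups full copies.
    Physical data = one full copy plus the changed fraction for each later backup.
    """
    logical = num_backups
    physical = 1 + (num_backups - 1) * change_rate
    return logical / physical


ratio_30_backups_5pct = dedupe_ratio(30, 0.05)
ratio_30_backups_10pct = dedupe_ratio(30, 0.10)
ratio_365_backups_2pct = dedupe_ratio(365, 0.02)
```

In this model, 30 retained backups at a 5% change rate give roughly 12:1, and the ratio approaches but never exceeds 1 divided by the change rate as more backups are kept. Ratios of 20:1 and above therefore imply long retention combined with low change rates, or extra savings this model leaves out, such as duplicate data within and across backup sets.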

This was last published in April 2017
