Deduplication software eliminates redundant data and replaces each subsequent copy of that data with a pointer to the original. A data reduction software application is typically installed on a dedicated server and requires no changes to the physical network. Data dedupe technology can also be built into hardware appliances or storage arrays.
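To make the pointer idea concrete, the following is a minimal sketch rather than any vendor's implementation. It assumes the data has already been split into chunks and that a SHA-256 digest serves as the pointer. The function names `dedupe` and `restore` are hypothetical and chosen only for illustration.

```python
import hashlib


def dedupe(chunks):
    """Store each unique chunk once and record a pointer for every chunk.

    Assumption: a SHA-256 digest of the chunk acts as the "pointer".
    """
    store = {}     # digest -> chunk bytes, kept only once
    pointers = []  # one digest per input chunk, in original order
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk  # first occurrence: keep the data
        pointers.append(digest)    # every occurrence: keep only the pointer
    return store, pointers


def restore(store, pointers):
    """Rebuild the original byte stream by following the pointers."""
    return b"".join(store[digest] for digest in pointers)
```

In this sketch, a stream of three chunks in which the first and third are identical is stored as two chunks plus three pointers, and `restore` returns the original bytes.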
Deduplication, or data dedupe, started as a backup tool to reduce the amount of data physically stored during enterprise backups. The practice spread to primary storage with the advent of solid-state drives (SSDs). Dedupe is considered an important part of SSD flash arrays because the technology helps to lower the price per gigabyte of storage.
Deduplication software is often independent of the hardware it runs on, although the software must be certified with underlying hardware platforms.
Backup administrators using deduplication software may need to install a software agent on each production server before it can be backed up, so that the client can communicate with the server running the deduplication software. The backup process may also need adjusting, which can mean recreating backup schedules, alerts and configurations. Software products may also require ongoing maintenance and updates as the environment changes or new versions of the software become available.
Data deduplication software approaches
Deduplication can be performed in two ways:
- Inline deduplication performs the data reduction process before data is written to disk.
- Post-process deduplication writes the data and then reduces it.
Inline deduplication adds processing time to each write, while post-process dedupe requires more disk capacity to stage data before it is reduced. Deduplication on primary storage usually occurs inline; however, there may be a performance drag if the array does not have enough CPU power.
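The trade-off between the two approaches can be illustrated with a small sketch. It is a toy model under stated assumptions, not how any particular array works: the `Volume` class and its method names are hypothetical, blocks are held in memory, and SHA-256 digests identify duplicate blocks.

```python
import hashlib


class Volume:
    """Toy volume that supports both write paths (hypothetical, in-memory)."""

    def __init__(self):
        self.blocks = {}   # digest -> data, the deduplicated store
        self.staging = []  # raw blocks waiting for post-process reduction
        self.index = []    # logical block order, recorded as digests

    def write_inline(self, data):
        # Hashing and lookup happen in the write path, before data is stored.
        digest = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(digest, data)
        self.index.append(digest)

    def write_staged(self, data):
        # Post-process path: the write lands as-is, with no hashing yet.
        self.staging.append(data)

    def post_process(self):
        # Later pass: reduce the staged data, then free the staging space.
        for data in self.staging:
            self.write_inline(data)
        self.staging.clear()

    def stored_bytes(self):
        # Capacity consumed by reduced blocks plus anything still staged.
        reduced = sum(len(b) for b in self.blocks.values())
        staged = sum(len(b) for b in self.staging)
        return reduced + staged
```

Writing the same 4-byte block three times through `write_staged` consumes 12 bytes until `post_process` runs, after which 4 bytes remain. The same three writes through `write_inline` never consume more than 4 bytes, but each write pays the hashing cost up front.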
Backup data can also be deduped at the target or at the source (host). Backup deduplication hardware appliances usually dedupe at the target, while a software option can perform deduplication at the source. Deduping at the source results in less data being sent across the network, but requires more processing power on the host.
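A rough sketch of source-side deduplication shows why less data crosses the network. The exchange below is an assumption made for illustration and not a real backup protocol: the host hashes its chunks, asks the target which digests it lacks, and sends only those chunks. `Target` and `backup_from_source` are hypothetical names, and the network is simulated with direct method calls.

```python
import hashlib


class Target:
    """Hypothetical backup target holding a deduplicated chunk store."""

    def __init__(self):
        self.store = {}  # digest -> chunk bytes

    def missing(self, digests):
        # Report which digests the target has not stored yet.
        return {d for d in digests if d not in self.store}

    def receive(self, chunks_by_digest):
        self.store.update(chunks_by_digest)


def backup_from_source(chunks, target):
    """Hash on the host, then send only chunks the target lacks.

    Returns the number of payload bytes "sent" to the target.
    """
    # Hashing here is the extra processing the source host takes on.
    by_digest = {hashlib.sha256(c).hexdigest(): c for c in chunks}
    needed = target.missing(by_digest.keys())
    payload = {d: by_digest[d] for d in needed}
    target.receive(payload)
    return sum(len(c) for c in payload.values())
```

In this model, a first backup of three 4-byte chunks with one duplicate sends 8 bytes instead of 12. A later backup that repeats one of those chunks and adds one new chunk sends only the 4 new bytes.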