Inline deduplication vs. post-processing: Data dedupe best practices
By Dave Raffo, Senior News Director
Much of the early debate about data deduplication focused on inline deduplication
vs. post-processing deduplication. Inline deduplication reduces data as it is sent to the backup device, while post-processing backs up the data first and then reduces it. Both methods have advantages and disadvantages. Post-processing completes backups faster and shortens the backup window, but requires more disk capacity because backup data is temporarily staged before it is reduced.
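The inline approach can be sketched as hash-based chunking: incoming data is split into chunks, and a chunk is written to disk only if its fingerprint has not been seen before. This is a minimal illustration, not any vendor's actual implementation; the fixed 4 KB chunk size and the `InlineDedupStore` class are assumptions made for the example.

```python
import hashlib

class InlineDedupStore:
    """Toy inline deduplication: data is reduced as it arrives,
    before anything reaches the backing store."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}   # digest -> chunk bytes (stands in for disk)
        self.recipes = {}  # backup name -> ordered list of digests

    def backup(self, name, data):
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Inline: a duplicate chunk is dropped before it is stored.
            if digest not in self.chunks:
                self.chunks[digest] = chunk
            recipe.append(digest)
        self.recipes[name] = recipe

    def restore(self, name):
        # Reassemble the original stream from the chunk recipe.
        return b"".join(self.chunks[h] for h in self.recipes[name])
```

A second backup of unchanged data adds no new chunks, only a new recipe, which is why inline dedupe trades backup speed for disk savings. A post-process design would instead write the full stream to a staging area and run the same chunk-and-hash pass afterward.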
Inline deduplication products include EMC Corp.'s Data Domain and Avamar, IBM Corp.'s ProtecTIER, Symantec Corp.'s PureDisk, CommVault Simpana and NEC HydraStor. Post-process products include ExaGrid EX, FalconStor FDS and Sepaton DeltaStor. Quantum Corp.'s DXi platform gives customers the choice of post-process or inline dedupe.
FalconStor Software and Sepaton Inc. call their methods concurrent processing: they move data to a disk staging area first, but don't wait for backups to finish before deduplicating.
Both inline and post-process methods have their advocates, but experts say neither is universally better; it depends on the type of backup environment you have.
Deduplication is often combined with replication for disaster recovery. Deduplication reduces the amount of data and lowers the bandwidth required to copy data offsite. EMC Data Domain, Quantum, IBM ProtecTier, FalconStor and Sepaton are among the vendors that have beefed up their replication capabilities over the past year, often increasing the number of remote sites that can fan in to the data center.
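The bandwidth savings come from shipping only chunks the remote site does not already hold. The sketch below is an assumption-laden illustration of that idea, not any vendor's replication protocol; the `replicate` function and 4 KB chunking are invented for the example.

```python
import hashlib

CHUNK = 4096  # illustrative fixed chunk size

def chunk_hashes(data):
    """Map each fixed-size chunk to its SHA-256 digest."""
    return {hashlib.sha256(data[i:i + CHUNK]).hexdigest(): data[i:i + CHUNK]
            for i in range(0, len(data), CHUNK)}

def replicate(local_data, remote_hashes):
    """Return only the chunks the remote site lacks, and the
    number of bytes dedupe kept off the wire."""
    local = chunk_hashes(local_data)
    to_send = {h: c for h, c in local.items() if h not in remote_hashes}
    bytes_saved = len(local_data) - sum(len(c) for c in to_send.values())
    return to_send, bytes_saved
```

Because only missing chunks cross the WAN, many remote sites can fan in to one data center over modest links, which is the capability the vendors above have been expanding.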
This was first published in March 2010