Much of the early debate about data deduplication focused on inline deduplication vs. post-processing deduplication. Inline deduplication reduces data as it is being written to the backup device, while post-process deduplication backs up the data first and then reduces it. Both methods have advantages and disadvantages. Post-processing ingests backups faster and shortens the backup window, but it requires more disk because the backup data is temporarily stored in full before it is reduced.
Inline deduplication products include EMC Corp.'s Data Domain and Avamar, IBM Corp.'s ProtecTier, Symantec Corp.'s PureDisk, CommVault's Simpana and NEC's HydraStor. Post-process products include ExaGrid EX, FalconStor FDS and Sepaton DeltaStor. Quantum Corp.'s DXi platform gives customers the choice of post-process or inline dedupe.
FalconStor Software and Sepaton Inc. call their methods concurrent processing: while they move data to a disk staging area first, they don't wait for backups to finish before deduping.
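The trade-off between the two approaches can be illustrated with a toy chunk-hashing sketch. This is not any vendor's implementation; the function names, the fixed 4-byte chunk size and the dict-based chunk store are all hypothetical simplifications (real products typically use kilobyte-scale, often variable-size chunks). The point is only where the reduction happens: inline dedupes each chunk before it hits the store, while post-process lands a full copy on staging disk first and reduces it later.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical tiny chunk size, for illustration only


def chunks(data: bytes):
    """Split data into fixed-size chunks."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def inline_dedupe(data: bytes, store: dict) -> list:
    """Inline: hash each chunk as it arrives; write only unseen chunks."""
    recipe = []  # ordered list of chunk hashes needed to rebuild the data
    for chunk in chunks(data):
        h = hashlib.sha256(chunk).hexdigest()
        if h not in store:      # new chunk: store it
            store[h] = chunk
        recipe.append(h)        # duplicate chunk: just record a reference
    return recipe


def post_process_dedupe(data: bytes, staging: list, store: dict) -> list:
    """Post-process: land the raw backup on staging disk first,
    then reduce it in a later pass."""
    staging.append(data)        # fast ingest: a full copy hits the disk
    recipe = []
    for chunk in chunks(staging.pop()):  # later pass dedupes the staged copy
        h = hashlib.sha256(chunk).hexdigest()
        store.setdefault(h, chunk)
        recipe.append(h)
    return recipe
```

Both paths end with the same recipe and the same chunk store; the difference is the extra staging capacity the post-process copy consumes during ingest, in exchange for a shorter backup window.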
Both inline and post-process methods have their advocates, but experts say neither is universally better; it all depends on the type of backup environment you have.
Deduplication is often combined with replication for disaster recovery. Deduplication reduces the amount of data and lowers the bandwidth requirement to copy data offsite. EMC Data Domain, Quantum, IBM ProtecTier, FalconStor and Sepaton are among the vendors who have beefed up their replication capabilities over the past year, often increasing the number of remote sites that can fan into the data center.
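Why deduplication lowers replication bandwidth can also be sketched in a few lines. Again this is a hypothetical illustration, not any vendor's replication protocol: it assumes backups are already stored as chunk recipes (as in a hash-indexed chunk store), so the sender only has to ship chunks the remote site does not yet hold.

```python
def replicate(recipe: list, local_store: dict, remote_store: dict):
    """Replicate a deduplicated backup offsite: ship only the chunks
    the remote site lacks, so duplicates never cross the WAN."""
    missing = [h for h in recipe if h not in remote_store]
    for h in missing:
        remote_store[h] = local_store[h]  # only unseen chunks are sent
    return len(missing), len(recipe)      # chunks sent vs. chunks referenced
```

In a fan-in arrangement, many remote sites replicating into one data center benefit twice: each site sends only its unseen chunks, and chunks already received from one site need not be sent again by another.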