Do you think data should be deduplicated while the backup is occurring or after it is complete?
There are two approaches to this. The post-processing architecture accepts all the incoming data and stores it on disk first, then deduplicates it afterward. The more common in-line architecture deduplicates the data as it arrives, before it is written to disk.
Personally, I'm more in the post-processing camp right now, though I'm still trying to fully understand the benefits of doing it in-line. My concern at the enterprise level is whether these products can do consistent restores, backups and offloads to tape, and whether they can do these things while maintaining performance.
From a tactical perspective, I like the post-processing approach. As long as you keep buying disk, you can keep doing backups. It may not be as elegant or nicely designed as the in-line approach, but you can always do the backup and the data can be deduplicated later.
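To make the trade-off concrete, here is a minimal Python sketch of the two architectures. It is an illustration under stated assumptions, not how any particular product works: the fixed 4-byte chunk size, the SHA-256 fingerprints, the in-memory dictionary standing in for disk, and the function names (inline_backup, post_process_backup, restore) are all hypothetical. The "peak" figure for post-processing counts only the landing area that holds the full backup before the deduplication pass runs.

```python
import hashlib

CHUNK_SIZE = 4  # assumption: tiny fixed-size chunks, for illustration only


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split a backup stream into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def inline_backup(data: bytes):
    """In-line: deduplicate as data arrives, so only new chunks are stored."""
    store = {}   # fingerprint -> chunk (stands in for the deduplicated disk)
    recipe = []  # ordered fingerprints needed to rebuild the stream
    for chunk in chunks(data):
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in store:
            store[fp] = chunk
        recipe.append(fp)
    peak_bytes = sum(len(c) for c in store.values())
    return store, recipe, peak_bytes


def post_process_backup(data: bytes):
    """Post-processing: land the whole backup first, deduplicate later."""
    landing_area = bytes(data)       # everything is written as-is
    peak_bytes = len(landing_area)   # disk needed before dedup runs
    store, recipe, _ = inline_backup(landing_area)  # the later dedup pass
    return store, recipe, peak_bytes


def restore(store, recipe) -> bytes:
    """Rebuild the original stream from the chunk store and the recipe."""
    return b"".join(store[fp] for fp in recipe)


if __name__ == "__main__":
    data = b"AAAABBBBAAAAAAAACCCC"
    for name, backup in (("in-line", inline_backup),
                         ("post-process", post_process_backup)):
        store, recipe, peak = backup(data)
        assert restore(store, recipe) == data
        print(f"{name}: peak disk {peak} bytes, {len(store)} unique chunks")
```

With this 20-byte sample, both paths end up with the same three unique chunks, but the in-line path never holds more than 12 bytes while the post-processing path has to land all 20 bytes first. That extra landing space is the disk you keep buying; in exchange, the backup itself never waits on the deduplication step.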
However, my opinion on this is still in flux.
This was first published in December 2007