Deduplication allows users to reduce the amount of backup data that needs to be stored. However, when backing up very large data sets, it does not necessarily solve backup window issues. In this interview, Marc Staimer, founder of Beaverton, Ore.-based Dragon Slayer Consulting, discusses backup deduplication today.
Many of our readers report that they continue to struggle with large backup data sets. How much adoption have you seen of dedupe?
Marc Staimer: Dedupe is heavily adopted, especially at the enterprise level. It is built into most backup software products today. It is rare to find a product without it.
So, are users creating so much data that dedupe isn't fully able to help make for a more efficient backup process?
Staimer: Dedupe is not magic. It solves a backup software problem. The problem is that the backup software creates a lot of copies of backed-up data. Dedupe solves that problem. It does not solve backup window problems. If the application-data change rate is high between backup windows, then organizations may still struggle to back up their data within that time frame. This is most evident in databases running OLTP and OLAP workloads.
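The distinction between storage savings and backup window can be illustrated with a minimal Python sketch. It is not any vendor's implementation: it assumes fixed-size chunks fingerprinted with SHA-256, and the `DedupeStore` class, the `hours_to_move` helper, the 4 KiB chunk size and the sample figures are all hypothetical. It shows that a repeated full backup adds nothing to storage, while the time needed to move changed data still depends only on how much changed and how fast it can be moved.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


class DedupeStore:
    """Toy content-addressed store that keeps one copy of each unique chunk."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def backup(self, data: bytes) -> int:
        """Store data and return the number of new bytes actually written."""
        new_bytes = 0
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.chunks:
                self.chunks[fp] = chunk
                new_bytes += len(chunk)
        return new_bytes


def hours_to_move(changed_gb: float, throughput_gb_per_hour: float) -> float:
    """Rough time to transfer changed data; dedupe does not alter this ratio."""
    return changed_gb / throughput_gb_per_hour


store = DedupeStore()

# A "full backup" made of 100 distinct 4 KiB chunks.
full_backup = b"".join(i.to_bytes(4, "big") * (CHUNK_SIZE // 4) for i in range(100))

first = store.backup(full_backup)   # every chunk is new
second = store.backup(full_backup)  # identical copy: nothing new to store

# Simulate a change rate of 10%: alter the first byte of 10 chunks.
modified = bytearray(full_backup)
for i in range(10):
    modified[i * CHUNK_SIZE] = 0xFF
third = store.backup(bytes(modified))  # only the 10 changed chunks are stored

print(first, second, third)
print(hours_to_move(changed_gb=2000.0, throughput_gb_per_hour=500.0))
```

In this toy model, dedupe removes the redundant copies from storage, but the estimate returned by `hours_to_move` grows with the change rate regardless of how well the data deduplicates.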
Of people using dedupe today, what is the split between hardware and software dedupe?
Staimer: It's all software dedupe. Dedupe for backup is either in the source, the media server or the target storage. Regardless of where it takes place, it is software. You are attempting to break down the split between backup software dedupe and target dedupe storage appliances. This split is obfuscated by the fact that target dedupe storage appliances such as EMC Data Domain now utilize source dedupe software (DDBoost) to augment their ability to dedupe faster. Symantec NBU 5000s use OST, and HP StoreOnce uses Catalyst. Most backup software is a mix of source and media server dedupe.
Also, many of today's purpose-built backup appliances bundle the backup software, media server hardware, dedupe and target storage [in an] all-in-one converged platform. Is that backup software or target dedupe storage? The answer is yes.
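The difference between deduplicating at the source and at the target can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not represent DDBoost, OST or Catalyst, and the function names, the SHA-256 fingerprints and the 4 KiB chunks are hypothetical. Both paths run the same software logic and end with the same stored chunks; they differ in how many bytes cross the network.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Return a SHA-256 hex fingerprint (64 characters) for a chunk."""
    return hashlib.sha256(chunk).hexdigest()


def target_side_backup(chunks, stored):
    """Send every chunk; the target keeps only the ones it does not hold."""
    bytes_sent = 0
    for chunk in chunks:
        bytes_sent += len(chunk)  # the whole chunk crosses the wire
        stored.setdefault(fingerprint(chunk), chunk)
    return bytes_sent


def source_side_backup(chunks, stored):
    """Send each fingerprint, then only the chunks the target lacks."""
    bytes_sent = 0
    for chunk in chunks:
        fp = fingerprint(chunk)
        bytes_sent += len(fp)  # the fingerprint always crosses the wire
        if fp not in stored:
            bytes_sent += len(chunk)  # the chunk is sent only when missing
            stored[fp] = chunk
    return bytes_sent


# 50 distinct 4 KiB chunks; the target already holds the first 40.
chunks = [bytes([i]) * 4096 for i in range(50)]
target_store = {fingerprint(c): c for c in chunks[:40]}
source_store = dict(target_store)

sent_target = target_side_backup(chunks, target_store)
sent_source = source_side_backup(chunks, source_store)

print(sent_target, sent_source, target_store == source_store)
```

Here the source-side path sends far fewer bytes because most chunks are already known to the target. That advantage shrinks as the change rate rises, since every new chunk has to be sent in full either way.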
What are the most significant developments in the dedupe space in the past year or so?
Staimer: As I mentioned before, the problem dedupe solves was caused by backup software. Dedupe solved a storage consumption issue. It really does not solve a backup window issue. Some proponents of source-level dedupe claim that those products back up less than other forms of dedupe and therefore better meet backup windows. Testing makes that highly debatable. The issue of backing up large data sets is not addressed by dedupe. Compression can help to a degree. In the Oracle database world, Hybrid Columnar Compression is very helpful in backing up very large Oracle data sets, but it only works with Oracle Storage.
But the question of how to back up large amounts of data is a nontrivial problem. There are methodologies. Most of them today are storage system-related. In the end, there is no magic bullet. Deduplication is most definitely not it.