Pros and cons of different deduping methods

In this Storage Decisions video, Marc Staimer outlines three major types of deduplication -- file, block or blocklet, and content-aware -- and he describes the pros and cons of each method.

"Storage-based, file-based dedupe reduces duplicate files and reduces primary storage consumption. It does it on a file basis. So if you have duplicate files, it will reduce the exact duplicate file. Typically, it's free. You can get it with NetApp; you can get it with EMC; you can get it with a variety of players. Realistically, they're giving it away for free because there are downsides to this technology," said Staimer, president of Dragon Slayer Consulting.

Staimer said the strengths of this form of deduping include its effectiveness with duplicate email attachments, duplicate ISO files and golden images. He said it offers roughly a two-to-one to three-to-one reduction of primary data.

"That's about the best you're going to see on primary data. Secondary data, you see much better reductions. So you just need to be aware of that from that perspective," Staimer said.

He said this form of deduping adds more read/write latency than the other methods. As a result, it is often performed post-process rather than inline, especially for primary data deduplication.
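A minimal sketch of file-level deduplication follows. It assumes whole-file SHA-256 fingerprints and uses an in-memory dict in place of a real storage system. The function names are hypothetical and are not any vendor's API. The sketch runs over files that have already been written, which models the post-process mode Staimer describes. Only exact duplicates collapse, as with the email-attachment case.

```python
import hashlib


def file_fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest of an entire file's contents."""
    return hashlib.sha256(data).hexdigest()


def dedupe_files(files: dict[str, bytes]) -> tuple[dict[str, str], dict[str, bytes]]:
    """Post-process, file-level dedupe over files that are already stored.

    Returns (index, store): index maps each file name to its digest, and
    store keeps exactly one copy of the bytes for each distinct digest.
    """
    index: dict[str, str] = {}
    store: dict[str, bytes] = {}
    for name, data in files.items():
        digest = file_fingerprint(data)
        index[name] = digest
        store.setdefault(digest, data)  # only the first copy of identical files is kept
    return index, store


def reduction_ratio(files: dict[str, bytes], store: dict[str, bytes]) -> float:
    """Logical bytes written divided by physical bytes actually stored."""
    logical = sum(len(d) for d in files.values())
    physical = sum(len(d) for d in store.values())
    return logical / physical


# Hypothetical example: the same attachment saved by three users.
attachment = b"quarterly report" * 1000   # 16,000 bytes
files = {
    "alice/report.pdf": attachment,
    "bob/report.pdf": attachment,
    "carol/report.pdf": attachment,
    "carol/notes.txt": b"different notes!" * 1000,  # also 16,000 bytes
}
index, store = dedupe_files(files)
print(len(store))                      # 2
print(reduction_ratio(files, store))   # 2.0
```

Changing even one byte of a file gives it a new digest, so this approach finds nothing to share between two nearly identical files.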

Another form of deduping reduces storage consumption by looking not at the file layer but at individual blocks and blocklets, the latter being smaller than 512 bytes. This fine-grained approach provides "excellent" deduplication and is effective with backup data. But storage that includes this dedupe method comes at a cost, Staimer noted.

"Be aware that storage that has this built in tends to carry a premium," he said. "You're going to pay more for it than other storage."
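The next sketch shows why block-level granularity helps with backup data. It assumes fixed-size 512-byte blocks and SHA-256 fingerprints, and the names are hypothetical. Real products may use smaller blocklets or variable-size chunking instead. In the example, two nightly backups of a 64 KiB file differ by a single byte. File-level dedupe would have to store both copies, but here only the one changed block is stored twice.

```python
import hashlib


def chunk_fixed(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedupe_blocks(files: dict[str, bytes], block_size: int = 512):
    """Block-level dedupe: each file becomes a 'recipe' of block digests,
    and each distinct block is stored only once."""
    recipes: dict[str, list[str]] = {}
    store: dict[str, bytes] = {}
    for name, data in files.items():
        recipe = []
        for block in chunk_fixed(data, block_size):
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)
            recipe.append(digest)
        recipes[name] = recipe
    return recipes, store


def rebuild(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Reassemble a file from its recipe."""
    return b"".join(store[d] for d in recipe)


# Hypothetical backup data: 64 KiB of non-repeating bytes (128 blocks of 512).
day1 = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(2048))
day2 = bytearray(day1)
day2[1000] ^= 0xFF          # a one-byte change inside block 1 (bytes 512-1023)
day2 = bytes(day2)

recipes, store = dedupe_blocks({"backup-day1": day1, "backup-day2": day2})
logical = len(day1) + len(day2)
physical = sum(len(b) for b in store.values())
print(len(store))            # 129 distinct blocks instead of 256
print(logical / physical)    # 256/129, about 1.984
```

The finer the blocks, the less unchanged data is caught in the same block as an edit. The trade-off is a larger index of fingerprints to keep and search.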

A third method -- content-aware storage deduping -- is available through Dell, he said.

"It takes the data … and looks for common storage pieces among the different files. And then it recompresses it in its new format, because you have to have a reader piece of software to read the data after they compress. So it does really well with different file types," said Staimer. He noted that the method requires special reader software to view the data and that the deduping process must be scheduled outside an organization's normal operating hours.
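The following is a simplified illustration of the content-aware idea and is not Dell's implementation. It assumes a hypothetical handler that recognizes gzip files by their magic bytes and dedupes the decompressed content instead of the compressed bytes. Unique chunks are then recompressed with zlib into the store's own format. The chunk size and all function names are assumptions.

```python
import gzip
import hashlib
import zlib

CHUNK = 4096  # assumed chunk size for this sketch


def normalize(blob: bytes) -> bytes:
    """Hypothetical format handler: look inside gzip files rather than at
    their compressed bytes. Other formats would need their own handlers."""
    if blob[:2] == b"\x1f\x8b":  # gzip magic number
        return gzip.decompress(blob)
    return blob


def ingest(files: dict[str, bytes]):
    """Normalize each file, chunk it, and store each unique chunk once,
    recompressed with zlib (the store's own format)."""
    recipes: dict[str, list[str]] = {}
    store: dict[str, bytes] = {}
    for name, blob in files.items():
        content = normalize(blob)
        recipe = []
        for i in range(0, len(content), CHUNK):
            chunk = content[i:i + CHUNK]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in store:
                store[digest] = zlib.compress(chunk)
            recipe.append(digest)
        recipes[name] = recipe
    return recipes, store


def read(name: str, recipes: dict[str, list[str]], store: dict[str, bytes]) -> bytes:
    """The 'reader': stored data is no longer in its original format, so it
    must be decompressed and reassembled before an application can use it."""
    return b"".join(zlib.decompress(store[d]) for d in recipes[name])


text = b"".join(f"log line {i}: status=OK\n".encode() for i in range(10000))
files = {"app.log": text, "app.log.gz": gzip.compress(text)}
recipes, store = ingest(files)
print(recipes["app.log"] == recipes["app.log.gz"])  # True: same content, stored once
print(read("app.log.gz", recipes, store) == text)   # True
```

A plain byte-level block comparison of these two files would find essentially nothing in common, because the gzip bytes bear no resemblance to the plain text. Note that this sketch's reader returns the decompressed content, not the original `.gz` bytes. A real product would also have to serve data back in the format applications expect, which is one reason reader software is needed.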
