"Storage-based, file-based dedupe reduces duplicate files and primary storage consumption. It works on a file basis, so if you have duplicate files, it will eliminate the exact duplicates. Typically, it's free. You can get it with NetApp; you can get it with EMC; you can get it with a variety of players. Realistically, they're giving it away for free because there are downsides to this technology," said Staimer, president of Dragon Slayer Consulting.
Staimer said the strengths of this form of dedupe include its effectiveness with duplicate email attachments, duplicate ISO files and golden images. He said it typically reduces primary data by roughly two-to-one to three-to-one.
"That's about the best you're going to see on primary data. Secondary data, you see much better reductions. So you just need to be aware of that from that perspective," Staimer said.
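To make the "exact duplicate" point concrete, here is a minimal sketch of file-level dedupe. It assumes, for brevity, that files are modeled as an in-memory mapping of name to contents, and it uses SHA-256 as the content fingerprint. Neither choice comes from the vendors mentioned above, and `dedupe_files` and `reduction_ratio` are hypothetical helpers. Each file's whole contents are hashed, and only one copy per unique hash is kept.

```python
import hashlib


def dedupe_files(files: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Hypothetical file-level dedupe: keep one stored copy per unique file content."""
    store: dict[str, bytes] = {}  # content digest -> the single stored copy
    index: dict[str, str] = {}    # file name -> digest of that file's contents
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()  # fingerprint of the WHOLE file
        store.setdefault(digest, data)             # store only the first copy seen
        index[name] = digest
    return store, index


def reduction_ratio(files: dict[str, bytes], store: dict[str, bytes]) -> float:
    """Logical bytes (what users wrote) divided by physical bytes (what is stored)."""
    logical = sum(len(d) for d in files.values())
    physical = sum(len(d) for d in store.values())
    return logical / physical if physical else 1.0


if __name__ == "__main__":
    files = {
        "a.iso": b"golden image",
        "b.iso": b"golden image",  # exact duplicate: stored once
        "memo_v1.txt": b"memo v1",
        "memo_v2.txt": b"memo v2",  # one byte differs: stored again in full
    }
    store, index = dedupe_files(files)
    print(len(store), round(reduction_ratio(files, store), 2))
```

Note that the two memo files differ by a single byte, so file-level dedupe stores both in full. That is one reason ratios on primary data stay modest.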
Another form of dedupe reduces storage consumption by looking not at the file layer but at individual blocks and blocklets, the latter being smaller than 512 bytes. This fine-grained approach provides "excellent" deduplication and is effective with backup data. But storage that includes this dedupe method comes at a cost, Staimer noted.
"Be aware that storage that has this built in tends to carry a premium," he said. "You're going to pay more for it than other storage."
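The block-level approach can be sketched the same way, except that each file is split into blocks and the blocks, rather than whole files, are fingerprinted. This is a simplified illustration under stated assumptions: it uses fixed-size blocks with a hypothetical `block_size` parameter (512 bytes by default), whereas the article does not say how any product actually divides data, and many systems use variable-size chunking instead. `dedupe_blocks` and `restore` are illustrative names, not a vendor API.

```python
import hashlib


def dedupe_blocks(
    files: dict[str, bytes], block_size: int = 512
) -> tuple[dict[str, bytes], dict[str, list[str]]]:
    """Hypothetical block-level dedupe using fixed-size blocks."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    store: dict[str, bytes] = {}         # block digest -> single stored block
    recipes: dict[str, list[str]] = {}   # file name -> ordered block digests
    for name, data in files.items():
        recipe = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # repeated blocks are stored once
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes


def restore(name: str, store: dict[str, bytes], recipes: dict[str, list[str]]) -> bytes:
    """Rebuild a file by concatenating its blocks in recipe order."""
    return b"".join(store[d] for d in recipes[name])


if __name__ == "__main__":
    # Two hypothetical nightly backups that differ in only one 512-byte region.
    monday = b"A" * 512 + b"B" * 512 + b"C" * 512
    tuesday = b"A" * 512 + b"X" * 512 + b"C" * 512
    files = {"mon.bak": monday, "tue.bak": tuesday}
    store, recipes = dedupe_blocks(files)
    print(len(store), sum(len(b) for b in store.values()))
    assert restore("tue.bak", store, recipes) == tuesday
```

Here file-level dedupe would keep both 1,536-byte backups, but block-level dedupe stores only the four distinct blocks (2,048 bytes). Shrinking `block_size` also catches repeats *within* a file, which is why finer granularity tends to pay off on repetitive backup data.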
A third method -- content-aware storage deduping -- is available through Dell, he said.
"It takes the data … and looks for common storage pieces among the different files. And then it recompresses it in its new format, because you have to have a reader piece of software to read the data after they compress. So it does really well with different file types," said Staimer. He noted that users need special reader software to view the data, and that the dedupe process must be scheduled after an organization's normal operating hours.