Data deduplication is a storage capacity optimization technique that identifies repeated sets of data in a data stream and eliminates them, retaining a single copy on physical media. Metadata and pointers track each logical instance of the data and map it to that physical copy. Deduplication first took hold in the backup space, where repeated full backups of a server or virtual machine deduplicate heavily because they either contain mostly unchanged data or are based on a single master image.
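The mechanism described above can be illustrated with a minimal sketch. It assumes fixed-size chunks and SHA-256 fingerprints, which are choices made here for illustration rather than anything a particular product is known to use; the class name `DedupStore` and its methods are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy deduplicating store (hypothetical, for illustration only)."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}   # fingerprint -> the single physical copy of a chunk
        self.recipes = {}  # backup name -> ordered list of fingerprints (the pointers)

    def write(self, name, data):
        """Store a backup, keeping only chunks that have not been seen before."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # setdefault stores the chunk only if this fingerprint is new
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.recipes[name] = recipe

    def read(self, name):
        """Rebuild the logical backup by following its pointers."""
        return b"".join(self.chunks[d] for d in self.recipes[name])

    def physical_bytes(self):
        """Bytes actually held on 'physical media'."""
        return sum(len(c) for c in self.chunks.values())
```

With a 4-byte chunk size, writing `b"AAAABBBBCCCC"` and then `b"AAAABBBBDDDD"` as two full backups stores 16 physical bytes for 24 logical bytes, because the two unchanged chunks are kept once and referenced twice.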
Data deduplication techniques have figured heavily in products that resolve sprawl issues, such as copy data management (CDM) platforms. These products make it possible to use data for purposes other than data protection, such as creating test/development copies of production data. Customers save by reusing static data, which is now typically stored on spinning media in powered-up deduplication appliances. CDM vendors have built in technology that delivers backups efficiently and also serves secondary requirements, including managing thousands of application snapshots.
Another innovation in data deduplication techniques is "pre-duplication," in which the client deduplicates data before sending it across the network. Although this concept appeared almost 10 years ago at PureDisk, a company acquired by Symantec, vendors today are integrating it into their platforms. Hewlett Packard Enterprise did so with its 3PAR array, looking to improve data deduplication techniques using snapshots and to reduce the amount of data transiting the network. The deduplication process is also being distributed, which makes it possible to scale out backups without the bottleneck of managing the hash values in a single process.
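A rough sketch of both ideas, client-side deduplication and a distributed hash index, might look like the following. Everything here is an assumption made for illustration: the names `ShardedChunkIndex`, `chunk_digests` and `client_backup` are hypothetical, the shards are in-memory dictionaries standing in for separate index processes, and no vendor's actual protocol is being described.

```python
import hashlib


def chunk_digests(data, chunk_size=4096):
    """Split data into fixed-size chunks and return (fingerprint, chunk) pairs."""
    pairs = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        pairs.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return pairs


class ShardedChunkIndex:
    """Back end whose hash index is split across shards (hypothetical).

    Each dict stands in for an independent index process, so no single
    process has to manage every hash value.
    """

    def __init__(self, n_shards=4):
        self.shards = [dict() for _ in range(n_shards)]

    def _shard(self, digest):
        # Route by the leading hex digits of the fingerprint
        return self.shards[int(digest[:8], 16) % len(self.shards)]

    def missing(self, digests):
        """Return the fingerprints the back end does not hold yet."""
        return {d for d in digests if d not in self._shard(d)}

    def upload(self, chunks):
        """Store new chunks, given as a fingerprint -> bytes mapping."""
        for digest, chunk in chunks.items():
            self._shard(digest)[digest] = chunk


def client_backup(server, data, chunk_size=4096):
    """Deduplicate on the client; return the number of chunk bytes sent."""
    pairs = chunk_digests(data, chunk_size)
    # Only fingerprints cross the network in this first exchange
    needed = server.missing([d for d, _ in pairs])
    to_send = {d: c for d, c in pairs if d in needed}
    server.upload(to_send)
    return sum(len(c) for c in to_send.values())
```

In this sketch, a second backup that shares most of its chunks with the first sends only the chunks the back end has never seen, which is the network saving the client-side approach is after.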
Backup vendors are now also looking at the data itself and building in intelligent deduplication based on the application content rather than on simple block-level identification. File-level deduplication can identify files that can be single-instanced, including file attachments in backups of email systems. Again, these processes make the backup client more intelligent, reducing the workload on the network and the back-end deduplication engine.
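File-level single instancing of email attachments can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any backup product: the `SingleInstanceStore` class, the `backup_mailbox` function and the message layout (a dict with `subject` and `attachments` keys) are all hypothetical.

```python
import hashlib


class SingleInstanceStore:
    """Keeps one copy of each distinct file, keyed by a whole-file hash."""

    def __init__(self):
        self.blobs = {}  # fingerprint -> file content, stored once
        self.refs = {}   # fingerprint -> number of logical references

    def add(self, content):
        """Store a file if it is new and return its fingerprint."""
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = content
        self.refs[digest] = self.refs.get(digest, 0) + 1
        return digest


def backup_mailbox(store, messages):
    """Back up messages, replacing each attachment with a pointer.

    `messages` is assumed to be a list of dicts shaped like
    {"subject": str, "attachments": [bytes, ...]}.
    """
    catalog = []
    for msg in messages:
        pointers = [store.add(a) for a in msg["attachments"]]
        catalog.append({"subject": msg["subject"], "attachments": pointers})
    return catalog
```

If the same report is attached to messages in many mailboxes, the store holds it once and the catalog entries all point at that single instance.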