There's been growing focus (and vendor marketing) on the data deduplication process for backup -- specifically, when, where, how and to what degree deduplication affects the process of writing data. That focus, however, hasn't been matched by a better understanding of how deduplication affects the recovery process -- specifically, how quickly you can recall data for restoration.
During the recovery process, the requested data may not reside in contiguous blocks on disk, even in a non-deduplicated backup. As backup data expires and storage space is freed, fragmentation can occur, which may increase recovery time. The same concept applies to deduplicated data: unique data, and the pointers to that unique data, may be stored non-sequentially, which slows down recovery.
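To make the fragmentation effect concrete, the following Python sketch models a toy deduplicating store. It is a simplified illustration under stated assumptions, not how any particular product lays out data. A chunk is written to disk only the first time its content is seen. Each backup is recorded as a list of pointers (fingerprints). A restore counts one seek every time the next chunk is not adjacent to the previous one. The class `DedupStore` and its methods `write_backup` and `count_seeks` are hypothetical names.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Hypothetical toy store: appends each unique chunk once, in write order."""

    def __init__(self) -> None:
        self.offset_of: Dict[str, int] = {}  # fingerprint -> slot on "disk"
        self.next_offset = 0                  # next free slot (append-only)

    def write_backup(self, chunks: List[bytes]) -> List[str]:
        """Store only chunks not seen before; return the backup's recipe."""
        recipe: List[str] = []
        for chunk in chunks:
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.offset_of:      # unique data: append to disk
                self.offset_of[fp] = self.next_offset
                self.next_offset += 1
            recipe.append(fp)                 # always record the pointer
        return recipe

    def count_seeks(self, recipe: List[str]) -> int:
        """Count reads that cannot continue from the previous slot."""
        seeks = 0
        prev = None
        for fp in recipe:
            off = self.offset_of[fp]
            if prev is None or off != prev + 1:
                seeks += 1
            prev = off
        return seeks


# Three nightly backups of an 8-chunk volume; two chunks change each night.
day1 = [f"block-{i}-v1".encode() for i in range(8)]
day2 = list(day1)
day2[2], day2[5] = b"block-2-v2", b"block-5-v2"
day3 = list(day2)
day3[3], day3[6] = b"block-3-v3", b"block-6-v3"

store = DedupStore()
r1 = store.write_backup(day1)
r2 = store.write_backup(day2)
r3 = store.write_backup(day3)
print(store.next_offset)       # 12 chunk slots used (24 without dedup)
print(store.count_seeks(r1))   # 1
print(store.count_seeks(r2))   # 5
print(store.count_seeks(r3))   # 7
```

In this toy run, the three backups occupy 12 chunk slots instead of 24. The trade-off shows up at restore time. The first backup, written contiguously, needs a single sequential read. Restoring the third backup needs 7 non-sequential reads, because its changed chunks were appended wherever free space happened to be at the time.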
Some backup and storage system vendors that offer deduplication anticipated these recovery performance issues and optimized their products to mask the disk fragmentation problem. Some, such as ExaGrid Systems Inc. and Sepaton Inc., may keep a copy of the most recent backup in its whole, undeduplicated form. This enables faster restores of the most recently protected data than solutions that must reconstitute data from days, weeks or months of pointers. Other solutions are architected to spread the work across multiple deduplication engines to speed processing. The deduplication workload is distributed during backup, and the reassembly work is distributed during recovery. This is the case with both software- and hardware-based approaches. Vendors that spread deduplication activities across multiple nodes and, importantly, allow additional nodes to be added may provide better performance scalability than those with a single ingest/processing point.
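The multi-engine approach to recovery can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not any vendor's design. `chunk_store` is a hypothetical in-memory map from fingerprint to chunk bytes, and a recipe is the ordered list of fingerprints for one backup. Each worker thread stands in for a separate deduplication node. The hypothetical `parallel_restore` function splits the recipe into contiguous slices, and `reassemble_range` rebuilds one slice per worker. The slices are then joined in their original order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def reassemble_range(chunk_store: Dict[str, bytes], fps: List[str]) -> bytes:
    """One hypothetical node rebuilds its contiguous slice of the recipe."""
    return b"".join(chunk_store[fp] for fp in fps)


def parallel_restore(chunk_store: Dict[str, bytes], recipe: List[str],
                     nodes: int) -> bytes:
    """Split the recipe into at most `nodes` contiguous slices, rebuild each
    slice on its own worker, then join the slices in original order."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    if not recipe:
        return b""
    size = -(-len(recipe) // nodes)  # ceiling division: pointers per node
    slices = [recipe[i:i + size] for i in range(0, len(recipe), size)]
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        # map() yields results in submission order, preserving byte order.
        return b"".join(pool.map(lambda s: reassemble_range(chunk_store, s),
                                 slices))


# Unique chunks are stored once; the recipe points at them (with repeats).
unique_chunks = {"a": b"AA", "b": b"BB", "c": b"CC"}
recipe = ["a", "b", "a", "c", "b"]
print(parallel_restore(unique_chunks, recipe, nodes=2))  # b'AABBAACCBB'
```

Python threads are used here only to show the partitioning. They won't speed up this CPU-bound toy. The scalability argument in the text depends on each slice being handled by a separate engine with its own disk and CPU resources, so adding nodes adds capacity.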
Performance depends on several factors, including the backup software, network bandwidth, disk type and more. The time needed for a single-file restore will differ greatly from that of a full restore. It is therefore important to test how a deduplication engine performs in several recovery scenarios to judge the potential impact of deduplication in your environment. Include data that has been stored over a longer period of time in those tests.
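One way to organize such testing is a small timing harness. The Python sketch below is hypothetical. Each scenario is a zero-argument callable that you would wire to your own product's restore procedure. The hypothetical function `time_scenarios` runs each scenario several times and reports the median, which damps run-to-run noise. The scenario names and `time.sleep` stand-ins are placeholders, not measurements.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_scenarios(scenarios: Dict[str, Callable[[], None]],
                   runs: int = 3) -> Dict[str, float]:
    """Run each scenario `runs` times; return median wall-clock seconds."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    results: Dict[str, float] = {}
    for name, restore in scenarios.items():
        samples: List[float] = []
        for _ in range(runs):
            start = time.perf_counter()
            restore()  # hypothetical: performs one complete restore
            samples.append(time.perf_counter() - start)
        results[name] = statistics.median(samples)
    return results


if __name__ == "__main__":
    # Placeholder restores; replace with real restore invocations.
    scenarios = {
        "single file, last night's backup": lambda: time.sleep(0.01),
        "full volume, last night's backup": lambda: time.sleep(0.05),
        "full volume, 90-day-old backup": lambda: time.sleep(0.08),
    }
    for name, secs in time_scenarios(scenarios).items():
        print(f"{name}: {secs:.3f} s")
```

The scenarios should cover the cases the text singles out. Compare single-file and full restores, and compare recent backups with older ones whose data may be spread across many pointers.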
About this author: Lauren Whitehouse is an analyst with Enterprise Strategy Group and covers data protection technologies. Lauren is a 20-plus-year veteran in the software industry, formerly serving in marketing and software development roles.
This was first published in August 2008