What you will learn: This tip details the factors that influence the data deduplication ratio (the amount of data before deduplication divided by the amount of data after deduplication), so you can estimate the ratio you can reasonably expect to achieve.
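To make the definition concrete, here is a minimal Python sketch that turns before and after sizes into a ratio, and a ratio into the fraction of capacity saved. The function names are illustrative only and do not come from any vendor's tooling.

```python
def dedup_ratio(size_before: float, size_after: float) -> float:
    """Return the deduplication ratio N, as in "N:1"."""
    if size_after <= 0:
        raise ValueError("size_after must be positive")
    return size_before / size_after


def space_saved(ratio: float) -> float:
    """Return the fraction of capacity saved for a given N:1 ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


if __name__ == "__main__":
    # Hypothetical sizes: 10 TB of backup data stored in 0.5 TB after deduplication.
    ratio = dedup_ratio(10.0, 0.5)
    print(f"{ratio:.0f}:1 ratio, {space_saved(ratio):.1%} of capacity saved")
```

Note that the savings flatten out quickly: moving from 10:1 to 20:1 only raises the capacity saved from 90% to 95%.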
Every data deduplication vendor claims that their product offers a certain ratio of data reduction. However, the actual data deduplication ratio can vary according to many factors, some of which are within a user's control. Below are a few variables.
Redundant data
The more redundant data you have on your servers, the higher the data deduplication ratios you can expect to achieve. If you have primarily Windows servers with similar files and/or databases, you can reasonably expect to achieve higher ratios of data deduplication. If your servers run multiple operating systems and different files and databases, expect lower data deduplication ratios.
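The effect of redundancy can be illustrated with a toy deduplicator. The sketch below assumes fixed-size chunks identified by a SHA-256 hash; actual products may use variable-size chunking and other techniques, so treat it as an illustration of the principle rather than of any product's method. The `block` helper and the server contents are made up for the example.

```python
import hashlib
from typing import Iterable


def chunk_dedup_ratio(blobs: Iterable[bytes], chunk_size: int = 4096) -> float:
    """Deduplicate fixed-size chunks across all blobs and return the ratio."""
    seen = set()          # digests of chunks already stored
    total_bytes = 0       # data before deduplication
    unique_bytes = 0      # data after deduplication
    for blob in blobs:
        for start in range(0, len(blob), chunk_size):
            chunk = blob[start:start + chunk_size]
            total_bytes += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                unique_bytes += len(chunk)
    if unique_bytes == 0:
        raise ValueError("no data to deduplicate")
    return total_bytes / unique_bytes


def block(tag: int) -> bytes:
    """Hypothetical helper: one 4 KB chunk whose content depends on tag."""
    return bytes([tag]) * 4096


if __name__ == "__main__":
    # Three servers sharing two common chunks plus one unique chunk each.
    similar = [block(0) + block(1) + block(100 + i) for i in range(3)]
    # Three servers with nothing in common.
    mixed = [block(10 * i + 1) + block(10 * i + 2) + block(10 * i + 3)
             for i in range(1, 4)]
    print(f"similar servers: {chunk_dedup_ratio(similar):.2f}:1")
    print(f"mixed servers:   {chunk_dedup_ratio(mixed):.2f}:1")
```

The servers that share content deduplicate to a higher ratio than the servers with nothing in common, which stay at 1:1.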
Rate of data change
Data deduplication ratios depend on how much of the data changes between backups. Each percentage-point increase in the data change rate lowers the ratio; the commonly cited 20:1 ratio is based on average data change rates of approximately 5%.
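A simplified model shows why the change rate matters so much. The sketch below assumes that the first full backup is stored whole and that each later full backup adds only its changed fraction as new data. This is an assumption made for illustration, not a description of how any particular product or the 20:1 figure was calculated.

```python
def modeled_ratio(num_fulls: int, change_rate: float) -> float:
    """Ratio after num_fulls full backups in the toy model.

    Sizes are measured in units of one full backup.
    """
    if num_fulls < 1:
        raise ValueError("num_fulls must be at least 1")
    if not 0.0 <= change_rate <= 1.0:
        raise ValueError("change_rate must be between 0 and 1")
    logical = float(num_fulls)                      # data sent to the target
    stored = 1.0 + (num_fulls - 1) * change_rate    # data kept after dedup
    return logical / stored


if __name__ == "__main__":
    for pct in (1, 5, 10, 20):
        ratio = modeled_ratio(20, pct / 100)
        print(f"{pct:>2}% change between fulls: {ratio:.1f}:1 after 20 fulls")
```

In this model the ratio approaches 1 divided by the change rate as more backups accumulate, so a 5% change rate tops out at 20:1 and a 10% change rate at 10:1.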
Precompressed data
Data retention period
The length of time data is retained affects the data-reduction ratio. For example, to achieve a data-reduction ratio of 10:1 to 30:1, you may need to retain and deduplicate a single data set over a period of 20 weeks. If you don't have the capacity to store data for that long, the data-reduction ratio will be lower.
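The link between retention and ratio can be sketched with a rough estimate of how many weekly full backups must be retained before a target ratio is reached. The model is an assumption for illustration: one full backup per week, the first stored whole, and each later one adding only its changed fraction. The function name and the 520-week cap are hypothetical.

```python
from typing import Optional


def weeks_to_reach(target_ratio: float, change_rate: float,
                   max_weeks: int = 520) -> Optional[int]:
    """Return the first week at which the modeled ratio meets the target.

    Returns None if the target is not reached within max_weeks.
    """
    if target_ratio < 1:
        raise ValueError("target_ratio must be at least 1")
    for weeks in range(1, max_weeks + 1):
        stored = 1.0 + (weeks - 1) * change_rate   # in units of one full backup
        if weeks / stored >= target_ratio:
            return weeks
    return None


if __name__ == "__main__":
    for target in (5, 8, 10, 30):
        weeks = weeks_to_reach(target, 0.05)
        label = "not reached" if weeks is None else f"{weeks} weeks"
        print(f"{target}:1 at 5% weekly change: {label}")
```

Under these assumptions, a 5% weekly change rate needs about 20 weeks of retained backups to reach 10:1, and never reaches 30:1 because the model's ceiling at that change rate is 20:1. Reaching 30:1 in this model would require a change rate below roughly 3.3%.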
Frequency of full backups
Full backups give data deduplication software a more comprehensive and granular view into the backup. The more frequently full backups occur, the higher the level of data deduplication you'll achieve. Deduplicating backup software products have a slight edge over disk libraries because they run a full server scan every time they execute a server backup, even though they only back up changes to existing files or new files. In between full backups, disk libraries usually only receive the changes sent as part of the backup software's daily incrementals or differentials.
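The same kind of toy model shows why the frequency of full backups changes the reported ratio. The sketch assumes that a full backup sends the entire data set, an incremental sends only that day's changed data, and all changed data is unique. These are simplifying assumptions, and the parameter names are illustrative.

```python
def ratio_for_schedule(days: int, full_every: int, daily_change: float) -> float:
    """Modeled ratio over a retention window of `days` daily backups.

    A full backup runs every `full_every` days, starting on day 0;
    incrementals run on the other days. Sizes are in units of one full backup.
    """
    if days < 1 or full_every < 1:
        raise ValueError("days and full_every must be at least 1")
    num_fulls = len(range(0, days, full_every))
    num_incrementals = days - num_fulls
    logical = num_fulls + num_incrementals * daily_change  # data sent
    stored = 1.0 + (days - 1) * daily_change               # data kept after dedup
    return logical / stored


if __name__ == "__main__":
    for full_every in (1, 7, 14):
        ratio = ratio_for_schedule(28, full_every, 0.01)
        print(f"full backup every {full_every:>2} days: {ratio:.1f}:1")
```

The stored amount is the same for every schedule in this model; what changes is how much redundant data is sent to the deduplication target, which is why daily fulls report a much higher ratio than weekly fulls with incrementals in between.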
Check out the complete text of the Storage magazine article, Catching up with deduplication.
Jerome M. Wendt is a storage analyst specializing in open-systems storage and SANs.
This was first published in June 2007