Shedding storage pounds with data reduction technology

This tip discusses various approaches to "shed storage pounds," back up less data and make recovery easier.

What you will learn from this tip: The holiday season has now officially started, and pretty soon we will hear the usual New Year's resolutions about shedding extra pounds and taking full advantage of that gym membership. In keeping with the spirit, this tip discusses various approaches to "shed storage pounds," back up less data and make recovery easier.

Unless you are brand new to the IT field, or have not read anything about data storage in the past five or six years, you know that data growth for most organizations has been steadily in excess of 50% per year for at least as long. This has been the source of countless headaches for storage administrators when it comes to provisioning. With lower-cost storage somewhat easing that pain, the spotlight is now on backups. Traditional backup methods can no longer keep up with constantly increasing amounts of data and shrinking recovery time objectives (RTO). It is clearly becoming an issue of what to back up as much as how.
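To put that growth rate in perspective, here is a minimal sketch of the compounding arithmetic. The 10 TB starting point is an assumption chosen purely for illustration; only the 50% annual rate comes from the text above.

```python
def projected_size_tb(initial_tb: float, annual_growth: float, years: int) -> float:
    """Size after `years` of compound growth at a fixed annual rate."""
    return initial_tb * (1 + annual_growth) ** years


if __name__ == "__main__":
    # Hypothetical starting point: 10 TB of primary data, growing 50% per year.
    for year in range(6):
        print(f"year {year}: {projected_size_tb(10.0, 0.50, year):.1f} TB")
```

At 50% per year, data multiplies by roughly 7.6 in five years (1.5 to the fifth power), so an environment sized for today's backups is outgrown quickly.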

Businesses are now realizing they cannot afford to let data grow forever and must start addressing the issue at the source. What follows are a number of options to consider when attempting to reduce the amount of data to back up, or the amount of space used once it is backed up.


Information or data lifecycle management (ILM) is really where it all begins. Although many vendors were quick to associate the acronym with hardware/software solutions, ILM is really about corporate decisions, policies and regulations regarding where data is stored, and how long it should be kept; technology only assists in automating those policies. Any data deleted or archived as a result means less data to back up (and restore).

Data categorization

Understanding what data an organization stores is arguably the first step to ILM. It also helps identify how much data there is, where it is stored, and whether and how it should be backed up. However, this is not a trivial task and requires both time and resources.

Because technology is sometimes cheaper than lawyers and can be implemented faster than policies, listed below are some other options.

Backup/Data retention

How long backup data is kept will have a significant impact on the backup environment. Organizations must seriously question how useful a 30-day-old database is, or whether they need the ability to restore email messages that are 30 or 60 days old. Remember the distinction between archive and backup: backups protect from data loss and should not be used for long-term retention. In addition, archived data no longer needs to be backed up daily or weekly, nor does it need to be restored after a system failure.
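The effect of retention on backup storage can be estimated with a rough model. The sketch below assumes a schedule of periodic full backups plus daily incrementals, with no compression or deduplication; the function name, the weekly full interval and the 2% daily change rate are all hypothetical values for illustration, not figures from any particular product.

```python
import math


def backup_storage_tb(full_tb: float, daily_change_rate: float,
                      retention_days: int, full_interval_days: int = 7) -> float:
    """Rough capacity needed to hold `retention_days` of backups.

    Assumes one full backup every `full_interval_days` and an incremental
    on every other day, each incremental sized at
    `full_tb * daily_change_rate`.
    """
    fulls = math.ceil(retention_days / full_interval_days)
    incrementals = retention_days - fulls
    return fulls * full_tb + incrementals * full_tb * daily_change_rate


if __name__ == "__main__":
    # Hypothetical 10 TB data set with a 2% daily change rate.
    for days in (30, 60, 90):
        print(f"{days}-day retention: {backup_storage_tb(10.0, 0.02, days):.1f} TB")
```

Under these assumptions, the retained full backups dominate the total, which is why shortening retention (and moving long-term copies to an archive) tends to shrink the backup footprint almost proportionally.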

Archives for email, file server, database

Symantec Enterprise Vault, Zantaz EAS, EMC EmailXtender and Centera, CommVault DataMigrator, Princeton Softech Optim, OpenText Livelink ECM, AXS-One AXS-Link and IBM Tivoli Archive Manager are just a few of the products that allow organizations to archive application data and manage retention. Once again, data archived at the source is no longer backed up daily, which reduces backup storage utilization.

Hierarchical storage management (HSM) solutions, such as IBM TSM, Storage Migrator and DiskXtender, to name only a few, can also help reduce backup and restore pains by migrating less frequently used data off primary storage and leaving a "stub file" in place that points to the actual file. Backup products integrated with these solutions will back up or restore only the stubs, thus minimizing the amount of data backed up.
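The stub-file idea can be illustrated with a toy sketch. This is not how any of the products named above work internally; real HSM software hooks into the file system so that stubs are transparent to applications. Here, the `.stub` suffix, the JSON stub format and both function names are assumptions made for the example.

```python
import json
import shutil
import time
from pathlib import Path
from typing import List, Optional

STUB_SUFFIX = ".stub"  # hypothetical naming convention for this sketch


def migrate_cold_files(primary: Path, archive: Path, max_age_days: float,
                       now: Optional[float] = None) -> List[Path]:
    """Move files not modified in `max_age_days` to `archive`, leaving stubs."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    archive.mkdir(parents=True, exist_ok=True)
    stubs = []
    for path in sorted(primary.iterdir()):
        if not path.is_file() or path.name.endswith(STUB_SUFFIX):
            continue
        info = path.stat()
        if info.st_mtime >= cutoff:
            continue  # recently modified: keep on primary storage
        target = archive / path.name
        shutil.move(str(path), str(target))
        # The stub is a tiny pointer file recording where the data went.
        stub = path.with_name(path.name + STUB_SUFFIX)
        stub.write_text(json.dumps({"migrated_to": str(target),
                                    "size": info.st_size}))
        stubs.append(stub)
    return stubs


def recall(stub: Path) -> Path:
    """Bring a migrated file back to primary storage and remove its stub."""
    pointer = json.loads(stub.read_text())
    original = stub.with_name(stub.name[:-len(STUB_SUFFIX)])
    shutil.move(pointer["migrated_to"], str(original))
    stub.unlink()
    return original
```

A backup job that walks the primary directory after migration sees only the small stub files for cold data, which is the source of the savings described above.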

Data replication

RTO and recovery point objectives (RPO) dictate the difference between recoverability and availability. Data subject to zero downtime and zero loss requirements should be made highly available via replication, rather than simply backed up. For such data, many IT organizations have switched to multiple point-in-time copies on disk for primary data protection and only use traditional backups for added peace of mind.

Single instance storage

Single instance storage, or data deduplication, is probably one of the most refreshing advancements in the backup storage arena. Products such as Data Domain, Avamar, DoubleTake and NearStore can be implemented as disk arrays or virtual tape libraries (VTL) and, in some cases, can be fully integrated with an existing backup solution. These products can dramatically reduce the amount of storage required for backup data by storing only one instance of duplicate files. In some cases, the granularity is fine enough that even duplicate data sequences are stored only once.
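As a rough illustration of the principle, the sketch below splits incoming data into fixed-size chunks and stores each unique chunk once, keyed by its SHA-256 digest. It is a simplified model and not the method of any product named above; commercial systems typically use more sophisticated chunking and indexing. The `DedupStore` class and its 4 KB chunk size are assumptions for the example.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Toy single-instance store using fixed-size chunks and SHA-256 digests."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}      # digest -> unique chunk data
        self.files: Dict[str, List[str]] = {}   # file name -> ordered digests

    def put(self, name: str, data: bytes) -> None:
        """Store `data` under `name`, keeping only chunks not already seen."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # no-op for duplicates
            recipe.append(digest)
        self.files[name] = recipe

    def get(self, name: str) -> bytes:
        """Rebuild a file from its list of chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        """Bytes actually held, after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore()
    monday = b"A" * 4096 + b"B" * 4096
    tuesday = monday + b"C" * 100   # same file with a small amount appended
    store.put("monday.bak", monday)
    store.put("tuesday.bak", tuesday)
    print("logical bytes:", len(monday) + len(tuesday))
    print("stored bytes: ", store.stored_bytes())
```

Setting the chunk size larger than any file reduces this to whole-file single instancing; smaller chunks catch duplicate sequences inside files that differ only slightly, at the cost of a larger index.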

Well implemented, these processes, policies and tools will result in much leaner storage and backup environments in the new year.

Happy holidays.
