In the late 1970s, compression algorithms were developed to address the growing demand for disk space to store text files; the Lempel-Ziv family of algorithms (LZ77, LZ78 and the later LZW variant) represents some of the early data reduction efforts. In the early 2000s, data deduplication emerged as the newest data reduction technology and became widely adopted within a few years. However, both technologies have limitations in performance and capability, depending on the type of data targeted.
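To illustrate how much the type of data matters, here is a minimal Python sketch that uses the standard zlib module (a DEFLATE implementation, which builds on LZ77) to compress two inputs of the same size. The sample inputs and the helper name compression_ratio are illustrative assumptions, not taken from any product discussed here.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Return original size divided by compressed size (higher is better)."""
    compressed = zlib.compress(data, 6)
    return len(data) / len(compressed)

# Highly repetitive text, loosely similar to logs or boilerplate documents.
text = b"The quick brown fox jumps over the lazy dog. " * 2000

# Random bytes stand in for data that is already compressed or encrypted.
random_like = os.urandom(len(text))

print(f"text ratio:   {compression_ratio(text):.1f}:1")
print(f"random ratio: {compression_ratio(random_like):.2f}:1")
```

The repetitive text shrinks dramatically, while the random input does not shrink at all and may even grow slightly, which is why already-compressed media files gain little from a second compression pass.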
Next-generation data compression
There have been some major improvements in data compression technology. Companies such as Ocarina Networks and Storwize Inc. have found a way around the traditional performance limitations of CPU-intensive compression software by moving compression to an appliance that sits between the host and primary disk. As an added bonus, compressed data can be deduplicated once it moves from primary disk to backup or archive storage media. However, this technology is still relatively new and is currently limited to network-attached storage (NAS) environments. Future releases are expected to address Fibre Channel (FC) and iSCSI, and will likely also need to become compatible with emerging network convergence technologies such as Fibre Channel over Ethernet (FCoE).
Other data reduction options
Unfortunately, outside of deduplication and compression, technology options remain somewhat limited in support of data reduction and, in some instances, do not necessarily reduce the amount of data stored. Another data reduction option that remains is data deletion or disposition, which can also be supported by technology but requires a very human component known as a "policy." But let's first look at the other technology options for data reduction before getting into the disposition aspect.
Single-instance storage (SIS) is a technology that looks for identical files in a particular data storage environment and, when it finds them, replaces all extra copies with pointers to a single shared file (the single instance). A good example of this technology is a capability within Microsoft Exchange where an email attachment sent to 30 recipients is stored only once and referenced in multiple inboxes. This is transparent to end users because the attachment appears to be in each individual inbox; meanwhile, the resulting data reduction ratio for the attachment in our example is 30:1. This data reduction method is effective mostly in data storage environments where users share a lot of identical data.
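The single-instance idea can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions, not how Microsoft Exchange or any other product implements it: the class name SingleInstanceStore, its methods and the inbox paths are all hypothetical, and identical content is detected by SHA-256 hash on the assumption that collisions can be ignored.

```python
import hashlib

class SingleInstanceStore:
    """Toy in-memory store that keeps one copy of each distinct file body."""

    def __init__(self):
        self.blobs = {}     # content hash -> file content (stored once)
        self.pointers = {}  # logical path -> content hash

    def put(self, path, content):
        digest = hashlib.sha256(content).hexdigest()
        # Keep the content only if this exact body has not been seen before.
        self.blobs.setdefault(digest, content)
        self.pointers[path] = digest

    def get(self, path):
        # Each path still resolves to the full content, so users see no change.
        return self.blobs[self.pointers[path]]

    def reduction_ratio(self):
        logical = sum(len(self.blobs[d]) for d in self.pointers.values())
        physical = sum(len(b) for b in self.blobs.values())
        return logical / physical

store = SingleInstanceStore()
attachment = b"quarterly report contents " * 1000
for i in range(30):
    store.put(f"inbox{i}/report.pdf", attachment)

print(len(store.blobs))         # physical copies kept
print(store.reduction_ratio())  # logical bytes divided by physical bytes
```

With the same attachment delivered to 30 inboxes, the store holds a single physical copy, which matches the 30:1 ratio in the example above. If even one byte of a copy differs, that copy hashes differently and is stored in full, which is why the approach pays off mainly where users share many identical files.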
Data archiving is often promoted as a data reduction option, but in reality, it often only migrates or moves data. Archiving tools can reduce the amount of data that must be managed or backed up on a daily basis by moving infrequently used or no longer needed data to different storage media or a different location. However, while archiving reduces the amount of production data, it does not necessarily reduce the overall amount of data, because data moved to tape or other storage media still exists and still consumes capacity there. It is when data archiving is combined with technologies such as SIS, data deduplication and compression that data reduction really starts taking place.
Data disposal or deletion
Data deletion is the only other viable data reduction option for environments where deduplication, compression and SIS do not meet the requirements. However, data deletion is by far the least popular option among storage professionals and business managers because of the legal implications relating to regulatory compliance, freedom of information, e-discovery and so on. Before going on a data deletion rampage, there are a few things to consider:
- Define a clear policy on what type of data can be stored on corporate servers. File servers are often used to store user data and many companies do not spend much time looking at what ends up on the user drives. Finding music, photos and movie files on storage arrays is not uncommon.
- Create an email retention policy and enforce it. One relatively easy way to handle this is to implement an email archive tool such as Symantec Corp.'s Enterprise Vault, which allows you to archive messages, perform discovery when needed and set a retention period after which archives can be deleted. This product also works for file systems and Microsoft SharePoint data. There are also other archive tools available, such as those from Informatica Corp., which are specifically designed for archiving database-driven applications such as CRM and ERP systems.
- Beware of PST files (personal email archive files), especially when trying to enforce email disposal policies. Many users have discovered that they can save email messages to PST files before the messages are automatically archived and eventually deleted. This practice can undermine data reduction efforts, especially if users store these PST files on the corporate file server. PST files can also nullify a corporate email retention policy by keeping messages that are marked for deletion accessible.
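Before writing or enforcing such policies, it can help to measure how much space PST and media files actually occupy. The following Python sketch walks a directory tree and totals flagged files by extension. The extension list and the function name find_flagged_files are illustrative assumptions; a real policy would define its own list, and a scan of a large file server would need to account for permissions and run time.

```python
import os
from collections import defaultdict

# Hypothetical policy: extensions that should not sit on corporate file servers.
FLAGGED_EXTENSIONS = {".pst", ".mp3", ".mp4", ".avi", ".mov", ".jpg"}

def find_flagged_files(root):
    """Walk a directory tree and total file count and bytes per flagged extension."""
    totals = defaultdict(lambda: {"count": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare extensions case-insensitively (ARCHIVE.PST vs archive.pst).
            ext = os.path.splitext(name)[1].lower()
            if ext not in FLAGGED_EXTENSIONS:
                continue
            full_path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full_path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            totals[ext]["count"] += 1
            totals[ext]["bytes"] += size
    return dict(totals)
```

Calling find_flagged_files on the root of a user share returns a dictionary keyed by extension, which gives a rough picture of where a retention or acceptable-use policy would have the most effect. The script only reports; deciding what to delete remains a policy decision.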
About this author: Pierre Dorion is the data center practice director and a senior consultant with Long View Systems Inc. in Phoenix, Ariz., specializing in the areas of business continuity and DR planning services and corporate data protection.
This was first published in March 2010