Compression, deduplication and encryption: What's the difference?

Learn the distinction between compression, deduplication and encryption as these concepts are gaining importance in everyday storage.

Hard disks now rule the storage industry, and traditional storage arrays are routinely flanked by other disk-based systems like virtual tape libraries (VTL) and replication platforms. As more corporate data is relegated to spinning disk, storage administrators must implement, configure and manage this escalating capacity -- stretching disk space to the limit while protecting important data against loss or theft. Compression, deduplication and encryption are emerging in disk storage, and it's important to understand the role of each.

Data compression

Compression is a decades-old idea, but it's making a renewed appearance in storage systems like VTLs. Compression attempts to reduce the size of a file by removing redundant data within the file. Smaller files consume less disk space, so more files can be stored on the same disk. For example, a 100 KB text file might be compressed to 52 KB by encoding runs of spaces compactly or replacing long repeated character strings with short representations. An algorithm recreates the original data when the file is read. Picture files are also usually compressed. For example, the .jpg image file format uses lossy compression: it discards image detail that viewers are unlikely to notice, so the file shrinks substantially but the original pixels cannot be recreated exactly.
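The round trip described above can be sketched with Python's standard `zlib` module. This is a minimal illustration, not how any particular VTL implements compression; the sample text is an assumption chosen because it is highly repetitive.

```python
import zlib

# Highly repetitive sample text (an assumption, chosen so it compresses well).
original = ("The quarterly report is attached. " * 200).encode("utf-8")

# Compress: repeated strings are replaced with short representations.
compressed = zlib.compress(original)

# Decompress: the algorithm recreates the original data when it is read.
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("restored matches original:", restored == original)
```

Because `zlib` is lossless, the restored bytes are identical to the input, which is the behavior a storage system needs for general files.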

Virtually any file can be compressed, though files with non-redundant data may compress little (if at all), so compression ratios are a guideline -- not a rule. For example, a 2:1 compression ratio can ideally allow 400 GB worth of files on a 200 GB disk (or 200 GB worth of files would only take 100 GB on the disk). It's very difficult to determine exactly how much a file can be compressed until a compression algorithm is applied.
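To see why a quoted ratio is only a guideline, the sketch below measures the ratio `zlib` achieves on two inputs of the same size, then repeats the 2:1 capacity arithmetic from the paragraph above. The helper names `compression_ratio` and `effective_capacity_gb` are hypothetical, introduced only for this example.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size (2.0 means 2:1)."""
    return len(data) / len(zlib.compress(data))

def effective_capacity_gb(disk_gb: float, ratio: float) -> float:
    """Logical data that fits on a disk if files average the given ratio."""
    return disk_gb * ratio

redundant = b"AAAA" * 25_000        # 100,000 bytes of repeated data
random_like = os.urandom(100_000)   # 100,000 bytes with no redundancy to exploit

print(f"redundant data:   {compression_ratio(redundant):.1f}:1")
print(f"random-like data: {compression_ratio(random_like):.2f}:1")

# The ideal case from the text: a 2:1 ratio on a 200 GB disk.
print(effective_capacity_gb(200, 2.0), "GB of files on a 200 GB disk")
```

The random input stands in for data that is already compressed or encrypted. It typically comes out no smaller than it went in, so the ratio actually achieved depends on the mix of data being stored.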

Data deduplication

A typical data center may be storing many copies of the same file. File deduplication (sometimes called data reduction or commonality factoring) is another space-saving technology intended to eliminate redundant (duplicate) files on a storage system. Saving only one instance of a file can significantly reduce the disk space consumed.

For example, suppose the same 10 MB PowerPoint presentation is stored in 10 folders, one for each sales associate or department. That's 100 MB of disk space consumed to maintain the same 10 MB file. File deduplication ensures that only one complete copy is saved to disk. Subsequent copies of the file are saved only as references that point to the stored copy, so end users still see their own files in place. Similarly, a storage system may retain 200 e-mails, each carrying the same 1 MB attachment. With deduplication, the 200 MB needed to store all 200 copies of the attachment is reduced to just 1 MB for a single instance of the file.
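The PowerPoint scenario can be sketched as a small in-memory store that keys file contents by their SHA-256 hash. The class `FileDedupStore`, its methods and the folder paths are all hypothetical, and the presentation is a small stand-in rather than a real 10 MB file. Commercial products differ in how they detect and track duplicates.

```python
import hashlib

class FileDedupStore:
    """Toy file-level deduplication: identical contents are stored once."""

    def __init__(self):
        self.blobs = {}   # content hash -> file bytes (one complete copy)
        self.paths = {}   # user-visible path -> content hash (a reference)

    def save(self, path: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)   # keep only the first copy
        self.paths[path] = digest             # later copies are just pointers

    def read(self, path: str) -> bytes:
        return self.blobs[self.paths[path]]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())

presentation = b"slide data " * 1000   # stand-in for the shared presentation
store = FileDedupStore()
for i in range(10):
    store.save(f"/sales/associate{i}/deck.ppt", presentation)

logical_bytes = len(presentation) * len(store.paths)
print("logical size:", logical_bytes, "bytes")
print("physically stored:", store.stored_bytes(), "bytes")
```

Each of the 10 paths still reads back the full file, but only one copy occupies space. This sketch never reclaims blobs that are no longer referenced, which a real system would need to handle.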

Deduplication can also work at a finer granularity, removing redundant portions of files -- potentially down to the block level. This is common in content-addressed storage (CAS) systems like Avamar Technologies Inc.'s Axion product. When evaluating a deduplication product, it's important to understand the granularity that the vendor's platform offers.
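Block-level deduplication can be sketched the same way by hashing fixed-size blocks instead of whole files. The 4 KB `BLOCK_SIZE` and the functions `dedup_blocks` and `rebuild` are assumptions made for illustration; this is not a description of how Axion or any other product chunks data.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch

def dedup_blocks(data: bytes, store: dict) -> list:
    """Store each unique block once; return the list of hashes that rebuilds data."""
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # new blocks are stored, known ones skipped
        recipe.append(digest)
    return recipe

def rebuild(recipe: list, store: dict) -> bytes:
    """Reassemble a file from its block references."""
    return b"".join(store[digest] for digest in recipe)

store = {}
version1 = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE
version2 = b"A" * BLOCK_SIZE + b"X" * BLOCK_SIZE + b"C" * BLOCK_SIZE  # middle block edited

recipe1 = dedup_blocks(version1, store)
recipe2 = dedup_blocks(version2, store)

print("blocks referenced:", len(recipe1) + len(recipe2))
print("unique blocks stored:", len(store))
```

The two versions differ as whole files, so file-level deduplication would store both in full. At the block level, only the one changed block adds to the space consumed.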


Data encryption

With the increasing prominence of government regulations and corporate litigation, storage managers acknowledge the role of security in enterprise storage. Encryption is used to protect data, preventing unauthorized users from reading information even if files are stolen. You can find encryption in secure systems like Nexsan Technologies' Assureon CAS appliance.

Encryption uses a mathematical algorithm with a unique key to encode a file into a form that cannot be read without that key. No one can access or use the encrypted file until it is decrypted with the identical key. Of course, if the encryption key is lost or forgotten, any data encrypted with that key will be rendered inaccessible.
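The role of the key can be illustrated with a deliberately simplified toy cipher built from the Python standard library. The functions `keystream` and `toy_cipher` are hypothetical and exist only to show the same key being used for encoding and decoding. This sketch is not secure and should not be used to protect real data; storage products rely on vetted algorithms such as AES.

```python
import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    """Derive a pseudo-random byte stream from the key (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with the key-derived stream; running it again with the same key undoes it."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

key = secrets.token_bytes(32)         # the unique key
wrong_key = secrets.token_bytes(32)   # stands in for a lost or guessed key

plaintext = b"Q3 payroll records"
ciphertext = toy_cipher(plaintext, key)

print("same key recovers data: ", toy_cipher(ciphertext, key) == plaintext)
print("wrong key recovers data:", toy_cipher(ciphertext, wrong_key) == plaintext)
```

With a different key, the output is unreadable bytes. This is also why data becomes inaccessible when the key is lost.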

This was last published in May 2006
