
Compression, deduplication and encryption: What's the difference?

What you will learn from this tip: How compression, deduplication and encryption differ, and why each is gaining importance in everyday storage.

Hard disks now rule the storage industry, and traditional storage arrays are routinely flanked by other disk-based systems such as virtual tape libraries (VTL) and replication platforms. As more corporate data moves to spinning disk, storage administrators must implement, configure and manage this growing capacity, stretching disk space to the limit while protecting important data against loss or theft. Compression, deduplication and encryption are all emerging in disk storage, and it's important to understand the role of each.

Data compression

Compression is a decades-old idea, but it is making a renewed appearance in storage systems such as VTLs. Compression reduces the size of a file by removing redundant data within the file. Smaller files consume less disk space, so more of them fit on the same disk. For example, a 100 KB text file might be compressed to 52 KB by removing extra spaces or replacing long, repeated character strings with short representations. When the file is read, a decompression algorithm recreates the original data. Image files are usually compressed as well. The .jpg (JPEG) format, for example, uses compression to eliminate redundant pixel data, but it is lossy: some image detail is discarded, so the original pixels are not recreated exactly.
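
To make the idea concrete, here is a minimal Python sketch using the standard library's zlib module, which implements DEFLATE, a general-purpose lossless algorithm. This is an assumption for illustration; VTL products may use other algorithms. The sample text is invented and deliberately repetitive.

```python
import zlib

# Highly redundant text: the same sentence repeated many times.
text = ("The quick brown fox jumps over the lazy dog. " * 200).encode("utf-8")

# Lossless compression with DEFLATE; level 9 favors smaller output over speed.
compressed = zlib.compress(text, 9)

# Reading the data back: decompression recreates the original bytes exactly.
restored = zlib.decompress(compressed)

print(f"original:   {len(text)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {len(text) / len(compressed):.1f}:1")
```

Because DEFLATE is lossless, `restored` is byte-for-byte identical to `text`, unlike the lossy JPEG case.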

Virtually any file can be run through a compression algorithm, but files with little redundant data may shrink only slightly, if at all, so compression ratios are a guideline, not a rule. For example, a 2:1 compression ratio would ideally fit 400 GB worth of files on a 200 GB disk (or let 200 GB worth of files occupy only 100 GB on disk). It is very difficult to determine exactly how much a file will compress until a compression algorithm is actually applied to it.
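
The point that results depend on the data can be checked directly. This sketch, again assuming zlib as the algorithm, compresses a repeating byte pattern and an equal amount of random bytes, which contain almost no redundancy. Random input typically comes out no smaller, or even slightly larger, than it went in. The last lines repeat the 2:1 capacity arithmetic from the text.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size (higher means more savings)."""
    return len(data) / len(zlib.compress(data))

redundant = b"ABCD" * 25_000         # 100,000 bytes of a repeating pattern
random_bytes = os.urandom(100_000)   # 100,000 bytes with no redundancy to exploit

print(f"redundant data: {compression_ratio(redundant):.1f}:1")
print(f"random data:    {compression_ratio(random_bytes):.2f}:1")

# Capacity arithmetic: at 2:1, a 200 GB disk ideally holds 400 GB of files.
disk_gb = 200
ratio = 2.0
effective_gb = disk_gb * ratio
print(f"effective capacity at {ratio:.0f}:1: {effective_gb:.0f} GB")
```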

Data deduplication

A typical data center may store many copies of the same file. File deduplication (sometimes called data reduction or commonality factoring) is another space-saving technology, intended to eliminate redundant (duplicate) files on a storage system. Saving only one instance of each file can significantly reduce the disk space consumed.

For example, suppose the same 10 MB PowerPoint presentation is stored in 10 folders, one for each sales associate or department. That's 100 MB of disk space consumed to maintain the same 10 MB file. File deduplication ensures that only one complete copy is saved to disk. Subsequent copies of the file are saved only as references that point to the stored copy, so end users still see their own files in place. Similarly, a storage system may retain 200 e-mails, each with the same 1 MB attachment. With deduplication, the 200 MB needed to store every copy of the attachment is reduced to just 1 MB for a single copy.
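
Products differ in how they detect duplicates. A common approach, assumed in this sketch, is to fingerprint each file's contents with a cryptographic hash such as SHA-256 and keep one copy per fingerprint. The class name, paths and file contents below are hypothetical stand-ins for the presentation example.

```python
import hashlib

class FileDedupStore:
    """Toy file-level deduplication: each unique file body is stored once,
    and every path is only a reference (a content hash) to that body."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # content hash -> the single stored copy
        self.paths: dict[str, str] = {}    # user-visible path -> content hash

    def save(self, path: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # only the first copy consumes space
        self.paths[path] = digest

    def read(self, path: str) -> bytes:
        return self.blobs[self.paths[path]]

    def logical_bytes(self) -> int:
        """Space the files would occupy without deduplication."""
        return sum(len(self.blobs[h]) for h in self.paths.values())

    def physical_bytes(self) -> int:
        """Space actually consumed by the unique copies."""
        return sum(len(b) for b in self.blobs.values())


MB = 1024 * 1024
deck = b"x" * (10 * MB)  # stand-in for the 10 MB presentation
store = FileDedupStore()
for i in range(10):
    store.save(f"/sales/associate{i}/pitch.ppt", deck)

print(store.logical_bytes() // MB, "MB logical")
print(store.physical_bytes() // MB, "MB physical")
```

Every one of the ten paths still returns the full presentation, but only one 10 MB body is held.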

Deduplication can also be more granular, removing redundant portions of files, potentially down to the block level. This approach is common in content-addressed storage (CAS) systems such as Avamar Technology Inc.'s Axion product. When evaluating a deduplication product, it's important to understand the granularity its platform offers.
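
A minimal sketch of block-level deduplication, assuming fixed-size 4 KB blocks. The block size is a hypothetical choice; products may use other sizes or variable-length chunking. The block store is keyed by each block's hash, which is the basic idea behind content addressing. The two file versions are invented and differ only in their final block.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size; real products choose their own

def dedup_blocks(files: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, list[str]]]:
    """Split each file into fixed-size blocks and store each unique block once.

    Returns the block store (keyed by each block's SHA-256 hash) and, for every
    file, its 'recipe': the ordered list of block hashes needed to rebuild it.
    """
    store: dict[str, bytes] = {}
    recipes: dict[str, list[str]] = {}
    for name, data in files.items():
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # only the first copy of a block is kept
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes

def rebuild(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Reassemble a file from its ordered block hashes."""
    return b"".join(store[digest] for digest in recipe)

# Version 1: four distinct blocks, each filled with a different byte value.
v1 = b"".join(bytes([value]) * BLOCK_SIZE for value in (1, 2, 3, 4))
# Version 2: identical except that the last block has changed.
v2 = v1[:-BLOCK_SIZE] + bytes([9]) * BLOCK_SIZE

store, recipes = dedup_blocks({"v1": v1, "v2": v2})
print("blocks referenced:", sum(len(r) for r in recipes.values()))  # 8
print("unique blocks stored:", len(store))                          # 5
```

Whole-file deduplication would see v1 and v2 as two different files and keep all eight blocks; at block granularity only five are stored.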

Encryption

With the growing prominence of government regulations and corporate litigation, storage managers recognize the role of security in enterprise storage. Encryption protects data by preventing unauthorized users from reading information even if files are hacked or stolen. Encryption can be found in secure systems such as Nexsan Technologies' Assureon CAS appliance.

Encryption uses a mathematical algorithm and a unique key to encode a file into a form that cannot be read. Nobody can read or use the encrypted file until it is decrypted with the identical key. Of course, if the encryption key is lost or forgotten, any data encrypted with that key becomes inaccessible.
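
To show the role of the key using only the Python standard library, this sketch uses a one-time pad: the data is XORed with a random key of the same length, and XORing again with the identical key restores it. It is a teaching illustration only. Storage products typically use a standard cipher such as AES with managed keys, and a one-time pad key must never be reused. The sample message is made up.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching byte of key."""
    if len(key) < len(data):
        raise ValueError("one-time pad key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"Q3 sales forecast: confidential"
key = secrets.token_bytes(len(plaintext))  # random key, used once and kept secret

ciphertext = xor_bytes(plaintext, key)     # encrypt: unreadable without the key
recovered = xor_bytes(ciphertext, key)     # decrypt with the identical key

wrong_key = secrets.token_bytes(len(plaintext))
garbage = xor_bytes(ciphertext, wrong_key) # any other key yields meaningless bytes

print(ciphertext.hex())
print(recovered.decode())                  # prints the original message
```

If `key` is discarded, nothing in the ciphertext allows the plaintext to be recovered, which is the lost-key risk in miniature.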

This was first published in May 2006
