Compression is one of the oldest data reduction methods. In fact, the old PKZIP utility that became so popular in the late 1980s and early 1990s was built on data compression technology.
Data compression remains in use today because it is both simple and effective. It works by scanning the bits that make up a file to locate long bit strings that occur more than once. When such strings are found, the file is rewritten with the long, repetitive bit strings replaced by much shorter ones. As a result, the compressed file consumes less storage space.
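The effect is easy to see with a minimal Python sketch that uses the standard zlib module. The sample data here is made up, and zlib is only one of many compression implementations, so treat the numbers as illustrative.

```python
import zlib

# Hypothetical sample: one sentence repeated many times, so it is highly redundant.
original = b"The quick brown fox jumps over the lazy dog. " * 200

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "bytes before compression")
print(len(compressed), "bytes after compression")
print("round trip intact:", restored == original)
```

Because the compression is lossless, decompressing returns exactly the bytes that went in.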
As well as data compression technology works, it has a few potential disadvantages. First, data can only be compressed if it contains recurring bit strings. After all, it is those recurring strings that are eliminated to reduce the size of the file.
The problem is that many modern data types are already compressed. You might, for example, have heard the term compressed media used to refer to digital video or JPEG images. Such files contain very little redundancy, if any, and therefore gain little or nothing from further compression.
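A rough way to see the difference is to compare compression ratios for redundant and nonredundant data. In this Python sketch, the helper name compression_ratio and the sample data are assumptions for illustration, and random bytes serve only as a stand-in for already-compressed media.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (below 1.0 means savings)."""
    return len(zlib.compress(data)) / len(data)

# Hypothetical redundant data: the same line repeated many times.
redundant = b"customer_id,order_id,status\n" * 4000

# Random bytes have almost no repeated patterns, much like compressed media.
high_entropy = os.urandom(len(redundant))

print(f"redundant text:    {compression_ratio(redundant):.3f}")
print(f"high-entropy data: {compression_ratio(high_entropy):.3f}")
```

A ratio at or slightly above 1.0 means the second compression pass saved nothing and may even have added a few bytes of overhead.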
Another potential disadvantage of data compression technology is that it can be a CPU-intensive process, although there are offloading techniques that can be used for network compression. CPU cycles are consumed by the process of parsing files in search of redundant bit patterns.
In most cases, the CPU overhead isn't problematic, but it is something to consider if processing occurs on a system that is already CPU-bound. This is especially true if the data is nonredundant and CPU cycles are wasted trying to compress data that is already compressed.
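One way to avoid wasting cycles, sketched below in Python, is to compress a small sample first and skip the full pass when the sample barely shrinks. The function name maybe_compress, the sample size and the threshold are all hypothetical choices for illustration, not a standard technique from any particular product.

```python
import os
import zlib

def maybe_compress(data: bytes, sample_size: int = 4096, threshold: float = 0.9):
    """Compress data only if a small sample shrinks below the threshold ratio.

    Returns (payload, was_compressed). All parameter values are illustrative.
    """
    sample = data[:sample_size]
    if not sample:
        return data, False
    # Level 1 is zlib's fastest setting, so the probe itself stays cheap.
    probe_ratio = len(zlib.compress(sample, 1)) / len(sample)
    if probe_ratio > threshold:
        return data, False  # likely already compressed; skip the full pass
    return zlib.compress(data), True

# Hypothetical inputs: a repetitive log and a stand-in for compressed media.
log_text = b"2024-01-01 INFO request served in 12 ms\n" * 1000
random_blob = os.urandom(40_000)

print("log compressed:", maybe_compress(log_text)[1])
print("blob compressed:", maybe_compress(random_blob)[1])
```

Sampling only the start of a file is a simplification. A file whose redundancy varies from section to section could be misjudged, so a real system might probe several regions.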
Also, watch out for the potential for data loss. Data compression technology effectively removes good data from a file and replaces it with a marker. In most cases, this is not a problem. However, removing redundant bit strings opens the door to severe corruption. A minor disk error that might normally cause a small problem in an uncompressed file could make a compressed file completely unreadable.
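The following Python sketch simulates that kind of error by inverting a single byte. The helper names and sample records are made up for illustration. Note that zlib refuses to return data when its integrity check fails, and other formats may behave differently.

```python
import zlib

def flip_byte(data: bytes, index: int) -> bytes:
    """Return a copy of data with every bit of one byte inverted."""
    damaged = bytearray(data)
    damaged[index] ^= 0xFF
    return bytes(damaged)

def count_unreadable(compressed: bytes) -> int:
    """Damage each byte position in turn and count failed decompressions."""
    failures = 0
    for i in range(len(compressed)):
        try:
            zlib.decompress(flip_byte(compressed, i))
        except zlib.error:
            failures += 1
    return failures

# Hypothetical sample data: 500 numbered records.
original = b"".join(b"record %04d: status ok\n" % i for i in range(500))
compressed = zlib.compress(original)

# Uncompressed: one damaged byte leaves every other byte readable.
damaged_plain = flip_byte(original, len(original) // 2)
intact = sum(a == b for a, b in zip(original, damaged_plain))
print(f"uncompressed: {intact} of {len(original)} bytes still intact")

# Compressed: the same kind of damage usually makes the whole stream fail.
failures = count_unreadable(compressed)
print(f"compressed: {failures} of {len(compressed)} single-byte errors broke decompression")
```

In the uncompressed case the damage stays local. In the compressed case, later data depends on earlier data, so one bad byte can invalidate everything that follows it.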