Maksym Yemelyanov - Fotolia
Compression is one of the data reduction methods that has been around the longest. In fact, the old PKZIP utility that was so popular back in the 1980s was based on the use of data compression technology.
Data compression continues in use today because it is both simple and effective. It works by scanning the bits that make up a file in an effort to locate long bit strings that are repeated more than once. If such bit strings are located, the file is rewritten with the long, repetitive bit strings replaced by much shorter strings. The end result is that, once compressed, the file consumes less storage space.
As well as data compression technology works, there are a few potential disadvantages to compression. First, data can only be compressed if it contains recurring bit strings. After all, it is those recurring strings that are eliminated in an effort to reduce the size of the file.
The problem is that many modern data types are already compressed. You might, for example, have heard the term compressed media used to refer to digital video or JPEG images. Such files contain very little -- if any -- redundancy, and therefore cannot be further compressed.
Another potential disadvantage of data compression technology is that it can be a CPU-intensive process, although there are offloading techniques that can be used for network compression. CPU cycles are consumed by the process of parsing files in search of redundant bit patterns.
Taneja Group analyst Mike Matchett explores the topic of compression and deduplication.
In most cases, the CPU overhead isn't problematic, but it is something to consider if processing is occurring on a system that is already CPU bound. This is especially true if the data is nonredundant and CPU cycles are being wasted trying to compress data that is already compressed.
Also, watch out for the potential of data loss. Data compression technology effectively removes good data from a file and replaces it with a marker. In most cases, this is not a problem. However, the removal of redundant bit strings opens the door to the potential for extreme corruption. A minor disk error that might normally result in a small problem for an uncompressed file could cause a compressed file to become completely unreadable.
Dig Deeper on Data reduction and deduplication
Related Q&A from Brien Posey
Quad-level NAND has its benefits, but is it right for enterprise-level companies? While it can be plagued by performance and durability issues, the ... Continue Reading
Disaster recovery requirements are constantly changing, and a DR plan must do the same. Frequent testing and clear communication can go a long way ... Continue Reading
While cloud DR is gaining popularity, it may not be the right choice for every organization. On-premises DR often comes with more predictable pricing... Continue Reading
Have a question for an expert?
Please add a title for your question
Get answers from a TechTarget expert on whatever's puzzling you.