Compression is one of the data reduction methods that has been around the longest. In fact, the old PKZIP utility...
that was so popular back in the 1980s was based on the use of data compression technology.
Data compression continues in use today because it is both simple and effective. It works by scanning the bits that make up a file in an effort to locate long bit strings that are repeated more than once. If such bit strings are located, the file is rewritten with the long, repetitive bit strings replaced by much shorter strings. The end result is that, once compressed, the file consumes less storage space.
As well as data compression technology works, there are a few potential disadvantages to compression. First, data can only be compressed if it contains recurring bit strings. After all, it is those recurring strings that are eliminated in an effort to reduce the size of the file.
The problem is that many modern data types are already compressed. You might, for example, have heard the term compressed media used to refer to digital video or JPEG images. Such files contain very little -- if any -- redundancy, and therefore cannot be further compressed.
Another potential disadvantage of data compression technology is that it can be a CPU-intensive process, although there are offloading techniques that can be used for network compression. CPU cycles are consumed by the process of parsing files in search of redundant bit patterns.
Taneja Group analyst Mike Matchett explores the topic of compression and deduplication.
In most cases, the CPU overhead isn't problematic, but it is something to consider if processing is occurring on a system that is already CPU bound. This is especially true if the data is nonredundant and CPU cycles are being wasted trying to compress data that is already compressed.
Also, watch out for the potential of data loss. Data compression technology effectively removes good data from a file and replaces it with a marker. In most cases, this is not a problem. However, the removal of redundant bit strings opens the door to the potential for extreme corruption. A minor disk error that might normally result in a small problem for an uncompressed file could cause a compressed file to become completely unreadable.
Dig Deeper on Data reduction and deduplication
Related Q&A from Brien Posey
Unpredictable user behavior and boot storms can cause VDI resource usage fluctuations throughout the day. IT can take steps to identify and curtail ... Continue Reading
Like a car or any other important machine or device, backup architecture should have a maintenance plan. This process will help ensure the health and... Continue Reading
Expert Brien Posey breaks down why flash storage may not be the best choice for medical imaging archiving in terms of speed and cost, and how it may ... Continue Reading
Have a question for an expert?
Please add a title for your question
Get answers from a TechTarget expert on whatever's puzzling you.