Compression is one of the oldest data reduction methods. In fact, the old PKZIP utility, which first appeared in the late 1980s and became hugely popular, was built on data compression technology.
Data compression remains in use today because it is both simple and effective. It works by scanning the bits that make up a file to locate long bit strings that occur more than once. If such strings are found, the file is rewritten with each long, repeated bit string replaced by a much shorter one that stands in for it. The end result is that, once compressed, the file consumes less storage space.
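To make the idea concrete, here is a minimal sketch using Python's standard zlib module. The sample data is a made-up log line repeated many times, chosen only because it is full of recurring strings; real files will compress less dramatically.

```python
import zlib

# Hypothetical sample data: one short log line repeated many times,
# so the input is full of recurring byte strings.
original = b"status=OK user=guest action=read\n" * 1000

# Compress, then decompress to confirm nothing was lost.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(f"original:   {len(original)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"round trip intact: {restored == original}")
```

The compressed copy is a small fraction of the original size, and decompressing it returns the original bytes exactly.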
As well as data compression technology works, it has a few potential disadvantages. First, data can only be compressed if it contains recurring bit strings. After all, it is those recurring strings that are eliminated to reduce the size of the file.
The problem is that many modern data types are already compressed. You might, for example, have heard the term compressed media used to refer to digital video or JPEG images. Such files contain very little redundancy, if any, so compressing them again saves little or no space.
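The effect is easy to reproduce. The sketch below, again using Python's zlib module, compresses some text and then compresses the result a second time. The word list and the fixed random seed are assumptions made purely so the example is repeatable; the output of the first pass stands in for any already-compressed file.

```python
import random
import zlib

# Hypothetical input: 20,000 words drawn from a small vocabulary.
# The fixed seed only makes the run repeatable.
rng = random.Random(0)
words = [b"disk", b"tape", b"cloud", b"backup",
         b"restore", b"archive", b"block", b"file"]
text = b" ".join(rng.choice(words) for _ in range(20000))

# First pass removes the redundancy in the text.
first_pass = zlib.compress(text)

# Second pass has almost nothing left to remove.
second_pass = zlib.compress(first_pass)

print(f"original text:      {len(text)} bytes")
print(f"compressed once:    {len(first_pass)} bytes")
print(f"compressed twice:   {len(second_pass)} bytes")
```

The first pass shrinks the text substantially. The second pass produces output of roughly the same size as its input, and it can even come out slightly larger because of the format's own overhead.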
Another potential disadvantage of data compression technology is that it can be a CPU-intensive process, although there are offloading techniques that can be used for network compression. CPU cycles are consumed by the process of parsing files in search of redundant bit patterns.
In most cases, the CPU overhead isn't problematic, but it is something to consider if processing is occurring on a system that is already CPU bound. This is especially true if the data is nonredundant and CPU cycles are being wasted trying to compress data that is already compressed.
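One rough way to see this cost is to measure the CPU time spent on data that compresses well versus data that does not. The following is a sketch only: the helper name measure, the sample text, and the use of random bytes as a stand-in for already-compressed data are all assumptions, and the timings will vary from one machine to another.

```python
import os
import time
import zlib

def measure(label, data, level=6):
    """Compress data once; return (compressed size / original size, CPU seconds)."""
    start = time.process_time()
    compressed = zlib.compress(data, level)
    cpu_seconds = time.process_time() - start
    ratio = len(compressed) / len(data)
    print(f"{label:<16} ratio={ratio:.3f} cpu={cpu_seconds:.4f}s")
    return ratio, cpu_seconds

# Highly redundant input (hypothetical log text).
redundant = b"backup job completed successfully\n" * 30000

# Random bytes: a stand-in for data that is already compressed.
nonredundant = os.urandom(len(redundant))

redundant_ratio, _ = measure("redundant text", redundant)
random_ratio, _ = measure("random bytes", nonredundant)
```

A ratio near 1.0 for the random input means the CPU time reported on that line bought no space savings at all.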
Also, watch out for the potential for data loss. Data compression technology effectively removes good data from a file and replaces it with a marker. In most cases, this is not a problem. However, because each marker depends on data stored elsewhere in the file, the removal of redundant bit strings opens the door to far more extensive corruption. A minor disk error that would normally cause a small problem in an uncompressed file could make a compressed file completely unreadable.
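The difference can be demonstrated by damaging a single byte in each kind of file. This is a minimal sketch using Python's zlib module; the helper flip_byte and the sample records are hypothetical, and other compression formats may fail in different ways (some can recover part of the data).

```python
import zlib

def flip_byte(data, index):
    """Return a copy of data with every bit of one byte inverted."""
    damaged = bytearray(data)
    damaged[index] ^= 0xFF
    return bytes(damaged)

# Hypothetical sample file: 2,000 short text records.
original = b"".join(b"record %05d: payment received\n" % i for i in range(2000))

# Damage one byte of the uncompressed copy: only that byte is affected.
damaged_plain = flip_byte(original, len(original) // 2)
plain_bytes_lost = sum(a != b for a, b in zip(original, damaged_plain))
print(f"uncompressed copy: {plain_bytes_lost} byte(s) differ")

# Damage one byte of the compressed copy, then try to read it back.
compressed = zlib.compress(original)
damaged_compressed = flip_byte(compressed, len(compressed) // 2)
try:
    recovered = zlib.decompress(damaged_compressed)
except zlib.error as exc:
    recovered = None
    print(f"compressed copy: could not be decompressed ({exc})")
```

In the uncompressed copy, the damage stays confined to one byte. In the compressed copy, the same amount of damage typically makes zlib reject the whole stream, so none of the records can be read back through the normal interface.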