"By compressing the data, you're able to put more data on the tape in the same amount of time," says Jacob Farmer, CTO at Cambridge Computer Services, a Waltham, Mass.-based consulting firm. "If you've got 2-1 compression, you not only double your tape capacity, you also get, theoretically, twice the performance."
Which is a good thing -- mostly.
One problem, Farmer says, is that the bottleneck in a backup system is seldom the I/O rate of the tape drives. More likely it's somewhere in the network feeding data to the drives. If the drives are trying to take in data faster than the rest of the system can deliver it, especially with linear tape drives, you'll get seriously degraded performance, he says.
Just where the performance threshold falls depends on how much compression you get on the data and that, in turn, depends critically on the data itself.
Compressing data is like putting stuff in a trash compactor, Farmer says. "If you put a milk carton in a trash compactor it gets good and small, but if you put a brick in there, it doesn't get smaller," Farmer says.
Although manufacturers typically quote a 2 to 1 gain from compression, Farmer says the actual number can vary from almost nothing to as much as 5 to 1 on databases with a lot of white space in them. The usual range is between 1.5 to 1 and 2 to 1, he says.
This is critical because the amount of compression determines how fast the balance of the backup system has to feed data to the tape drive to keep up. "If your drive normally starts shoeshining [if your data rate falls below] 10 or 11 MB/sec, that will be more like 22 MB/sec with 2 to 1 compression," Farmer says. "If you're getting 5 to 6 to one compression, you need to come up with 50 to 60 MB/sec of data."
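Farmer's arithmetic can be sketched in a few lines of Python. This is a rough illustration only, assuming the feed rate needed to keep a drive streaming scales linearly with the compression ratio, as his figures imply. The function name and the 11 MB/sec native threshold are hypothetical choices for the example, not values from any drive's specification.

```python
def required_feed_rate(native_min_mb_s: float, compression_ratio: float) -> float:
    """Estimate the host-side data rate (MB/sec) needed to avoid shoeshining.

    native_min_mb_s: the rate below which the drive stops streaming when the
        data is not compressed (assumed value, taken from the drive's specs).
    compression_ratio: e.g. 2.0 for 2 to 1 compression.
    """
    if native_min_mb_s <= 0 or compression_ratio <= 0:
        raise ValueError("rate and ratio must be positive")
    # The drive writes compressed data at its native rate, so the host must
    # supply uncompressed data 'compression_ratio' times faster.
    return native_min_mb_s * compression_ratio


# Hypothetical drive that starts shoeshining below 11 MB/sec of native data.
for ratio in (1.0, 1.5, 2.0, 5.0):
    rate = required_feed_rate(11.0, ratio)
    print(f"{ratio} to 1 compression: feed at least {rate:.1f} MB/sec")
```

Running the same calculation against the compression ratio you actually measure on your own data, rather than the quoted 2 to 1, gives a more realistic target for the network and servers feeding the drive.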
The solution, Farmer says, is design. "You can't buy your way out of backup system problems," he says. "You have to design your way out. Backup systems aren't terribly complicated but if you gloss over them you don't get a working backup system for your money."
Another concern with using tape compression for backup is reliability. Ironically, Farmer says that is a non-issue with modern tape drives and good quality tapes. Although compressing data theoretically makes it more vulnerable to information loss, current systems are designed to provide more than enough reliability to prevent non-recoverable errors. "Tape compression is totally reliable and it doesn't add complexity," Farmer says.
For more information:
Tip: Make tape technology work for you
Tip: Do the server-free two-step
Tip: Nine rules for better backups
Rick Cook has been writing about mass storage since the days when the term meant an 80K floppy disk. The computers he learned on used ferrite cores and magnetic drums. For the last twenty years he has been a freelance writer specializing in storage and other computer issues.
This was first published in April 2004