One of the big challenges associated with modern data protection is the administrative overhead. At one time, data protection simply meant using backup software to make copies of server contents. Today, things are not quite so simple. Backup administrators are faced with protecting massive amounts of data on physical and virtual servers. Also, in many cases, organizations may have very limited tolerance for downtime. Furthermore, resources might reside on-premises, in the cloud or both.
As a result, backups have become far more complex than they once were. Even a medium-sized organization might require dozens of carefully choreographed backup jobs to protect its various corporate resources, and this increased complexity has led to a much greater administrative workload. Fortunately, there are some things that can be done to make a backup administrator's job easier.
Large data set protection
One of the major challenges is protecting data sets that are growing exponentially and that must be retained for an extended period of time. It is unrealistic to expect to be able to perform nightly backups of exponentially growing data sets, because sooner or later the backup window will become inadequate. Fortunately, there are some things that can help.
One of the best choices for protecting large data sets is a technology known as continuous data protection (CDP). The idea behind CDP is that it is more effective to back up data on an ongoing and continuous basis rather than to attempt to perform a monolithic backup each night.
Contrary to its name, CDP does not typically offer real-time protection of data (although there are exceptions). Mainstream CDP products work by keeping track of storage blocks that have changed, then copying modified storage blocks to a backup server on a scheduled basis. The frequency with which these copy operations can occur varies depending upon the software that is being used. However, it is not uncommon for CDP products to offer the ability to synchronize backup data every five minutes.
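To make the changed-block approach concrete, here is a minimal sketch of how a CDP agent might detect modified blocks between scheduled copy operations. The block size, hashing scheme, and function names are illustrative assumptions, not the design of any particular CDP product:

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; real products vary


def changed_blocks(volume: bytes, last_hashes: dict) -> dict:
    """Return {block_index: block_bytes} for blocks whose content
    hash differs from the previous sync, updating the hash table.
    Only these blocks need to be copied to the backup server."""
    changed = {}
    for offset in range(0, len(volume), BLOCK_SIZE):
        block = volume[offset:offset + BLOCK_SIZE]
        idx = offset // BLOCK_SIZE
        digest = hashlib.sha256(block).hexdigest()
        if last_hashes.get(idx) != digest:
            changed[idx] = block
            last_hashes[idx] = digest
    return changed
```

In practice the agent would run this comparison (or intercept writes directly at the driver level) and ship only the changed blocks every few minutes, which is why each sync is small even when the volume itself is large.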
This approach works really well for a few different reasons. First, it completely eliminates the backup window. Rather than trying to shoehorn a backup job into an allotted period of time, the backup operation runs constantly throughout the day.
Another benefit to this approach is that data is protected much more frequently. If a major outage occurs, the amount of data that could potentially be lost is limited to whatever has accumulated since the last copy operation, which should have occurred, at most, a few minutes earlier. In contrast, nightly backup jobs can leave many hours' worth of data exposed because backups are made infrequently (at least when compared to CDP).
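The difference in exposure is easy to quantify. Assuming a five-minute CDP sync interval and one nightly backup, the worst-case data-loss window compares as follows (the interval values are illustrative):

```python
# Worst-case data-loss window for each strategy, in minutes
cdp_window_minutes = 5            # CDP syncing every five minutes
nightly_window_minutes = 24 * 60  # one backup job per day

# Nightly backups expose 288 times more data to loss in the worst case
ratio = nightly_window_minutes / cdp_window_minutes
print(ratio)  # 288.0
```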
The backup window is not the only challenge that must be addressed with regard to an exponentially growing data set. Media capacity is also a challenge. After all, the backup media must have sufficient capacity to store a copy of all of the data that needs to be protected.
In some ways, tape backup offers a theoretically unlimited backup capacity because when a tape fills up, an administrator can always begin writing data to another tape. The problem with this, however, is that most modern data protection choices (including CDP) are based on disk rather than on tape. Disk tends to offer a higher level of performance than tape because it is a random-access medium, which means that data can be written or retrieved on an as-needed basis. In contrast, tape is a linear medium, which means that the tape drive must move the tape to the correct position before any data can be read or written.
The problem with disk-based backups is that disks have a limited storage capacity. You cannot easily swap out a disk just because it fills up. That being the case, backup vendors have developed products to deal with the capacity limitations.
One such solution is data deduplication. Most modern backup products work at the block level, rather than the file level. Data deduplication mechanisms are designed to watch for redundant storage blocks. Although deduplication can be performed in a number of different ways, the end result is that duplicate storage blocks are not stored (or are stored only for a short period of time), thereby helping the storage medium to be used as efficiently as possible.
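The core idea can be sketched in a few lines: hash each incoming block, store only blocks whose hash has not been seen before, and record a manifest of hashes so the original stream can be reassembled. This is a simplified illustration, not how any specific backup product implements deduplication:

```python
import hashlib


def dedupe_store(blocks, store: dict) -> list:
    """Store each unique block exactly once, keyed by its content
    hash, and return a manifest of hashes describing the stream."""
    manifest = []
    for block in blocks:
        key = hashlib.sha256(block).hexdigest()
        if key not in store:        # redundant blocks are skipped
            store[key] = block
        manifest.append(key)
    return manifest


def restore(manifest: list, store: dict) -> bytes:
    """Reassemble the original data from the manifest."""
    return b"".join(store[key] for key in manifest)
```

If the same block appears a thousand times across a backup set, it occupies the space of one block plus a thousand small hash entries, which is where the capacity savings come from.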
Another way that some organizations are coping with storage capacity limitations is through the use of tiered storage. The idea behind this approach is that aging backup data can be automatically moved to a less-expensive storage medium. This might involve dumping old data to tape, or it could mean moving aging data to commodity storage or even to cloud storage. Incidentally, similar mechanisms are also sometimes used to create a redundant copy of the backup either in a remote data center or in the cloud. This could be done manually but would be labor-intensive. Companies typically use software to automate the process based on policies that suit their needs.
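An age-based tiering policy of the kind described above might look something like the following sketch. The tier names and day thresholds are hypothetical placeholders; real products let administrators configure these policies:

```python
# Hypothetical policy thresholds, in days
HOT_DAYS = 30        # keep recent backups on fast primary disk
ARCHIVE_DAYS = 180   # after this, move to cheap long-term storage


def tier_for(age_days: float) -> str:
    """Pick a storage tier for a backup based on its age."""
    if age_days < HOT_DAYS:
        return "primary-disk"
    if age_days < ARCHIVE_DAYS:
        return "commodity-storage"
    return "cloud-archive"
```

Automation software would evaluate such a policy on a schedule and migrate backup data between tiers accordingly, which is what makes the approach practical at scale.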
This was first published in April 2014