Systematic and repetitive processes are key to achieving smoothly running backup operations
An IT administrator I know has a catchphrase that runs "backups are a necessity, restores are a luxury." So it goes with a significant portion of the IT world, where tape-based backups are run on a daily basis with little or no assurance that the data forming the lifeblood of the business is appropriately protected and recoverable. I've learned that to truly rate the operational excellence of an IT group, you seldom need to look further than under the cover of its tape backup environment.
The simple fact is that many IT administrators don't know how well their backup environment performs (see "Rate your backup operations" sidebar). The best they hope for is that the backup completes in the allowable window and the truck shows up to carry the duplicate tape to some vault in the side of a mountain. After that, it's on to the hot issue of the day, and backups are forgotten until the next night.
There's little or no focus given to optimizing the effectiveness of the backup system and ensuring costs are appropriately contained.
So where would you start in verifying backup efficiency? With so many different things that can be checked and validated, it makes sense to segment the checks and balances into a series of checklists that can be exercised on a periodic basis. Here's a good basic to-do list:
Daily. Verify the successful completion of backups and take corrective action for failures. Many enterprise environments have backup software from multiple vendors at work, which complicates the task, but products are emerging that will report success and failure across multiple platforms. For reporting, backups should be segmented wherever possible by the application or business unit that generates the data. With this information in hand, failures can be reported to the appropriate line-of-business managers to determine the most feasible way to move forward.
Weekly. Verify that the consumables - log file space or scratch tapes - of the backup environment are appropriately maintained.
Quarterly. Verify the components of the backup system are appropriately sized - library size, software configuration (number of master and media servers) - for the dynamic environment.
Yearly. Verify that the backup solution is appropriate for the environment and your planned 18- to 24-month outlook.
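The daily check in the list above lends itself to a small script. What follows is only a sketch: the job records, field names and the failures_by_business_unit function are hypothetical, and a real backup product would supply this information through its own reports or logs. It shows the idea of segmenting failures by the business unit that owns the data so they can be routed to the right manager.

```python
from collections import defaultdict

# Hypothetical job records (assumption: not any vendor's actual report format).
jobs = [
    {"client": "fin-db01", "business_unit": "Finance", "status": "success"},
    {"client": "fin-db02", "business_unit": "Finance", "status": "failed"},
    {"client": "eng-nas01", "business_unit": "Engineering", "status": "failed"},
    {"client": "hr-app01", "business_unit": "HR", "status": "success"},
]

def failures_by_business_unit(job_records):
    """Group the clients whose backup did not succeed by owning business unit."""
    failed = defaultdict(list)
    for job in job_records:
        if job["status"] != "success":
            failed[job["business_unit"]].append(job["client"])
    return dict(failed)

# Print one line per business unit that has at least one failed backup.
for unit, clients in failures_by_business_unit(jobs).items():
    print(f"{unit}: {', '.join(clients)}")
```

Business units with no failures simply don't appear in the result, which keeps the daily report short.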
[Sidebar: Rate your backup operations]
Once a week, take some time to perform care and feeding of the backup system. Verify that there are appropriate quantities of scratch tapes available in the library, verify that the log file won't overrun its partition and check the log files for warnings and errors other than backup failures. Scripts can automate much of this, but it's amazing how many times backups don't run successfully because of a lack of available resources. Depending on the specific backup application being used, the specifics of what will need to be checked will vary, but institutionalize a weekly check on consumables.
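As an illustration of the kind of script that can automate the weekly check, here is a minimal sketch. The thresholds, the function names and the way the scratch tape count is obtained are all assumptions; each backup application exposes these figures differently. Only shutil.disk_usage is a standard Python call.

```python
import shutil

def log_partition_usage(path):
    """Return (used_bytes, total_bytes) for the partition that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.used, usage.total

def check_consumables(scratch_tapes, min_scratch_tapes,
                      log_used_bytes, log_total_bytes,
                      max_log_fraction=0.8):
    """Return a list of warnings about backup consumables (empty if all is well).

    The minimum tape count and the 80% log threshold are illustrative
    assumptions, not recommendations from any vendor.
    """
    warnings = []
    if scratch_tapes < min_scratch_tapes:
        warnings.append(
            f"scratch tapes low: {scratch_tapes} available, "
            f"{min_scratch_tapes} required"
        )
    if log_total_bytes and log_used_bytes / log_total_bytes > max_log_fraction:
        percent = 100 * log_used_bytes / log_total_bytes
        warnings.append(f"log partition {percent:.0f}% full")
    return warnings
```

In practice the two byte counts would come from log_partition_usage called on whatever directory holds the backup logs, and the scratch tape count from the backup application's own inventory report.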
With daily and weekly activities now in hand, it's worthwhile paying some attention to the components that make up the backup environment. Conduct a sanity test to verify the tapes in the library are what the software expects. Check that the throughput rates are at expected levels. Perhaps the time has come to add another media server? What's the utilization level of the tapes in the library? Is that an efficient level? How could it be improved? Think about all the components in the environment and check that they continue to be appropriately sized and tuned.
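The tape utilization question can be answered with simple arithmetic once the library inventory is exported. The sketch below assumes a hypothetical inventory format (label, capacity and used space per tape) and an arbitrary 50% threshold for flagging underused tapes; neither comes from any particular product.

```python
# Hypothetical library inventory; real figures would come from the backup software.
tapes = [
    {"label": "A00001", "capacity_gb": 400, "used_gb": 400},
    {"label": "A00002", "capacity_gb": 400, "used_gb": 100},
    {"label": "A00003", "capacity_gb": 400, "used_gb": 100},
]

def library_utilization(inventory):
    """Fraction of total tape capacity currently holding data."""
    total = sum(t["capacity_gb"] for t in inventory)
    used = sum(t["used_gb"] for t in inventory)
    return used / total if total else 0.0

def underused_tapes(inventory, threshold=0.5):
    """Labels of tapes filled to less than `threshold` of their capacity."""
    return [
        t["label"]
        for t in inventory
        if t["capacity_gb"] and t["used_gb"] / t["capacity_gb"] < threshold
    ]

print(f"Library utilization: {library_utilization(tapes):.0%}")
print("Underused tapes:", underused_tapes(tapes))
```

A long list of partly filled tapes is a hint that consolidation or a change in how backups are grouped onto media could shrink the library.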
This would also be a good time to perform some random restores to verify that things are performing as expected. Try restoring some data backed up on a tape currently in the library. At the same time, verify that both you and your off-site providers can hit the advertised SLA time frames for tape and data recovery. Try to restore something that no longer has a copy in the library and check how long it takes until the data is back online.
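A restore test is only useful if the elapsed time is recorded and compared with the promised window. Here is a minimal sketch of that bookkeeping; restore_fn stands in for whatever actually triggers the restore (an assumption, since that step is specific to each backup product), and the SLA value is whatever has been agreed with the business or the off-site provider.

```python
import time

def timed_restore(restore_fn, sla_seconds):
    """Run a restore and report how long it took against the SLA.

    `restore_fn` is a hypothetical placeholder: any callable that returns
    once the data is back online.
    """
    start = time.monotonic()
    restore_fn()
    elapsed = time.monotonic() - start
    return {"elapsed_seconds": elapsed, "within_sla": elapsed <= sla_seconds}
```

Keeping these results from quarter to quarter makes it easy to see whether recovery times are drifting toward the SLA limit.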
With those simple guidelines in place, a midmarket manufacturer was able to all but eliminate backup failures and reduce the quantity of tapes in their library by more than 25%.
Finally, set aside some time once a year or so to consider the overall backup solution. Are there technology improvements or IT projects coming in the next 12 to 18 months that make a shift in backup paradigm worthy of consideration? Whether it's a move from distributed to centralized computing or from tape- to disk-based backups, make some conscious decisions about overall backup strategy and review those decisions annually.
Backups are different from most other services IT groups provide because of the sticky nature of the backup cost. For every file written today, there are costs associated with tapes and off-site holding that can recur for weeks, months or even years. A file written today will be backed up tonight, and again each time there's a full backup, until the tapes are cycled back for overwrite. Those copies remain on the tapes, will be duplicated, will be sent off-site and will be retained for 12 months. All of those steps, each directly related to that one file, cost money. Because of this, ensuring that the backup environment is well managed is an exercise that can both improve the quality of service to end users and reduce costs to the enterprise.
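To make the stickiness concrete, consider a rough calculation. The 12-month retention and the duplicate copy come from the scenario above; the weekly full backup schedule and the 2GB file size are assumptions chosen purely for illustration.

```python
def retained_copies(fulls_per_week, retention_weeks, copies_per_full):
    """Number of copies of an unchanged file held on tape at steady state."""
    return fulls_per_week * retention_weeks * copies_per_full

def retained_gb(file_size_gb, fulls_per_week, retention_weeks, copies_per_full):
    """Total tape capacity consumed by all retained copies of one file."""
    copies = retained_copies(fulls_per_week, retention_weeks, copies_per_full)
    return file_size_gb * copies

# Assumed: one full backup a week, 52 weeks of retention,
# and two copies per full (the original tape plus its duplicate).
copies = retained_copies(fulls_per_week=1, retention_weeks=52, copies_per_full=2)
total_gb = retained_gb(2, fulls_per_week=1, retention_weeks=52, copies_per_full=2)
print(copies, total_gb)  # prints: 104 208
```

Under these assumptions, a single 2GB file that nobody touches again occupies 208GB of tape, half of it in off-site storage, before the oldest copy is ever overwritten.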
A backup system is like any other organism. As it grows and shrinks in response to the data, and as forces act on it to make it inefficient, it takes some care and attention to keep it efficient and effective. This is certainly a case where an ounce of prevention is easier than a pound of cure. As for the line about restores being a luxury - tell that to the CEO when the financial projections spreadsheet they deleted can't be recovered.