In spite of the latest wave of backup technology advances, data backup is still the outsider in terms of enterprise IT. But when the backup environment suffers a catastrophic outage, everyone (especially the CIO) wants to know who fell asleep at the switch. In the interest of preserving your data -- and potentially your job -- here are five signs that your backup environment is going to get everyone's attention in the near future.
Throwing hardware at the problem.
If your capital expenditures (particularly for disk-based backup products) have skyrocketed to alleviate the pains of backup, don't hold your breath. Sure, tape is difficult because of the mechanical reality of moving parts and a sequential predisposition, but it's not the root of all evil in your backup environment. I'm a big fan of disk-based backup technologies, but I regularly see only marginally successful implementations of virtual tape libraries (VTLs). Why? Too many large-scale VTL implementations are driven by the notion that VTLs will solve every backup problem. This approach, coupled with the misconception that all disk is faster than all tape, usually results in "Generation-1 VTL" implementations that further complicate already complex issues (client performance, network, backup servers, software implementations, you name it).
No visibility into backup health.
If you have no visibility into capacity, growth, success, failure or performance, you're unaware of the health of your backup environment. In many environments where backup runs "lights out," I often find staggering backup disk failure rates, ranging from 30% to 60% daily, along with massive capacity issues. Poor backup performance directly impacts your ability to recover data, so if you don't manage based on metrics, you will eventually run into the wall in terms of capacity, performance and successful data restores.
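As a minimal sketch of what managing by metrics can look like, the Python below computes a per-day failure rate from job records and a simple linear estimate of when backup capacity runs out. The `BackupJob` record, its fields and both function names are hypothetical; a real backup application would expose this data through its own reports or logs, and real growth is rarely perfectly linear.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BackupJob:
    # Hypothetical job record; field names are assumptions for illustration.
    day: date
    client: str
    succeeded: bool
    bytes_written: int


def daily_failure_rate(jobs):
    """Return {day: fraction of that day's jobs that failed}."""
    totals = {}
    failures = {}
    for job in jobs:
        totals[job.day] = totals.get(job.day, 0) + 1
        if not job.succeeded:
            failures[job.day] = failures.get(job.day, 0) + 1
    return {day: failures.get(day, 0) / totals[day] for day in totals}


def days_until_full(used_bytes, capacity_bytes, daily_growth_bytes):
    """Linear forecast of days left; None if usage is not growing."""
    if daily_growth_bytes <= 0:
        return None
    remaining = max(capacity_bytes - used_bytes, 0)
    return remaining / daily_growth_bytes
```

Tracking these two numbers every day is enough to notice a rising failure rate or a looming capacity wall before either turns into a failed restore.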
Incomplete offsite vaulting.
Is offsite vaulting often sacrificed in order to sustain daily backups? That's not a bad short-term survival strategy, but it's a great way to fail catastrophically in the long run, when offsite backup copies are outdated and irrelevant. Most backup environments play a primary and/or secondary role in disaster recovery (DR) situations, and in doing so, must vault current copies of backup data to an offsite location. Traditionally, this is accomplished via offsite tape vaulting, which requires a daily "copy" procedure by the backup application. If your offsite vaulting isn't successful on a daily basis, you're already behind the eight ball, and if you don't fix it, you'll probably fall permanently behind. And when disaster strikes, everyone will wonder why the only data that can be recovered is two weeks old, and they'll all be looking at you.
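One way to catch a vaulting backlog before a disaster does is to check the age of the newest offsite copy for each client every day. The sketch below assumes a hypothetical mapping from client name to the timestamp of its most recent offsite copy; how that mapping is obtained depends entirely on the backup application, and the one-day threshold is only an illustrative default.

```python
from datetime import datetime, timedelta


def stale_offsite_copies(last_vaulted, now, max_age=timedelta(days=1)):
    """Return {client: age} for clients whose newest offsite copy exceeds max_age.

    last_vaulted is a hypothetical {client: datetime of newest offsite copy} map.
    """
    stale = {}
    for client, vaulted_at in last_vaulted.items():
        age = now - vaulted_at
        if age > max_age:
            stale[client] = age
    return stale
```

An empty result means every client has a current offsite copy; anything else is a list of systems that would be restored from old data today.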
Depending on a single linchpin.
Most backup environments have passed through many hands, yet is there one engineer who really understands the environment? By means unknown to the average man, this person manages to keep the backup environment stable and performing. These skills are in high demand, and eventually "the linchpin" leaves for a better opportunity (if the job doesn't drive him away first). When critical resources depart, so does the institutional knowledge that keeps the environment running. In a matter of months, a perfectly healthy backup environment can deteriorate due to lost institutional knowledge and practices. There are lots of opportunities to make mistakes with backup software, and mistakes are all but guaranteed once the linchpin leaves.
Treating backup like a utility.
Granted, this is a philosophical position, but I continue to see world-class data centers supported by antiquated backup infrastructures. If backup is managed like a static utility (say, antivirus), you're on the road to failure. Backup is a dynamic I/O machine, highly dependent on the performance, scale and health of network, storage, server and software components. If you step back and look at the overall data center, no other application compares in terms of I/O and architectural dependencies. Manage your backup solution as a core infrastructure solution, or plan on endless issues and reactive operations.
About the author: John Merryman is services director for recovery services at GlassHouse Technologies Inc., where he is responsible for service design and delivery worldwide. Merryman often serves as a subject matter expert in data protection, technology risk and information management, including speaking engagements and publications in leading industry forums.
This was first published in March 2008