Published: 10 Nov 2008
| Backup is still the greatest pain point for storage managers. The following five vexing backup problems can become less onerous if you use these simple procedures to improve your backup performance and reliability.
Modern tape drives are designed to operate at their advertised speeds; running them below those speeds is what causes them to fail more often. There's a minimum speed at which the tape must move past the head to achieve a good signal-to-noise ratio, and even variable-speed tape drives have a minimum speed at which they can write data. LTO-4, for example, has a minimum native transfer rate of 23MB/sec. And while few users experience the 2:1 compression ratio advertised by drive manufacturers, whatever compression ratio they do experience must be multiplied by the minimum transfer rate of the drive. For example, data that compresses at 1.5:1 being sent to a tape drive with a minimum speed of 23MB/sec makes that drive's effective minimum transfer rate 34.5MB/sec (23 x 1.5).
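The arithmetic above can be sketched in a few lines. The drive figure is the LTO-4 number from the text; the compression ratios are just examples:

```python
def min_effective_rate(min_native_mb_s: float, compression_ratio: float) -> float:
    """Minimum rate you must feed a tape drive to keep it streaming,
    given the compression ratio your data actually achieves."""
    return min_native_mb_s * compression_ratio

# LTO-4's minimum native transfer rate, per the text
LTO4_MIN = 23.0

print(min_effective_rate(LTO4_MIN, 1.5))  # 34.5 MB/sec at 1.5:1 compression
print(min_effective_rate(LTO4_MIN, 2.0))  # 46.0 MB/sec at the advertised 2:1
```

Feed the drive any slower than that figure and it stops streaming, which is exactly the failure-inducing behavior described above.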
Depending on which backup software you use, the first solution is to increase the speed of backups that go directly to tape: use LAN-free backups, use multiplexing, and don't bring additional tape drives online until you've given the drives already in use enough throughput. The second (and simpler) solution is to stop using tape as your primary target for backups and instead back up directly to disk. Using disk as an intermediary staging device usually gets the initial backup done much faster, and the subsequent local (LAN-free) movement of data from disk to tape can then run at full speed. These methods will keep your tape drives much happier, they'll fail less often and you can reduce the number of tape drives you need to buy to get the job done.
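Multiplexing works by interleaving several slow client streams onto one drive so the aggregate throughput stays above the drive's minimum. A rough sizing sketch (the per-client rate is a made-up example; real planning would measure it):

```python
import math

def streams_needed(drive_min_mb_s: float, per_client_mb_s: float) -> int:
    """How many concurrent client streams to multiplex onto one tape
    drive so their combined throughput meets the drive's minimum."""
    return math.ceil(drive_min_mb_s / per_client_mb_s)

# A drive that needs 34.5MB/sec (LTO-4 at 1.5:1 compression), fed by
# clients that can each push only 5MB/sec over the LAN
print(streams_needed(34.5, 5.0))  # 7 streams keep the drive streaming
```

The same arithmetic shows why adding a second drive too early hurts: splitting those seven streams across two drives starves both.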
Consider the following two real-life stories. One day, a backup administrator was asked to restore a set of files on server hpdbsvk. According to the firm's naming convention, that meant HP-UX database server "k." Because servers were named in alphabetical order, he knew there were database servers hpdbsva through hpdbsvk, but he was only backing up hpdbsva through hpdbsvj. Immediately, he knew he had a problem: hpdbsvk had never been backed up. While the data was never restored, the administrator didn't lose his job and didn't even get in trouble. How is that possible?
Real-life story No. 2: One day an administrator was asked to restore some code sitting in /tmp on an HP-UX system. The file system had disappeared upon reboot because it was a RAM file system. The customer requesting the data was furious when he found out that the backup system didn't back up /tmp. Again, the administrator didn't lose their job or get in trouble. Why not?
In both cases, the reason the backup administrator didn't lose their job was the same: documentation. Back in the days before the Web, the backup system in question used a paper-based request form users had to fill out if they wanted a system backed up. The form included a line that read "Do not consider this request accepted until you receive a copy of it in your in-box signed by someone on the backup team."
In the case of the customer who requested a restore from hpdbsvk and started fuming because it wasn't being backed up, the backup administrator asked to see the form with his signature on it. The customer didn't have the form, so the issue became what I like to call a "YP not MP"--Your Problem, not My Problem--as far as the backup administrator was concerned. As for the /tmp situation, it was excluded from backups, and the exclusion had been approved by upper management and well-advertised. (After all, the "T" in tmp stands for temporary, so why would you back up temporary things?)
Applying the paper backup request system to today's Web-based world is simple. Create a backup system request Web page that notifies the user who requested the backup that the backup is being performed. If you're using a data protection management tool, the user who requests the backup can even be notified every time the backup succeeds or fails. How's that for customer service? The Web page should also list standard backup configurations, including things like what gets backed up (or not backed up) by default.
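The acknowledgment step can be sketched as building (not sending) the confirmation message a requester would receive, the e-mail equivalent of the signed paper form. Names and addresses here are hypothetical:

```python
from email.message import EmailMessage

def backup_request_ack(requester: str, host: str, approved_by: str) -> EmailMessage:
    """Build the acknowledgment a requester receives once the backup
    team accepts their request; the message doubles as documentation."""
    msg = EmailMessage()
    msg["To"] = requester
    msg["Subject"] = f"Backup request accepted: {host}"
    msg.set_content(
        f"Your backup request for {host} was accepted by {approved_by}.\n"
        "Keep this message; it is your proof the system is being backed up."
    )
    return msg

ack = backup_request_ack("user@example.com", "hpdbsvk", "backup-team")
print(ack["Subject"])  # Backup request accepted: hpdbsvk
```

Hooking the same function up to the backup software's success/failure events gives you the per-run notifications described above.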
It's also important to use your backup software's ability to automatically discover and back up all file systems or databases on a given machine. If your backup software has this feature, use it; don't attempt to manually list all file systems. You're just asking for trouble, and possibly an RPE (a résumé-producing event), when you discover that you forgot to add the F: drive on a particular server. If your backup app doesn't have this feature, get a new one.
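To illustrate what "discover, don't list" means in practice, here's a minimal Linux-only sketch that enumerates mounted file systems from /proc/mounts instead of relying on a hand-maintained list. Real backup software does this natively and far more thoroughly; the pseudo-filesystem skip list is an assumption for the example:

```python
def local_filesystems(mounts_path: str = "/proc/mounts") -> list:
    """Discover mounted file systems rather than hand-maintaining a list.
    Linux-specific sketch; skips common pseudo-filesystems."""
    skip = {"proc", "sysfs", "tmpfs", "devtmpfs", "devpts", "cgroup", "cgroup2"}
    mounts = []
    with open(mounts_path) as f:
        for line in f:
            device, mountpoint, fstype = line.split()[:3]
            if fstype not in skip:
                mounts.append(mountpoint)
    return mounts

print(local_filesystems())  # e.g. ['/', '/boot', ...]
```

A file system added next month shows up in the next backup automatically, which is the whole point.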
Most backup software does little to highlight backups that silently fail. As a result, backup administrators often don't notice when a given server, file system or database doesn't successfully back up for multiple days. Some environments where I've performed backup assessments have had servers that went several days, even as much as a month, without a successful full or incremental backup; and the larger the environment, the greater the problem. At one customer's site, where 10,000 systems are backed up, more than 1,000 systems went four days or more without a successful backup of any kind.
Servers that go several days without a backup are obviously at greater risk than others. If a backup administrator were aware of such a trend, they might do a number of things, such as canceling less important backups so that the server that hasn't backed up for several days can be given more resources. At a minimum, the storage admin might set priorities on the backup system so that a server that hasn't backed up for several days takes precedence over other servers.
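The core report such a tool produces can be sketched in a few lines: given each server's last successful backup time, flag the ones past a risk threshold. The sample data below is invented:

```python
from datetime import datetime, timedelta

def stale_backups(last_success: dict, now: datetime, max_age_days: int = 4) -> list:
    """Return servers whose last successful backup (full or incremental)
    is older than max_age_days -- the servers to reprioritize first."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(host for host, ts in last_success.items() if ts < cutoff)

now = datetime(2008, 11, 10)
report = stale_backups(
    {
        "hpdbsva": datetime(2008, 11, 9),   # backed up yesterday: fine
        "hpdbsvb": datetime(2008, 11, 1),   # nine days stale
        "hpdbsvc": datetime(2008, 10, 5),   # over a month stale
    },
    now,
)
print(report)  # ['hpdbsvb', 'hpdbsvc']
```

A commercial data protection management tool does this across thousands of clients and trends it over time, which is why the section below recommends buying one rather than scripting it yourself.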
Most backup products don't provide the kind of tools necessary in their base product to see this kind of information. The solution is a relatively simple one, but not an inexpensive one: Buy a data protection management tool. There's a reason a whole industry has grown around such tools, and it's difficult to properly manage a backup system without one.
Backup administrators good at shell or batch scripting can create programs that help them automate certain tasks. One customer I visited had 150 custom scripts written around their backup system. The problem with this kind of customization is that it's hard to maintain and even harder to pass on to the next backup administrator. Administrators who create too many scripts may find themselves stuck as "the backup person" because no one wants to take on and maintain all of those custom scripts.
Another way customization manifests itself is in unique backup configurations. Instead of having a standard backup configuration for everyone, some environments create custom backup configurations for each customer that requests one. For example, "For this server, we're going to back up only the F: drive and we'll do it only on Thursday nights from 3:00 am to 4:00 am." Besides making things much more complex, this kind of customization also goes against the way most backup software is designed. Backup software is designed to share resources and automatically send things to the right resource as it becomes available and as priorities dictate. Unique backup configurations drastically reduce the overall utilization of all resources by not allowing the backup software to do its job.
The fix is to establish a standard backup configuration and stick to it. Deviations from the standard must be justified by business reasons and approved by a business unit manager, who will receive a chargeback for the extra cost involved in such customizations.
Regarding custom scripts, the best thing to do is to consult the forums and mailing lists for the backup software you're using to find out if anyone has discovered another way to meet your requirement without custom scripting. Software updates often fix such problems found in earlier versions, but people continue to use their old ways because it's what they know.
Finally, if the software you're using can't be made to do what you want it to do without all of those custom scripts, perhaps it's not the right backup software for you and another backup application would do what you need it to do out of the box. Although changing backup software packages should be considered a last resort, it may actually be the best thing in some cases.
Then there's the problem of unencrypted tapes leaving your site. There are two solutions to this problem. First and foremost, encrypt your backups. There are a number of ways to encrypt data, such as using backup software encryption and encryption engines built into fabric switches, tape libraries and disk drives. The second solution is to not ship tapes offsite but to use a disk-based deduplication backup system that replicates your backups offsite. If you still want to make tapes, make them at your offsite location.
In my opinion, anyone in management who refuses to fund the security of backups should be relieved of their duties, and very well could be if things go wrong. Make sure that person isn't you. If your company is shipping unencrypted backup tapes with personal information on them, you should immediately notify your superiors in writing of the seriousness of this problem and request a project to solve it. Document your request and the response, especially if it's a negative one. Continue to make yourself a pain until they solve the problem or give you another job; you don't want the job of enabling identity thieves.
In sum, while some of these solutions may be simpler than others, a lot of what you can do to make your backups better comes down to understanding the limitations of what you're using and knowing how to document and improve your backup processes. Sometimes it pays to spend money on specialized backup tools that provide a clearer view of your backup environment.