This is the second in a multi-part series discussing Symantec Corp. Veritas NetBackup best practices to help you make the right choices for your data backup system.
For more detail on how I define "best practice," see my blog entry about best practices. The next NetBackup best practice to examine is to use the "max jobs per client" setting to limit client throughput in order to increase overall performance.
This best practice is based on the following idea. With some exceptions, how long all the backups take is more important than how long an individual client's backup takes. As long as each client's backup fits within its prescribed window, how much of that window it takes is irrelevant. And, for many systems, it's actually the recovery time objective (RTO) that matters, and how fast a client backs up often has little to do with how fast it restores. For one thing, a backup runs while all kinds of other backups are running, but restores are often done while nothing else is going on. A restore often gets unlimited access to the restored system and to the backup server and its resources.
You must first find out how fast an individual stream/job should be. To do this, you divide the target throughput number of the tape drive or disk storage unit by the number of streams required to create that throughput. For example, suppose that you've decided (via testing) that to stream a 50 MB/sec drive you need to use a multiplexing setting of 10. Each stream should then be around 5 MB/sec. Similarly, if you've got a disk storage unit behind a 50 MB/sec network connection, and you decide that it can accept 10 streams at a time, then each stream should be about 5 MB/sec.
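The division described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name and parameters are made up for this example and are not part of NetBackup.

```python
def target_stream_rate(unit_throughput_mb_s: float, streams: int) -> float:
    """Per-stream rate needed to reach the storage unit's target throughput.

    unit_throughput_mb_s: target throughput of the tape drive or disk
        storage unit, in MB/sec (determined by your own testing).
    streams: number of simultaneous streams (the multiplexing setting, or
        the number of streams the disk storage unit accepts).
    """
    if streams < 1:
        raise ValueError("streams must be at least 1")
    return unit_throughput_mb_s / streams


# The example from the text: a 50 MB/sec drive with a multiplexing setting of 10.
rate = target_stream_rate(50, 10)
print(f"Target stream rate: {rate} MB/sec")
```

Both cases in the paragraph above (the tape drive and the disk storage unit) give the same 5 MB/sec target per stream.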
Now compare that target stream rate (5 MB/sec in the above example) to the reliable stream rate of the client. If the client is on 100 Mb and can only reliably send 5 MB/sec, and your target stream rate is 5 MB/sec, it should be configured to back up only one job at a time. If you do this for ten 100 Mb clients, you'll stand a much better chance of achieving the desired 50 MB/sec. If, on the other hand, you allow each client to start as many jobs as it has file systems (the default behavior), then you won't know what you'll get. For example, suppose your ten 100 Mb clients had two file systems apiece, and each was allowed to follow the default behavior of backing up as many as 10 file systems at once. Because the drive accepts only 10 streams at a time, only five clients would back up at once, with two file systems apiece. You've already established that a given 100 Mb client (in this example) can only reliably send 5 MB/sec in total. This means that instead of the 50 MB/sec you wanted, you'll now get 25 MB/sec. So while this may speed up each individual backup, the tape drive is now running at 25 MB/sec instead of 50 MB/sec, which means that "the backups" will take twice as long to complete -- and (with exceptions) that's what you're trying to fix.
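The comparison above can be reproduced with a small model. This is a rough sketch, not a description of how NetBackup schedules jobs: it assumes every client is identical, that a client's reliable rate is shared among its own jobs, and that any multiplexing slots left over after whole clients are placed go unused. All names are hypothetical.

```python
def aggregate_throughput(num_clients: int,
                         filesystems_per_client: int,
                         max_jobs_per_client: int,
                         mpx_slots: int,
                         client_rate_mb_s: float,
                         drive_rate_mb_s: float) -> float:
    """Estimate total MB/sec reaching one drive under a simplified model."""
    # A client runs one job per file system, up to its max-jobs limit.
    jobs_per_client = min(filesystems_per_client, max_jobs_per_client)
    # Only as many clients run at once as fit in the multiplexing slots.
    active_clients = min(num_clients, mpx_slots // jobs_per_client)
    # Each active client contributes its whole reliable rate, however many
    # jobs it is running; the drive cannot exceed its own rate.
    return min(active_clients * client_rate_mb_s, drive_rate_mb_s)


# Ten 100 Mb clients, two file systems each, 5 MB/sec reliable rate,
# multiplexing setting of 10, 50 MB/sec drive.
throttled = aggregate_throughput(10, 2, 1, 10, 5, 50)     # one job per client
unthrottled = aggregate_throughput(10, 2, 10, 10, 5, 50)  # default behavior
print(throttled, unthrottled)
```

Under these assumptions the throttled configuration reaches the 50 MB/sec target, while the unthrottled one fills the ten slots with only five clients and delivers 25 MB/sec.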
There are two arguments against this best practice, so let's take a look at them. The first argument is that many clients can often send more than the rate you throttle them to. For example, suppose you've tested that in your environment, 100 Mb clients can reliably send 7 MB/sec, but your target stream rate is 5 MB/sec. If the goal is to maximize overall throughput, this client should be throttled at one job. This will make this client's backup go slower than it could, since it is capable of running at 7 MB/sec.
I have two responses to that. First, as previously stated, I believe that (with exceptions) getting all the backups done as fast as possible is more important than getting an individual backup done as fast as possible. Second, by configuring things in this way, the tape drives will stream, making all backups go faster, which inevitably makes individual backups go faster. In addition, doing more than one job at a time increases the load on the client and the server, slowing things down even more. It's actually better if everyone slows down a little bit; everyone goes faster if everyone goes slower. It's kind of like the "Nash Equilibrium" theory that John Nash invented. (Go see "A Beautiful Mind.") Everyone wins by doing what's right for the group -- and himself.
Another argument against this best practice is that it increases the complexity of the environment. For those who have not used this setting, it's really very simple. It's under Host Properties -> Master Server Properties -> Client Attributes. There you will find the "maximum number of jobs" setting. It can also be set or viewed with the bpclient command. Generally speaking, most people can use three settings: one for 100 Mb clients (e.g., 1), another for Gb clients (e.g., 5-10), and another for SAN storage nodes (10-plus). A client whose backups must go as fast as possible should be given a network connection that's fast enough to handle the throughput it needs, and it might be given a little more leeway in the "max jobs" setting, but this generally isn't required. Other exceptions to the general rules above might be any clients whose backups are abnormally slow. If, for some reason, you need 5 MB/sec per stream, and you have a client whose individual backups only run at 2 MB/sec, you might want to allow it to run two or three jobs simultaneously. Again, this would only be necessary if the throughput that you're getting for that client does not allow it to fit within its backup window -- otherwise leave it alone.
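The slow-client exception above can be written as a simple rule of thumb. This is a sketch under stated assumptions rather than anything NetBackup computes: it assumes a client's jobs add up roughly linearly, and the function and parameter names are invented for this example.

```python
import math


def suggested_max_jobs(target_stream_mb_s: float,
                       per_job_mb_s: float,
                       fits_window: bool,
                       default_jobs: int = 1) -> int:
    """Suggest a "max jobs" value for one client.

    target_stream_mb_s: per-stream rate the storage unit needs.
    per_job_mb_s: rate a single job from this client actually achieves.
    fits_window: True if the client already finishes inside its window.
    default_jobs: the normal setting for this class of client.
    """
    # If the backup already fits its window, leave the client alone.
    if fits_window:
        return default_jobs
    # Otherwise allow enough jobs to add up to roughly one target stream.
    return max(default_jobs, math.ceil(target_stream_mb_s / per_job_mb_s))


# A client whose jobs run at 2 MB/sec when 5 MB/sec per stream is needed.
print(suggested_max_jobs(5, 2, fits_window=False))
print(suggested_max_jobs(5, 2, fits_window=True))
```

Rounding up lands at the top of the "two or three jobs" range mentioned above; rounding down instead would be the more conservative choice.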
W. Curtis Preston (a.k.a. "Mr. Backup"), Executive Editor and Independent Backup Expert, has been singularly focused on data backup and recovery for more than 15 years. From starting as a backup admin at a $35 billion credit card company to being one of the most sought-after consultants, writers and speakers in this space, it's hard to find someone more focused on recovering lost data. He is the webmaster of BackupCentral.com, the author of hundreds of articles, and the author of the books "Backup and Recovery" and "Using SANs and NAS."
This was first published in July 2009