If you're feeling backup pain as a result of server virtualization, you're not alone. Many organizations are seeing increasingly heavy workloads on their disk-based backup appliances as a result of virtual machine (VM) proliferation. In some cases, this leads to longer backup and recovery windows. Fortunately, there are ways to work around the problem.
Backup I/O workload mismatch
First, let's examine why server virtualization is breaking disk backup. As I discussed in a recent webinar, unlike primary disk storage architectures, which are designed to handle random read/write I/O activity, disk-based backup appliances are typically designed to handle sequential backup workloads. In addition, they are specifically optimized for large block transfers. Backup software products designed specifically for virtual backups, however, change how data flows to the disk backup appliance. By using deduplication and changed block tracking (CBT), for example, these products send very small block transfers to the disk appliance.
Since CBT shortens the backup window, it increases users' appetite for more frequent backups over the course of the day -- the more backups, the less risk of data loss. However, when dozens or potentially hundreds of VMs are backed up via CBT, many small block transfers reach the disk backup appliance at the same time. As a result, the backup stream begins to look very similar to a production workload. Since disk backup appliances are tuned to receive large, sequential block I/O, this mismatch can wreak havoc on the appliance, resulting in poor backup performance. In short, disk backup appliances are as inappropriate for managing this kind of backup workload as they are for hosting production application data.
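To make the mismatch concrete, here is a minimal Python sketch of a toy model, not a measurement of any real appliance. It assumes each VM's backup data lands in its own address region and that concurrent jobs arrive round-robin. The function names, the 4 KB block size and the region size are all hypothetical choices for illustration.

```python
from itertools import zip_longest


def vm_stream(vm_id, blocks, block_size=4096, region_size=1 << 30):
    """Writes for one VM's backup job as (offset, length) pairs.

    Assumption: each VM writes contiguous blocks inside its own region.
    """
    base = vm_id * region_size
    return [(base + i * block_size, block_size) for i in range(blocks)]


def interleave(streams):
    """Round-robin merge, a stand-in for many jobs arriving at once."""
    merged = []
    for group in zip_longest(*streams):
        merged.extend(w for w in group if w is not None)
    return merged


def sequential_fraction(writes):
    """Share of writes that start exactly where the previous one ended."""
    if len(writes) < 2:
        return 1.0
    pairs = zip(writes, writes[1:])
    hits = sum(1 for (prev_off, prev_len), (off, _) in pairs
               if prev_off + prev_len == off)
    return hits / (len(writes) - 1)


# One job writing 1,000 blocks versus 100 concurrent jobs writing 10 blocks each.
single = vm_stream(0, 1000)
many = interleave([vm_stream(v, 10) for v in range(100)])

print(sequential_fraction(single))  # 1.0: fully sequential
print(sequential_fraction(many))    # 0.0: no write follows its predecessor
```

Both cases write the same number of blocks. In this model, only the arrival pattern differs, which is the point of the comparison: the appliance sees something much closer to random I/O.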
From SSD to HDD speed
Another issue is that some products for virtual backups now offer the ability to perform recovery in place. This means that backup data on disk can be used as a mount point for a VM. So, rather than waiting for data to be recovered from the backup, the VM can point to the backup image and resume normal operation. It sounds like a nice feature, but when you consider that most disk backup appliances are configured with high-density, slow-spinning disk drives, it could be a recipe for failure.
Many organizations are using flash and SSD in their primary storage systems, so an application user could go from solid-state speeds to the speed of a 1-terabyte, 5400 rpm drive. To compound the issue, if the disk backup system is also constantly running data deduplication processes, every write the VM performs will have to be deduped before it is committed to disk, further impacting performance.
Tiered backup capacity
To reduce or eliminate the impact of server virtualization on your disk backup infrastructure, you have several options. First, consider front-ending deduplication backup appliances with non-deduped disk. This disk "staging area" could hold seven to 14 days of backup data to enable fast backups and fast recoveries. Since this is a relatively small storage area, it can also be configured with smaller-capacity, higher-speed drives. Doing this can help offset the additional I/O demands placed on the appliance by VM CBT jobs or even multiple daily full or incremental backups.
Furthermore, this staging area would serve as a more viable storage tier for performing recovery-in-place operations because application storage I/O will not be encumbered by the added latency of deduplication processes.
Any data that needs to be retained beyond a seven- to 14-day window can then be migrated to a deduplicated storage area for long-term retention (30 days to 90 days). Anything that needs to be retained beyond a 90-day window can be archived to an efficient tape archive repository. In fact, there are tape-based products available now that enable users to build a hybrid tape cloud -- a local Linear Tape-Open, or LTO, repository that integrates with cloud-based tape archives.
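The tiering scheme described above can be sketched as a simple age-based policy. This is a minimal Python illustration, assuming the upper ends of the windows mentioned (14 days for staging, 90 days for deduplicated disk). The function and tier names are hypothetical and do not correspond to any particular backup product.

```python
from datetime import date

STAGING_DAYS = 14  # upper end of the seven- to 14-day staging window
DEDUPE_DAYS = 90   # upper end of the 30- to 90-day deduplicated retention


def tier_for(backup_date, today):
    """Return the tier a backup belongs on, based only on its age in days."""
    age = (today - backup_date).days
    if age < 0:
        raise ValueError("backup_date is in the future")
    if age <= STAGING_DAYS:
        return "staging"  # non-deduped, higher-speed disk
    if age <= DEDUPE_DAYS:
        return "dedupe"   # deduplicated disk for longer-term retention
    return "tape"         # archive repository


today = date(2024, 6, 1)
for made in (date(2024, 5, 25), date(2024, 4, 1), date(2024, 1, 1)):
    print(made, tier_for(made, today))
```

A real policy would likely also weigh recovery-time objectives and data type, not age alone. The thresholds are kept as named constants so they can be adjusted to your own retention requirements.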
A different way to approach this issue is to look at the data under management. Many analysts and storage vendors say that the vast majority of net new information being created (upwards of 80% of all data, by some estimates) is unstructured -- user files, email, PDFs, rich multimedia, machine sensor data, etc. Other than perhaps user files, most of this information typically doesn't dedupe very well -- at most, you may see a 2:1 reduction on deduped storage systems.
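A quick back-of-the-envelope calculation shows what a 2:1 ratio means for capacity. In the Python sketch below, the 100 TB data set and the 10:1 comparison ratio are hypothetical figures chosen only for illustration. Only the 2:1 figure comes from the discussion above.

```python
def physical_capacity_tb(logical_tb, dedupe_ratio):
    """Disk needed to hold logical_tb of backup data at a given reduction ratio."""
    if dedupe_ratio <= 0:
        raise ValueError("dedupe_ratio must be positive")
    return logical_tb / dedupe_ratio


logical_tb = 100  # hypothetical amount of data to protect

# 2:1 is the ceiling cited for most unstructured data.
print(physical_capacity_tb(logical_tb, 2))   # 50.0

# 10:1 is an assumed ratio for highly redundant data, for comparison only.
print(physical_capacity_tb(logical_tb, 10))  # 10.0
```

At 2:1, the deduplicating appliance still has to store half of the logical data, so the space savings may not justify the added processing for this class of data.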
Object storage technology, on the other hand, is ideally suited for managing unstructured data since it is very elastic (theoretically, its ability to scale is unlimited); it can be used in private and hybrid cloud infrastructure; and it has embedded data protection attributes that, in effect, don't require the data to be backed up.
Object store offload
By using object storage technology, it is possible to dramatically reduce the amount of data that needs to be backed up each night. This is because many object storage systems can disperse data objects very efficiently across multiple disk enclosures for local redundancy and replicate data to object stores in secondary data centers or public clouds for disaster-recovery purposes. In fact, the backup process could be reduced, for example, to protect only VM images, user files, email and database data.
When you talk to your backup supplier, ask about all of these options. Many vendors provide multiple backup offerings -- disk-based deduplication systems, object storage, tape, etc. -- and can help you design an appropriate architecture. It is important to understand what your data profile is, which types of recoverability features you need, and how you can best scale your data protection footprint to meet the needs of the business.