Published: 20 Sep 2007
Now that you've optimized your physical servers, shouldn't you optimize your VMware backup?
The popularity of server virtualization--and VMware specifically--remains unabated. Not only is the VMware platform transforming data center management through server consolidation and improvements in business continuity, but it's "breaking" a few things along the way, including data protection strategies.
Many of the incumbent data protection solutions for physical environments are being applied to new virtual infrastructures. However, entrenched as traditional backup/recovery solutions are in the physical world, IT groups already undertaking what amounts to a platform shift to VMware may be ready to consider replacing them with next-generation data protection technologies, i.e., software that uses capacity-reduction techniques such as deduplication in the server-side backup process. Having already shown a willingness to leave the traditional environment behind for a virtualized one, could these organizations be open to next-generation backup as well?
Platform shift to VMware
One of the many considerations in a platform switch such as this is business continuity. Many VMware features improve business resilience, but IT organizations still need to plan for the recovery of virtual machine instances and of the VMware ESX Server systems themselves.
It helps to first understand the basic components of VMware and what has to be protected. The physical system running ESX Server has a pool of resources (disk, CPU, network interface cards and memory) shared by multiple virtual machines, each with its own independent OS and applications. ESX Server is managed through its service console, and VMware's clustered file system, VMFS, resides on shared storage or internal disk. That is where each virtual machine's disk image is stored, in a special format with a .vmdk file extension.
Data protection challenges with VMware
Server virtualization increases the amount of data kept on each physical server. Virtual machines share physical system resources, which is what makes virtualization efficient, but those resources are finite, and backup processes are hogs when it comes to I/O and network bandwidth. A backup running in one virtual machine can therefore slow the other virtual machines on the same host and stretch the backup window. Backups need to be designed with these implications in mind.
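A back-of-the-envelope Python sketch shows why shared resources matter. The function name, the 70% usable-bandwidth factor, the VM sizes and the 20% deduplication ratio are all hypothetical assumptions, not measurements. Because every virtual machine on a host pushes its backup through the same physical uplink, running more backup jobs in parallel does not shorten the window; only sending fewer bytes does.

```python
def backup_window_hours(vm_sizes_gb, link_gbps, efficiency=0.7):
    """Estimate hours needed to send full backups of every VM on one host.

    Assumption (hypothetical): all VMs share a single physical uplink, and
    only `efficiency` of its nominal bandwidth is usable for backup traffic.
    Parallel jobs split that bandwidth, so total bytes set the lower bound.
    """
    total_gb = sum(vm_sizes_gb)
    usable_gb_per_hour = link_gbps * efficiency / 8 * 3600  # Gb/s -> GB/h
    return total_gb / usable_gb_per_hour


# Hypothetical host: 10 VMs of 50 GB each behind one 1 Gb/s link.
full = backup_window_hours([50] * 10, link_gbps=1.0)
# Same host if deduplication cut the data actually sent to 20% (hypothetical).
deduped = backup_window_hours([50 * 0.2] * 10, link_gbps=1.0)
print(round(full, 2), round(deduped, 2))  # prints: 1.59 0.32
```

Under these placeholder numbers, a full backup of one host occupies the shared link for roughly an hour and a half, and every other virtual machine on that host competes with it for the duration.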
The .vmdk files that store each virtual machine can't be backed up directly without first quiescing the virtual machine: while it is running, its disk file is open and being written to. Powering off the virtual machine makes it safe to back up the .vmdk file, but the virtual machine and all of its activities are unavailable for the duration. As an alternative to powering down, ESX Server includes the vcbMounter command-line utility, which creates a consistent snapshot of the virtual machine and exports the snapshot to files that any backup solution can then back up.
There are two levels at which to back up virtual systems: file-level and system-level backup/recovery. File-level backup/recovery is concerned with the individual files within ESX Servers and virtual machines, whereas system-level backup/recovery captures the entire ESX Server or virtual machine. With each approach, you must weigh the recovery trade-offs.
File-level backups allow file-level restores, and recovering a single file this way is faster and easier than recovering it from a system-level backup. System-level backups (i.e., backups of the .vmdk file and the .vmx configuration file) allow a complete virtual machine restore, similar to bare-metal recovery. Recovering a single file from a system-level backup, however, could involve a two-step restore: recovery to an alternate virtual machine and then recovery of the individual file.
Backup options and deduplication
There are several approaches to backing up VMware environments:
Backup agent in each virtual machine. Backing up at the guest OS level (the equivalent of how backup is done in the "physical world") is simple and guarantees consistency. This method supports full and incremental backups, as well as application-specific backups. There are a few disadvantages:
- This method lacks bare-metal recovery options, so a virtual machine can't be restored as a whole.
- It's burdensome on the host's shared resources.
- It's necessary to set up backup scheduling and policies for each virtual machine.
Client-based deduplication offers a possible answer to the burden placed on system resources. Because every full backup reads data and pushes it out to the backup engine, deduplicating data within each virtual machine and across virtual machines before it is sent can significantly reduce the load on shared resources and applications, as well as the amount of data copied and stored.
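The idea can be sketched in Python, assuming fixed-size 4 KiB chunks and SHA-256 fingerprints (real products choose their own chunking and hashing schemes). The client keeps an index of fingerprints the backup engine already holds and transmits only chunks it has not seen, so operating system files common to several guests cross the network once. The `DedupClient` class and the sample data are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


class DedupClient:
    """Sketch: transmit a chunk only if its fingerprint is not yet indexed."""

    def __init__(self):
        self.index = set()   # fingerprints the backup engine already stores
        self.bytes_sent = 0  # payload bytes actually transmitted

    def backup(self, data: bytes) -> list:
        """Back up one VM's data; return its recipe (ordered fingerprints)."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.index:  # unseen data: transmit the chunk
                self.index.add(fp)
                self.bytes_sent += len(chunk)
            recipe.append(fp)  # every chunk, new or known, is recorded by reference
        return recipe


# Two hypothetical guests sharing an identical 8 KiB "OS" region.
os_region = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(256))
vm1 = os_region + b"A" * CHUNK_SIZE
vm2 = os_region + b"B" * CHUNK_SIZE

client = DedupClient()
client.backup(vm1)
client.backup(vm2)
print(client.bytes_sent, "of", len(vm1) + len(vm2), "bytes sent")  # prints: 16384 of 24576 bytes sent
```

The second guest costs only its unique 4 KiB; its copy of the shared region is recorded as references to chunks the first guest already sent.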
Integration with VMware Consolidated Backup (VCB). VCB, new in VMware Infrastructure 3, is VMware's answer to consistent backups without the need to install an agent in each virtual machine. The traditional backup agent resides on a proxy server and communicates with VCB. At a scheduled time dictated by the backup app, the backup agent instructs VCB to initiate a backup. The OS is quiesced and VCB takes a snapshot of the data, which is then copied to the backup proxy server. The backup agent communicates with the traditional backup engine to write the copy to disk or tape.
Using VCB to back up at the virtual machine level provides file-level recovery; offloads the backup workload to the proxy server, largely eliminating the backup window from the virtual machines' perspective; removes backup traffic from the LAN; doesn't require a backup agent in the virtual machine; and allows for full, incremental and differential strategies. Disadvantages include:
- Limited support for non-Windows guests (file-level backup and incremental strategies are available only for Windows guest OSes)
- Needs a Windows-based proxy server (connected to networked storage with access to the LUNs where data is stored)
- Requires integration scripts or VCB-aware client agents (where integration scripting is hardwired) from the backup vendor to control the multistep VCB process
- Doesn't provide application-aware backups for enterprise apps such as Exchange, Oracle and SQL Server
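The multistep process that integration scripts must control can be sketched in Python. Every function below is a hypothetical stub that only records a log entry; none of these names are VCB commands or vendor APIs, and the step order is simplified. The design point is the `try`/`finally`: whatever happens during the copy, the script must remove the snapshot, because a leftover snapshot keeps accumulating changes for the running virtual machine.

```python
log = []  # records each step so the sequence can be inspected


def quiesce_and_snapshot(vm):
    """Hypothetical stub: the guest is quiesced and a snapshot is taken."""
    log.append(f"snapshot {vm}")
    return f"{vm}-snap"


def copy_to_proxy(snap):
    """Hypothetical stub: snapshot data is made available on the proxy."""
    log.append(f"copy {snap}")
    return f"/proxy/{snap}"


def send_to_backup_engine(path):
    """Hypothetical stub: the backup agent writes the copy to disk or tape."""
    log.append(f"backup {path}")


def remove_snapshot(snap):
    """Hypothetical stub: the snapshot is released on the ESX host."""
    log.append(f"remove {snap}")


def vcb_backup(vm, fail=False):
    snap = quiesce_and_snapshot(vm)
    try:
        path = copy_to_proxy(snap)
        if fail:
            raise OSError("simulated backup-engine failure")
        send_to_backup_engine(path)
    finally:
        remove_snapshot(snap)  # runs on success and on failure


vcb_backup("vm01")
try:
    vcb_backup("vm02", fail=True)
except OSError:
    pass
print(log)
```

The failed `vm02` run never reaches the backup step, yet its log still ends with `remove vm02-snap`, which is the cleanup guarantee an integration script needs.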
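VCB performs a live backup of virtual machine files, and because those files change whenever the virtual machine writes to disk, they appear to the backup application as new files every day. Deduplication reduces the burden on the backup proxy server by eliminating redundant work and data through subfile incremental backup, which stores only the changed portions of each file.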
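Subfile incremental backup can be sketched as a deduplicating store that receives each day's full export but keeps only blocks it has not stored before, plus a per-export recipe so any day can still be restored in full. This Python sketch assumes fixed 4 KiB blocks and SHA-256 fingerprints; `ChunkStore` and the sample images are hypothetical, not any vendor's design.

```python
import hashlib

BLOCK = 4096  # hypothetical block size


class ChunkStore:
    """Sketch of a dedup target for daily full-VM exports."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block contents, each stored once

    def ingest(self, image: bytes):
        """Store one export; return (recipe, bytes of new data stored)."""
        recipe, new_bytes = [], 0
        for i in range(0, len(image), BLOCK):
            block = image[i:i + BLOCK]
            fp = hashlib.sha256(block).digest()
            if fp not in self.blocks:
                self.blocks[fp] = block
                new_bytes += len(block)
            recipe.append(fp)
        return recipe, new_bytes

    def restore(self, recipe) -> bytes:
        """Rebuild a complete export from its recipe."""
        return b"".join(self.blocks[fp] for fp in recipe)


# Hypothetical 1 MiB export on day 1 (distinct pseudo-random blocks).
day1 = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(32768))
# Day 2: the guest rewrote 10 bytes, so the exported file looks "new" again.
changed = bytearray(day1)
changed[5000:5010] = b"x" * 10
day2 = bytes(changed)

store = ChunkStore()
recipe1, new1 = store.ingest(day1)
recipe2, new2 = store.ingest(day2)
print(new1, new2)  # prints: 1048576 4096
```

Although both exports are complete files, the second one adds only the single block that changed, and either day can be rebuilt from its recipe.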
Direct backup of .vmdk files. The .vmdk file can be backed up via a backup client agent installed in the VMware service console. This offers the advantage of virtual machine backup and recovery in one step. The disadvantages are:
- No file-level recovery
- The virtual machine must be shut down (or suspended) before the backup and restarted afterward
- The virtual machine is unavailable while the .vmdk backup runs
Deduplication optimizes this backup method because the .vmdk file defeats the incremental policies of backup apps. Any write inside the virtual machine makes the .vmdk file appear changed, so it is selected for incremental backup in its entirety even if only a small amount of data changed. Solutions that deduplicate at a granular, subfile level make up for this inefficiency by backing up only the changed data.
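The following Python sketch contrasts the two behaviors for a hypothetical 10 MiB .vmdk in which the guest overwrote 512 bytes. A traditional incremental policy selects a file by its modification time and copies all of it; a granular approach compares fixed 4 KiB blocks and copies only those that differ. The `FileVersion` type and both functions are illustrative assumptions, and the sketch compares against the previous version directly for clarity, whereas real products typically track changes with stored block fingerprints.

```python
from dataclasses import dataclass

BLOCK = 4096  # hypothetical comparison granularity


@dataclass
class FileVersion:
    data: bytes
    mtime: float


def file_level_incremental(prev: FileVersion, cur: FileVersion) -> int:
    """Bytes copied by a traditional incremental: whole file if mtime changed."""
    return len(cur.data) if cur.mtime != prev.mtime else 0


def block_level_incremental(prev: FileVersion, cur: FileVersion) -> int:
    """Bytes copied when only blocks whose contents differ are sent."""
    changed = 0
    for i in range(0, len(cur.data), BLOCK):
        if cur.data[i:i + BLOCK] != prev.data[i:i + BLOCK]:
            changed += len(cur.data[i:i + BLOCK])
    return changed


# Hypothetical 10 MiB disk image; the guest overwrites 512 bytes at 1 MiB.
prev = FileVersion(bytes(10 * 1024 * 1024), mtime=1000.0)
data = bytearray(prev.data)
data[1048576:1048576 + 512] = b"\xff" * 512
cur = FileVersion(bytes(data), mtime=1060.0)

print(file_level_incremental(prev, cur))   # prints: 10485760
print(block_level_incremental(prev, cur))  # prints: 4096
```

The file-level policy copies all 10 MiB for a 512-byte change; the block-level comparison copies one 4 KiB block.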
Licensed to back up
With any of these backup options, consider how the backup app is licensed. Traditional backup software is typically licensed on a per-server and/or per-client basis, so when a client agent is required on every virtual machine on a physical system, costs multiply with the number of virtual machines. Many VMware-friendly licensing models, including capacity- and CPU-based licensing, are available in both traditional and next-generation backup solutions.
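A quick calculation shows how the licensing model changes the bill as the number of virtual machines per host grows. All prices and counts below are hypothetical placeholders, not vendor list prices, and the Python functions are illustrative only.

```python
def per_client_cost(num_vms, price_per_client):
    """Agent-per-VM licensing: cost grows with every VM added."""
    return num_vms * price_per_client


def per_socket_cost(num_hosts, sockets_per_host, price_per_socket):
    """CPU-based licensing: cost is fixed by the physical hardware."""
    return num_hosts * sockets_per_host * price_per_socket


def break_even_vms_per_host(sockets_per_host, price_per_socket, price_per_client):
    """VMs per host above which CPU-based licensing is cheaper."""
    return sockets_per_host * price_per_socket / price_per_client


# Hypothetical estate: 4 dual-socket hosts running 15 VMs each.
hosts, sockets, vms_per_host = 4, 2, 15
print(per_client_cost(hosts * vms_per_host, price_per_client=500))  # prints: 30000
print(per_socket_cost(hosts, sockets, price_per_socket=1500))       # prints: 12000
print(break_even_vms_per_host(sockets, 1500, 500))                  # prints: 6.0
```

With these placeholder prices, CPU-based licensing costs less once a host carries more than six virtual machines, and the gap widens with every virtual machine consolidated onto it.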
Some of the smaller, VMware-focused backup vendors are offering innovative solutions and winning early-adopter VMware customers. In non-VMware environments, adoption of next-generation backup has been slow and marginal, largely because the cost, upheaval and risk of switching backup solutions are too great for many firms. However, applying deduplication to VMware data protection might be the catalyst that changes this. The benefits of next-generation data protection in virtual environments are significant and, because IT is re-architecting its infrastructure anyway, should help overcome resistance and encourage widespread adoption of this technology.