There are a variety of methods you can use to back up your virtual machines running on your VMware ESX hosts. Each method has some pitfalls that you should be aware of. The following are some of the most common virtual machine backup problems.
Look out for errant snapshots
Many backup methods/applications like VMware Inc.'s VMware Consolidated Backup and Vizioncore Inc.'s vRanger Pro take a snapshot before backing up the virtual machine to halt writes to the virtual machine disk file. This is done so the data is not modified while the backup is occurring; a snapshot makes the virtual machine's original disk read-only while writing all new data to a separate delta disk file. Once the backup is completed, the delta file is merged back into the virtual machine's original disk file and the snapshot is deleted. Occasionally, the virtual machine snapshots that are taken are not deleted after the backup has completed because of application errors or other issues.
Snapshots that are left running can cause performance issues on the host server and can cause data loss if certain operations occur while a snapshot is running (e.g., increasing the size of a virtual disk). To protect against this, you should periodically look for running snapshots on all your ESX hosts and delete them if they are no longer needed. VirtualCenter does not have a good reporting tool to show all snapshots that exist on each ESX host server; the only way to find them in VirtualCenter is to check each virtual machine individually. However, there are a number of good free snapshot reporting tools available that can report on running snapshots and even email reports on a schedule.
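As a rough illustration of what such a report does, the sketch below walks a datastore directory tree and lists snapshot delta files (named like vm-000001-delta.vmdk on ESX 3.x) with their sizes, largest first. It is a minimal sketch, not one of the reporting tools mentioned above. The function name find_snapshot_deltas is hypothetical, and the code assumes Python 3 running somewhere that can read the datastore files; /vmfs/volumes is the default only because that is where ESX mounts its datastores.

```python
import os


def find_snapshot_deltas(root="/vmfs/volumes"):
    """Return (path, size_in_bytes) for every snapshot delta file under root.

    Hypothetical helper: it only inspects file names and sizes, it does not
    talk to VirtualCenter or modify anything.
    """
    found = []
    # os.walk does not follow directory symlinks by default, so each
    # datastore is visited once even if it is also reachable by a link.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith("-delta.vmdk"):
                path = os.path.join(dirpath, name)
                found.append((path, os.path.getsize(path)))
    # Largest first: a big delta file usually means a long-running snapshot.
    found.sort(key=lambda item: item[1], reverse=True)
    return found


if __name__ == "__main__":
    for path, size in find_snapshot_deltas():
        print(f"{size / 2**20:10.1f} MB  {path}")
```

A delta file that still exists well outside your backup window is worth investigating, but it is not proof of a problem, since some snapshots are kept on purpose.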
Use application-consistent quiescing
If you use VMware Consolidated Backup to back up your virtual machines, consider upgrading to version 1.5 and upgrading your ESX hosts to version 3.5 Update 2, which provides application-consistent quiescing for Windows virtual machines before they are backed up. Quiescing is a process that puts the disk data in a state suitable for backups, reducing the possibility of data corruption upon restore. This is particularly important for transaction-based applications running on virtual machines, like Microsoft SQL Server. A VM can be in one of several states when it is backed up. A crash-consistent backup is like restoring a virtual machine that had its power turned off while running, and is the most prone to data corruption upon restore. File-system-consistent quiescing uses a driver (like the SYNC driver in VMware Tools) to freeze file-system I/O temporarily and flush dirty memory data to disk before the snapshot of the VM is taken, which reduces the risk of data corruption upon restore. Application-consistent quiescing takes it to the next level and works with VSS-aware applications to ensure that all application data is written to disk before the snapshot is taken, which provides the highest data integrity for restores.
When not to use the SYNC driver
Using the SYNC driver that comes with VMware Tools to halt I/O and flush dirty data to disk (I/O draining) to create file-system-consistent backups may cause applications that are sensitive to write delays to fail and generate errors. If this happens, you can uninstall the file-system SYNC driver to avoid the delay caused by I/O draining. To do so, uninstall VMware Tools, reboot, and then reinstall it, making sure to select the "Custom" setup type. Next, expand VMware Device Drivers, click "Filesystem Sync Driver" and select "This feature will not be available." Be aware that if you do this, your snapshots will be crash-consistent instead of file-system-consistent. As an alternative, you can do custom quiescing through pre-backup and post-backup scripts that run inside the guest operating system. Another alternative is to shut down the virtual machine before taking the snapshot so that the backup is application-consistent.
Ensure VMware Tools is up-to-date
When using Consolidated Backup, it's important to make sure that VMware Tools is installed and up-to-date on every virtual machine. VMware Tools includes a file-system SYNC driver (and as of 3.5 Update 2 some additional VSS components), which are used to quiesce VMs prior to backing them up. When installing patches and updates to ESX hosts you will often have to upgrade VMware Tools on your VMs to make sure they are running the updated version after the host is updated. Each patch that is released will note whether a VMware Tools upgrade is required after the patch is installed; additionally, the VI Client will list the status of VMs with out-of-date VMware Tools as "ToolsOld." To enable this column in the Virtual Machine view simply right-click the column headings and put a checkmark by "VMware Tools Status."
Stagger your backups
When using traditional backup agents on your virtual machines, you will find that backing up multiple virtual machines at once on a host server can negatively affect the performance of all VMs on that host. This method causes high network traffic on the vSwitch and host physical NIC that the VMs are connected to, and high disk I/O on the host HBA/disk controller. If you use this method, make sure you're not backing up more than a single VM on a host server at the same time. Additionally, if you are backing up multiple host servers concurrently, try to stagger them so you are not backing up VMs on the same VMFS volumes and RAID groups at the same time.
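The staggering rule can be sketched as a simple greedy grouping: place each VM in the first backup slot that does not already contain a VM from the same host or the same VMFS volume. This is only an illustration under stated assumptions. The function name stagger_backups and the inventory (VM, host and volume names) are made up, and a real schedule would also need to account for RAID groups and backup durations.

```python
def stagger_backups(vms):
    """Group VMs into backup slots with no shared host or volume per slot.

    vms is a list of (vm_name, host, volume) tuples. Returns a list of
    slots, each slot being a list of VM names that may run together.
    """
    slots = []
    for name, host, volume in vms:
        # Look for an existing slot with no conflict on host or volume.
        for slot in slots:
            if host not in slot["hosts"] and volume not in slot["volumes"]:
                break
        else:
            # Every existing slot conflicts, so open a new one.
            slot = {"names": [], "hosts": set(), "volumes": set()}
            slots.append(slot)
        slot["names"].append(name)
        slot["hosts"].add(host)
        slot["volumes"].add(volume)
    return [slot["names"] for slot in slots]


# Hypothetical inventory, for illustration only.
inventory = [
    ("web01", "esx1", "vmfs1"),
    ("db01", "esx1", "vmfs2"),
    ("app01", "esx2", "vmfs1"),
    ("mail01", "esx2", "vmfs2"),
    ("file01", "esx3", "vmfs3"),
]
print(stagger_backups(inventory))
# prints [['web01', 'mail01', 'file01'], ['db01', 'app01']]
```

Each inner list is a set of backups that can run at the same time; the slots themselves run one after another.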
Schedule backups carefully
With traditional backup methods, you may also experience slow backup times if the host server's resources are constrained by other activity on the host. To avoid this, try to schedule backups around any resource-intensive activity that occurs on the host servers at specific times. For example, if you have virtual machines that run a resource-intensive job such as weekly payroll processing, schedule backups on the host where those virtual machines are located around that job. Also avoid backing up during scheduled antivirus scans and patching windows; overlapping them slows down both the host server and the backups.
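One way to reason about this is to treat the known busy periods as blocked intervals and look for the earliest gap in the backup window that is long enough. The sketch below does that with times expressed as minutes from midnight. The function name earliest_start and the sample times are hypothetical, and it assumes you already know roughly how long the backup takes.

```python
def earliest_start(window_start, window_end, duration, busy):
    """Return the earliest start time that avoids all busy periods.

    All values are minutes on a common clock. busy is a list of
    (start, end) tuples. Returns None if the backup cannot fit.
    """
    candidate = window_start
    for busy_start, busy_end in sorted(busy):
        if candidate + duration <= busy_start:
            break  # the backup finishes before this busy period begins
        if busy_end > candidate:
            candidate = busy_end  # start only after this busy period ends
    if candidate + duration <= window_end:
        return candidate
    return None


# Hypothetical example: backup window 22:00 to 06:00 (1320 to 1800),
# a two-hour backup, an antivirus scan 22:00-23:00 and payroll 23:30-01:00.
start = earliest_start(1320, 1800, 120, [(1320, 1380), (1410, 1500)])
hours, minutes = divmod(start % 1440, 60)
print(f"{hours:02d}:{minutes:02d}")
# prints 01:00
```

The 30-minute gap between the scan and the payroll job is too short for a two-hour backup, so the sketch pushes the start to the end of the payroll run.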
Be aware of bottlenecks on your host servers
Not all bottlenecks are obvious and visible; you may not know you have a resource contention problem on your host servers while backing up your virtual machines until you look for it. Use virtualization-specific resource monitoring utilities to watch resource usage and I/O while backups are running. You can use built-in tools such as the esxtop command-line utility or the VirtualCenter performance monitoring tab for this. For even better reporting and monitoring, try one of the many third-party utilities that are available, like Vizioncore's vFoglight or eG Innovations' VM Monitor. Once you are aware of bottlenecks, you can schedule backups accordingly or look at alternate backup methods.
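If you prefer to review the numbers after the fact, esxtop can run in batch mode and write its samples as CSV (for example, esxtop -b -d 5 -n 120 > backup-window.csv captures 120 samples five seconds apart). The sketch below reads such a file and reports the peak value of every column whose name contains a given piece of text. It is a minimal sketch: the function name peak_values and the output file name are hypothetical, the first column is assumed to be the timestamp, and the counter names vary by ESX version, so check the header row of your own capture before choosing the text to match.

```python
import csv


def peak_values(csv_path, name_contains):
    """Return {column_name: peak_value} for columns matching name_contains.

    The match is a case-insensitive substring test on the header row.
    Cells that are missing or not numeric are skipped.
    """
    peaks = {}
    needle = name_contains.lower()
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        # Skip column 0, which is assumed to hold the sample timestamp.
        wanted = [i for i, col in enumerate(header)
                  if i > 0 and needle in col.lower()]
        for row in reader:
            for i in wanted:
                try:
                    value = float(row[i])
                except (ValueError, IndexError):
                    continue
                name = header[i]
                if name not in peaks or value > peaks[name]:
                    peaks[name] = value
    return peaks
```

Comparing the peaks from a capture taken during backups with one taken during normal operation gives a quick view of which resources the backups stress most.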
About this author: Eric Siebert is a 25-year veteran of the IT world and has been specializing in virtualization for the last three years. He is a guru-status moderator in the VMware community VMTN forums and maintains VMware-land.com, a VI3 information website. He is also the author of an upcoming book tentatively titled "VMware VI3 Implementation and Administration" due out in April 2009.
Do you know a helpful backup tip, timesaver or workaround? Email the editors to talk about writing for SearchDataBackup.com.