This article looks at how data backup applications that were originally developed to back up physical systems have adapted to support virtual server environments. You'll also learn about data backup applications that were developed specifically for virtualization, and about additional methods that are available for backing up virtual machines (VMs).
DATA BACKUP APPLICATIONS FOR VIRTUAL SERVER ENVIRONMENTS
Traditional data backup methods usually operate inside the operating system: a backup server communicates over the network with a backup agent running in the OS to back up the contents of the server's disks. This method worked well for physical servers, but virtual servers differ from physical servers in important ways. For one thing, the entire contents of a guest operating system on a virtual machine are encapsulated in a single virtual disk file located on the host server's file system. In addition, a virtual machine must contend for host resources with many other VMs, so each VM can affect the performance of the other VMs running on the host.
As a result, backing up a virtual machine with a backup agent running inside the OS is inefficient: while the backup is running, it increases network and disk I/O, as well as CPU utilization on the host, leaving fewer resources for the other VMs on that host. If several backups run on the same host at once, the problem compounds and can seriously degrade the host's performance.
Backups that use OS agents on virtual machines must pass through the virtualization layer to reach the guest operating system layer. A more efficient approach is to perform the backup at the virtualization layer and never enter the guest operating system. VMware recognized this and released VMware Consolidated Backup (VCB). VCB used a proxy server to offload backups from the virtual machine: the VM's virtual disk was mounted on the VCB proxy, which performed an image-level backup of it without involving the host or the virtual machine. This shifted the backup overhead from the VM and the host to the proxy server. While this was a step in the right direction, it still required a middleman between the backup device and the target disk.
With the vSphere release, VMware began phasing out VCB and its proxy and instead provided APIs and software development kits (SDKs) so that data backup vendors could connect directly to virtual storage targets to back up VMs. The new vStorage APIs for Data Protection include the functionality that was available in VCB and add new capabilities, such as Changed Block Tracking (CBT) and the ability to interact directly with the contents of virtual disks. This approach is more efficient and offers more features than backing up virtual machines with VCB.
Backing up a virtual machine at the virtualization layer means backing up the virtual machine disk (VMDK) files attached to the virtual machine. This is referred to as an image-level backup. It differs from traditional backups done inside the OS, where files are backed up individually (file-level backup). Backing up one large file instead of thousands of smaller files may seem more efficient, but that is not necessarily the case.
The reason is that image-level backups cannot see inside the operating system, so they back up the whole virtual disk, including empty disk blocks and blocks that still hold deleted files. If a virtual machine has a 50 GB virtual disk file and only 10 GB is in use, a basic image-level backup copies all 50 GB, while a file-level backup copies only the 10 GB in use. To get around this, backup vendors have gotten creative and rely on technologies such as data deduplication, empty block recognition and synthetic backups to avoid copying empty and duplicate blocks, as well as blocks that are no longer active because their files were deleted.
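To make the 50 GB example concrete, here is a minimal Python sketch of empty block recognition. It is an illustration, not any vendor's implementation: it assumes a toy disk image held in memory, a hypothetical 4 KB block size, and that blocks the guest has never written read back as all zeros (true for a freshly created virtual disk, not guaranteed for a reused one).

```python
from __future__ import annotations

BLOCK_SIZE = 4096  # hypothetical block size chosen for illustration


def nonempty_blocks(disk: bytes, block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_number: data} for every block that is not all zeros."""
    kept: dict[int, bytes] = {}
    for offset in range(0, len(disk), block_size):
        block = disk[offset:offset + block_size]
        # Empty block recognition: a never-written block reads back as zeros,
        # so it can be skipped instead of copied into the backup.
        if any(block):
            kept[offset // block_size] = block
    return kept


# A 50-block disk where only the first 10 blocks hold data,
# mirroring the 50 GB disk with 10 GB in use.
disk = bytearray(50 * BLOCK_SIZE)
for n in range(10):
    disk[n * BLOCK_SIZE] = 0xFF
backed_up = nonempty_blocks(bytes(disk))
print(f"{len(backed_up)} of 50 blocks copied")  # 10 of 50 blocks copied
```

The check only tells zeros from non-zeros, so it cannot catch blocks that still hold a deleted file's data; that requires knowledge of the guest file system.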
You might wonder how, if an image-level backup cannot see inside a guest operating system, it can handle open files and avoid corruption from files that change while the backup is running. The answer is to quiesce the VM first. A special driver running inside the guest operating system (either VMware Tools or a driver supplied by the backup vendor) momentarily pauses the guest's running processes and forces the operating system and applications to write any pending data to disk. Next, a snapshot of the VM is taken at the virtualization layer. The snapshot creates a new temporary virtual disk file (the delta) that receives all new disk writes on the VM, so the original disk is not written to while the backup is running. Once the backup is completed, the delta is merged back into the original disk file and the snapshot is deleted.
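The snapshot sequence can be modeled as a small Python toy. This is not VMware's implementation or API; the class and method names are hypothetical. It assumes the guest has already been quiesced, treats the disk as a list of blocks, redirects writes to a delta while the snapshot exists, backs up the frozen base, and merges the delta when the snapshot is deleted.

```python
from __future__ import annotations


class SnapshotDisk:
    """Toy model of a virtual disk plus a snapshot delta (hypothetical names)."""

    def __init__(self, blocks: list[bytes]) -> None:
        self.base = list(blocks)                     # the original virtual disk file
        self.delta: dict[int, bytes] | None = None   # temporary delta file, if any

    def take_snapshot(self) -> None:
        # Called after the guest is quiesced: from now on, new writes go to the delta.
        self.delta = {}

    def write(self, index: int, data: bytes) -> None:
        if self.delta is not None:
            self.delta[index] = data  # base disk stays untouched during the backup
        else:
            self.base[index] = data

    def read(self, index: int) -> bytes:
        # The running VM always sees its newest data.
        if self.delta is not None and index in self.delta:
            return self.delta[index]
        return self.base[index]

    def backup_image(self) -> list[bytes]:
        # The backup copies the frozen base: a consistent point-in-time image.
        return list(self.base)

    def delete_snapshot(self) -> None:
        # Merge the delta back into the original disk, then discard it.
        for index, data in (self.delta or {}).items():
            self.base[index] = data
        self.delta = None


disk = SnapshotDisk([b"A", b"B", b"C"])
disk.take_snapshot()
disk.write(1, b"b2")          # the VM keeps running during the backup
image = disk.backup_image()   # [b'A', b'B', b'C']
disk.delete_snapshot()        # base is now [b'A', b'b2', b'C']
```

The key property is that the backup reads a disk that cannot change underneath it, while the VM itself never stops accepting writes.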
Many data backup application vendors that back up at the virtualization layer use deduplication to detect duplicate blocks and skip them. They also detect empty disk blocks that the operating system has never written to and skip those as well. Vizioncore has taken this a step further with a technology called Active Block Mapping (ABM), which recognizes disk blocks that once contained data but no longer do because files were deleted. This matters because when a file is deleted, the operating system normally removes only the pointer to the file; the data itself remains on the disk, so a backup working below the guest OS would otherwise copy it.
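Here is a sketch of how allocation-aware block selection and deduplication can work together. It is not Vizioncore's ABM algorithm, whose internals are not described here. It assumes the backup tool can obtain the guest file system's set of allocated blocks (NTFS, for example, records cluster allocation in its $Bitmap metadata file), and it uses SHA-256 digests as a stand-in for a vendor's deduplication index. The function name is hypothetical.

```python
from __future__ import annotations

import hashlib


def select_blocks(
    disk: list[bytes], allocated: set[int]
) -> tuple[dict[int, str], dict[str, bytes]]:
    """Return (block number -> digest, digest -> data) for blocks worth keeping."""
    block_map: dict[int, str] = {}
    store: dict[str, bytes] = {}
    for number, block in enumerate(disk):
        if number not in allocated:
            # Free according to the guest file system: either never written
            # or left over from a deleted file, so it is not backed up.
            continue
        digest = hashlib.sha256(block).hexdigest()
        block_map[number] = digest
        # Deduplication: each unique block's data is stored only once.
        store.setdefault(digest, block)
    return block_map, store


disk = [b"config", b"config", b"old deleted data", bytes(6)]
block_map, store = select_blocks(disk, allocated={0, 1})
print(len(block_map), len(store))  # 2 1
```

Block 2 still contains data, but because the file system no longer marks it as allocated, it is skipped just like the never-written block 3.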
A common misconception about image-level backups is that incremental backups are impossible: because you are backing up one large file, it seems that if any disk block changes, the whole file must be backed up again. With traditional file-level backups, an incremental backup copies only the files that have changed. Changes are tracked with a flag called the archive bit, which indicates that a file has changed since the last backup; once the file is backed up, the archive bit is cleared until the file changes again. With image-level backups, the backup application instead has to keep track of every block that has changed since the last backup so it knows which ones to copy during an incremental backup. Doing this on its own can lengthen backups, because the application must scan the entire virtual disk, calculate a hash for each block and compare it against a hash table from the previous backup to see what has changed. To speed up incremental backups, most backup vendors have taken advantage of the new Changed Block Tracking feature accessible through the vStorage APIs. CBT lets the backup application simply query the VMkernel for the disk blocks that have changed since the last backup, which greatly speeds up incremental backups.
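The difference between the two incremental strategies can be sketched in Python. Both pieces are illustrative only: the hash-table scan reads and hashes every block on every run, while the hypothetical ChangeTracker class mimics what CBT provides by recording block numbers as writes happen, so an incremental backup only has to ask for the list. In vSphere the real query is the QueryChangedDiskAreas API call, which works with disk areas and change IDs rather than this simplified set of block numbers.

```python
from __future__ import annotations

import hashlib


def changed_by_scan(
    disk: list[bytes], previous: dict[int, str]
) -> tuple[list[int], dict[int, str]]:
    """Hash every block and compare with the hash table from the last backup."""
    current = {n: hashlib.sha256(block).hexdigest() for n, block in enumerate(disk)}
    changed = [n for n, digest in current.items() if previous.get(n) != digest]
    return changed, current  # `current` becomes the table for the next run


class ChangeTracker:
    """Toy stand-in for Changed Block Tracking (not the VMware API)."""

    def __init__(self) -> None:
        self.dirty: set[int] = set()

    def record_write(self, block_number: int) -> None:
        # The hypervisor notes each write as it happens; no later scan is needed.
        self.dirty.add(block_number)

    def changed_since_last_backup(self) -> list[int]:
        # Return the changed blocks and start tracking afresh for the next backup.
        changed = sorted(self.dirty)
        self.dirty.clear()
        return changed


disk = [b"a", b"b", b"c", b"d"]
_, table = changed_by_scan(disk, {})  # full backup: every block is new
tracker = ChangeTracker()

disk[2] = b"c2"
tracker.record_write(2)

print(changed_by_scan(disk, table)[0])      # [2], after reading all 4 blocks
print(tracker.changed_since_last_backup())  # [2], without reading any blocks
```

Both approaches find the same changed block, but the scan's cost grows with the size of the disk, while the tracker's cost grows only with the number of writes.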
Another common misconception about image-level backups is that you cannot restore individual files. You can, but because you are backing up one large virtual disk file, the way individual file restores are done changes. Traditional file-level backups simply build a catalog of all the files as they are backed up, and that catalog is used to locate files when they need to be restored later.
Image-level backups can offer the same capability by mounting the backed-up virtual disk file and looking inside the guest operating system to see the file layout, so individual file restores are possible. With file-level backups, individual file restores are simple: you choose the file to restore from the backup media, and the backup server connects to the agent on the target server, which receives the file and copies it back to its original location. With image-level backups, there is typically no agent on the target server, so the process is slightly different. A restore application mounts the virtual machine disk file from the backup media, the file is copied out of it to either a local disk or back to the original server, and the virtual disk is then unmounted. The process for individual file restores is different, but the end result is the same.
While image-level backups change the way you do file-level restores, they make a bare-metal restore of a VM a simple process. Since the VM is encapsulated in one big file, all you have to do is copy that file back to a virtual host and you have a complete copy of the server as of the time of the backup. Another big advantage is that virtual machines all present the same type of virtual hardware regardless of the underlying physical host hardware, which eliminates the hardware incompatibilities that can occur when performing a bare-metal restore to a different host. With traditional backups, a bare-metal restore to a different server requires many pre- and post-restore steps to make sure hardware drivers, disk partitions and system configurations are all correct for the new hardware.
Image-level backups offer other advantages over traditional file-level backups of physical servers. Having the whole server encapsulated in one big file makes it very portable, since the virtual disk file can easily be copied to any other storage device. For example, you could copy a VM from a host server to another storage device, an external hard drive or a flash drive for safekeeping. This makes creating ad hoc backups of virtual machines a simple process.
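As an illustration of such an ad hoc copy, here is a minimal Python sketch that copies every file in a VM's folder to another location and verifies each copy with a SHA-256 checksum. It assumes the VM is powered off, so its disk files are neither changing nor locked, and that its folder is reachable as an ordinary path (for example, a mounted datastore). The function names and paths are hypothetical.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large virtual disk files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_vm_folder(vm_dir: Path, dest_dir: Path) -> list[Path]:
    """Copy each file in vm_dir to dest_dir and verify the copy by checksum."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied: list[Path] = []
    for src in sorted(vm_dir.iterdir()):
        if not src.is_file():
            continue
        dst = dest_dir / src.name
        shutil.copy2(src, dst)  # copies file data and metadata such as timestamps
        if sha256_of(src) != sha256_of(dst):
            raise OSError(f"checksum mismatch after copying {src.name}")
        copied.append(dst)
    return copied


# Hypothetical paths: a mounted datastore folder and an external drive.
# copy_vm_folder(Path("/mnt/datastore1/web01"), Path("/media/usb/web01-backup"))
```

Verifying the checksum right after the copy catches a bad copy while the source is still at hand, which matters for removable media such as flash drives.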
While having good data backups is very important, being able to restore from them is even more important: backups are worthless if you cannot properly restore files when necessary. Testing restores with traditional backups of physical servers can be difficult, time-consuming and disruptive, and it requires extra server hardware. Virtualization makes this much simpler, because virtual machines can be restored to hosts in isolation without overwriting or affecting the original virtual machines. This makes it easy to test restoring individual files or whole virtual machines and verify that your backups are working properly.
Veeam Software has taken this further with a new feature called SureBackup. SureBackup automates the verification of backed-up virtual machines in a separate environment on a host, confirming that their data and applications will function when restored. Normally, to do this you have to copy the virtual disk files to a host so you can power on the VM and test it. Veeam avoids these extra steps by running a VM directly from the backup store without having to extract it. SureBackup publishes the contents of a backup file as a datastore that a virtual host can connect to. Virtual machines are then automatically created from the datastore in an isolated environment, where they can be powered on and tested to ensure that applications are functioning properly and data is intact. This method reduces the host resources required and does not need the extra storage you would normally use to copy a virtual disk back to a host server. It automates and simplifies the verification of backed-up VMs and also makes application item-level restores possible.
While you can still use file-level agent backups running inside the VM, they are not as efficient, and you should consider switching to a backup method that is optimized for virtualization. Traditional backup vendors such as EMC, IBM and Symantec have all adapted their products to integrate better with virtualization.
In addition, several vendors have developed data backup applications specifically for virtual environments, including Veeam Software Backup & Replication, PHD Virtual esXpress and Vizioncore vRanger Pro Data Protection Platform (DPP). These vendors recognized the need for better backup solutions for virtual environments and built products that are optimized for virtualization. When backing up your virtual environment, avoid traditional methods and applications that are not aware of the virtualization layer. Instead, use ones designed specifically for virtual server environments so you can achieve maximum efficiency and flexibility with your backups and restores.
About this author: Eric Siebert is an IT industry veteran with more than 25 years of experience covering many different areas, with a focus on server administration and virtualization. He is a very active member of the VMware VMTN support forums and has obtained the elite Guru status by helping others with their own problems and challenges. He is also a VMTN user moderator and maintains his own VMware VI3 information website, vSphere-land. In addition, he is a regular blogger and feature article contributor on TechTarget's SearchServerVirtualization and SearchVMware websites.
This was first published in June 2010