Top new features of backup apps

We look at some of the key backup technology advancements and describe how four leading backup vendors -- CommVault, EMC, IBM and Symantec -- have implemented these technologies.

Backup applications have evolved over the last few years to incorporate features that were previously available only in third-party products.

New backup technologies are ready for mass adoption, and they're not just for early adopters. Early adopters helped give these technologies a jumpstart because they were comfortable purchasing products from startups and didn't think twice about being the first company on the block to try something new. But pioneers are typically a small contingent with many more potential users choosing a "wait and see" approach. So, even as some of these newer products achieve technological acclaim, they may barely make a dent in the overall backup market.

But recent events have accelerated the adoption -- and perceived maturity -- of some backup technologies. Smaller vendors have been acquired by their bigger brethren, and enabling technologies have emerged that ease the implementation of these products.

We'll look at five key backup technology advancements:

  • Data deduplication
  • Data protection management
  • Continuous data protection
  • Synthetic backups
  • Virtual server backup

We'll explain how these technologies have changed backup, and describe how four of the leading backup software vendors -- CommVault Systems Inc., EMC Corp., IBM Corp. and Symantec Corp. -- have implemented these new backup technologies. That's not to suggest that other backup vendors don't also offer these features, but these four are among the acknowledged backup software market leaders with the products readers ask about most often.

Data deduplication: Disk backup game changer

It's hard to overemphasize the importance of data deduplication in today's backup systems. It's perhaps the biggest game changer since the introduction of network backup systems 15 years ago, and its popularity can be traced to a number of factors. Chief among them is that data deduplication enables users to increase disk utilization in their backup system. Tape had always been significantly cheaper than disk as a target for backups, and while the cost of disk has decreased significantly in the last several years, so has the cost of tape. So disk was typically used just as a staging mechanism for tape, rather than for long-term backup or archive storage.

Deduplication changed that forever. The random-access capabilities of disk allow data deduplication systems to remove redundant segments of data and replace them with pointers without significantly affecting restore performance. (While there's some performance degradation, restores are still much faster than when using tape.)

Despite dedupe's indisputable benefits, a lot of users waited to see if the techniques employed in target dedupe devices would eventually make their way into backup software, making such special-purpose appliances unnecessary. Most experts believe target deduplication appliances still have a role to play, but data deduplication has, indeed, made its way into mainstream backup software products.

EMC and Symantec were the first major backup software companies to integrate deduplication into their product lines, and both did it through acquisition. EMC acquired Avamar Technologies, and Symantec's PureDisk product line resulted from its acquisition of Datacenter Technologies. CommVault and IBM chose to "roll their own" deduplication products.

EMC and Symantec both offer source deduplication products. That is, you can install the Avamar or PureDisk agent on a computer and the client will communicate with the backup server to identify and eliminate redundant data before it's transferred across the network. Only new bytes are sent with each backup, which makes source deduplication perfect for smaller remote offices and mobile data.

Both vendors offer their source deduplication products as standalone products, which means you don't have to purchase Symantec's NetBackup or EMC's NetWorker. So even if you aren't using Symantec or EMC backup apps, you can take advantage of their deduplication technology. But if you wanted the functionality of both the backup app and dedupe, you had to purchase and manage two products (i.e., NetBackup and PureDisk, or NetWorker and Avamar). Symantec is the first to change this with NetBackup 7, which has built-in source dedupe that doesn't require a separate PureDisk installation. While you can manage Avamar via NetWorker, and a single install of EMC's client software supports both NetWorker and Avamar backups, Avamar still requires a separate server to back up to.

Target deduplication is also available from backup software vendors. Symantec was the first to do this by allowing NetBackup customers to send standard NetBackup backups to a media server where they would be deduplicated by PureDisk. (With NetBackup 7, this functionality is available without requiring a separate PureDisk installation.)

IBM entered the data deduplication space with the introduction of its post-process target deduplication feature in Tivoli Storage Manager (TSM) 6.1. TSM can natively deduplicate its backups stored on disk after they have completed. IBM's target deduplication offering is unique in that it's included in the base product; however, the deduplication ratios it achieves may be relatively modest compared to those of other vendors' extra-cost deduplication options.

CommVault's Simpana deduplication facility is difficult to categorize as target or source dedupe. Deduplication in backup software requires multiple steps: (1) slicing files to be backed up into segments or "chunks"; (2) creating a "hash" value (typically using SHA-1); (3) doing a hash table lookup to see if the value is unique; and (4) deciding whether or not to send the chunk to storage. Source deduplication products perform all four steps on the client; target deduplication appliances do all four at the target or backup server. With CommVault's approach, however, steps one and two are done at the client while steps three and four are done at the backup server (media agent in CommVault lingo). This is why it's difficult to classify the dedupe as source or target.
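The four steps above can be sketched in a few lines of Python. This is a minimal illustration only, not any vendor's implementation: the function names, the in-memory dictionary standing in for the hash table and chunk store, and the tiny fixed 4-byte chunk size are all assumptions made for the example. Real products use much larger chunks and persistent indexes.

```python
import hashlib

CHUNK_SIZE = 4  # assumed, unrealistically small so the example is easy to follow


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Step 1: slice the data into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(piece: bytes) -> str:
    """Step 2: create a hash value for a chunk (SHA-1, as mentioned above)."""
    return hashlib.sha1(piece).hexdigest()


def backup(data: bytes, store: dict) -> list:
    """Steps 3 and 4: look up each hash and store only chunks not seen before.

    Returns a "recipe": the ordered list of hashes needed to rebuild the data.
    """
    recipe = []
    for piece in chunk(data):
        digest = fingerprint(piece)
        if digest not in store:    # step 3: hash table lookup
            store[digest] = piece  # step 4: only unique chunks go to storage
        recipe.append(digest)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data by following the recipe's pointers."""
    return b"".join(store[digest] for digest in recipe)


store = {}
monday = backup(b"AAAABBBBCCCC", store)
tuesday = backup(b"AAAABBBBDDDD", store)
print(len(store))  # 4 unique chunks stored, although 6 chunks were backed up
```

In a source deduplication product all of this logic would run on the client; in a target appliance it would run on the appliance or backup server.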

But if the real distinction between the two categories is whether or not the original, native data ever leaves the client, then CommVault Simpana is best placed in the target deduplication category. Still, Simpana's unique practice of doing the first two steps on the client allows it to do something other target products can't do: client-side compression. Most target dedupe systems won't deduplicate your data well if you compress it at the client before sending it to the target because compression inhibits the deduplication system's ability to correctly chunk and fingerprint the data to identify duplicates. But because Simpana chunks and fingerprints the data at the client, it can compress it before sending it across the network with no negative effects. The compression doesn't save as much bandwidth as source deduplication, but it can be advantageous in some environments.
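To see why the order of operations matters, here is a rough sketch of the split described above: the client chunks and fingerprints the raw data and only then compresses each chunk, while the server does the hash lookup and decides what to store. It is based solely on the description in this article and is not CommVault's code; `client_prepare`, `server_ingest`, `server_restore` and the 8-byte chunk size are hypothetical names and values chosen for illustration.

```python
import hashlib
import zlib


def client_prepare(data: bytes, size: int = 8):
    """Client side (steps 1 and 2), followed by per-chunk compression."""
    for i in range(0, len(data), size):
        raw = data[i:i + size]
        digest = hashlib.sha1(raw).hexdigest()  # fingerprint is taken BEFORE compression
        yield digest, zlib.compress(raw)        # every chunk still crosses the network


def server_ingest(stream, store: dict) -> list:
    """Server side (steps 3 and 4): hash lookup and the store-or-skip decision."""
    recipe = []
    for digest, payload in stream:
        if digest not in store:
            store[digest] = payload  # chunk is kept in its compressed form
        recipe.append(digest)
    return recipe


def server_restore(recipe: list, store: dict) -> bytes:
    """Decompress and reassemble the chunks named in the recipe."""
    return b"".join(zlib.decompress(store[digest]) for digest in recipe)


store = {}
first = server_ingest(client_prepare(b"A" * 8 + b"B" * 8 + b"C" * 8), store)
second = server_ingest(client_prepare(b"A" * 8 + b"B" * 8 + b"D" * 8), store)
print(len(store))  # 4: duplicates are still recognized despite the compression
```

Because the fingerprint is computed on the uncompressed chunk, identical data always produces the same hash no matter how it was compressed for transport. Chunks this small don't actually shrink under zlib; the point is the ordering, not the compression ratio.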

Data protection management: Beyond simple backup stats

Data protection management (DPM) was introduced several years ago by Bocada Inc., the first company to attempt to produce standardized reports on multiple backup products. A number of other startup firms soon entered the fray, including Aptare Inc., Tek-Tools Software Inc. (recently acquired by SolarWinds, Inc.), TSMworks Inc., Servergraph (now part of Rocket Software Inc.) and WysDM Software (now part of EMC). The big backup software vendors saw the potential of the DPM market: Symantec picked up a product called Advanced Reporter, which became Veritas Backup Reporter and then later Symantec's OpsCenter Analytics line; and EMC turned the WysDM product into its Data Protection Advisor.

All of these products do far more than simply tell you which backups worked and which didn't, a basic function that many believe should be included in any decent backup software. However, when it comes to things like trending, capacity planning, cross-product reporting and issues that go beyond traditional backups, standalone DPM products have carved out a unique niche.

Backup apps have begun to incorporate some of these capabilities. CommVault, in particular, has been vocal about how these reporting tools should be included in the base backup product. While it could be argued that the reporting included in Simpana is better in some areas than the reporting in other companies' base products, that's not to say Simpana users couldn't benefit from a DPM product. For TSM customers, IBM's response has typically been that everything you need to know is in the TSM database so you just have to run a query. While that's true, it might be beyond the capability of many users. So while a few of the big backup vendors have incorporated some DPM features, users who need full data protection management functionality will likely turn to a third-party product.

Continuous data protection: Still kicking

Only a few years ago there were a number of companies with continuous data protection (CDP) applications, but many of them are no longer around. Some simply went out of business, while others were acquired in fire-sale deals. Did CDP simply not work? Was it a bad idea? Or was it the Star Trek of backup products (a great idea before its time)?

CDP's rise and fall was probably a combination of all of the above. When CDP works as advertised, it's easily the best way to protect your most critical applications: zero downtime for backups, and recovery time objectives (RTOs) and recovery point objectives (RPOs) of zero. What's not to like? Unfortunately, storage managers tend to be most risk averse when it comes to their mission-critical applications, so few users opted to back up those mission-critical applications using a completely different method from a vendor that they'd never heard of before.

But attitudes toward CDP changed when major companies got into the game. Symantec bought Revivio and eventually released NetBackup RealTime. IBM came out with Tivoli Continuous Data Protection for Files and bought FilesX, which became TSM FastBack. EMC purchased Kashya and delivered RecoverPoint. CommVault built its own CDP functionality around its core Common Technology Engine. With these key players in the CDP game, users can now try it in their own environments without the fear that their CDP vendor may go out of business tomorrow.
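For readers unfamiliar with how CDP can promise a recovery point objective of zero, the general idea is that every write is journaled with a timestamp, so the protected data can be rolled back to any moment. The sketch below is a simplified illustration of that idea under our own assumptions (an in-memory journal, integer timestamps, a hypothetical `WriteJournal` class); it doesn't represent how any of the products named above are built.

```python
class WriteJournal:
    """Hypothetical CDP-style journal: every write is recorded as it happens."""

    def __init__(self):
        self.entries = []  # (timestamp, key, value), appended in time order

    def record(self, timestamp: int, key: str, value: str) -> None:
        """Capture a write the moment it occurs, with no backup window."""
        self.entries.append((timestamp, key, value))

    def recover(self, as_of: int) -> dict:
        """Replay the journal to rebuild the data as it looked at time `as_of`."""
        state = {}
        for timestamp, key, value in self.entries:
            if timestamp > as_of:
                break  # entries are in time order, so later writes can be ignored
            state[key] = value
        return state


journal = WriteJournal()
journal.record(1, "orders.db", "version 1")
journal.record(2, "mail.db", "version 1")
journal.record(3, "orders.db", "version 2 (corrupted)")

# Roll back to just before the bad write at time 3.
print(journal.recover(as_of=2))
```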

Synthetic backups: No more fulls

A long time ago, TSM developers asked a simple question: Why are we backing up data that hasn't changed? This became one of the core elements of TSM design and what TSM would eventually refer to as "progressive incremental" and others would call "incrementals forever." Once a given version of a file has been backed up, it's never backed up again.
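The "incrementals forever" idea can be illustrated with a short sketch. This is only a conceptual model, not TSM's actual logic: the `catalog` dictionary, the version identifiers and the function name are assumptions made for the example.

```python
def progressive_incremental(current_files: dict, catalog: dict) -> list:
    """Return the paths that must be sent in this backup run.

    current_files maps path -> version id (for example, a content hash).
    catalog maps path -> set of version ids that have already been backed up.
    """
    sent = []
    for path, version in sorted(current_files.items()):
        known_versions = catalog.setdefault(path, set())
        if version not in known_versions:  # this version has never been backed up
            known_versions.add(version)
            sent.append(path)
    return sent


catalog = {}
print(progressive_incremental({"a.txt": "v1", "b.txt": "v1"}, catalog))  # both files are new
print(progressive_incremental({"a.txt": "v1", "b.txt": "v2"}, catalog))  # only b.txt changed
```

There is no periodic full in this model: after the first run, each backup moves only the versions the catalog hasn't seen.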

Other backup products have chosen to use the traditional full/incremental approach to backups, also referred to as the grandfather-father-son method. But the question persisted: Why are we backing up data that hasn't changed? Eventually, CommVault, EMC and Symantec all came to the same conclusion: instead of transferring data that's already been backed up across the network, just transfer it from one tape to another within the backup server. Because roughly 90% of any given full backup is typically already on tape or disk somewhere, a "synthetic full" can be created by copying the data that's needed from the latest full, and from the incrementals taken since then, to a new full backup. This provides the benefit of a full backup (fast restores via collocation of the necessary data) without the downside of a full backup (unnecessary transfer of the data across the network).

All three products have implemented the concept of the synthetic full in a slightly different way (CommVault and Symantec call synthetic fulls "synthetic backups," while EMC uses the term "saveset consolidation"). However, all of them share one critical concept. Once a synthetic full is created, it's essentially just like any other full: it will be used for restores and later incremental backups will be based on that full. The previous full is only necessary if you're keeping it for longer retention.
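Conceptually, building a synthetic full is a merge that happens entirely on the backup server. The sketch below is a simplified model under our own assumptions: each backup is a dictionary of path to contents, a deleted file is marked with `None`, and `synthesize_full` is a hypothetical name. It isn't taken from any of the three products.

```python
DELETED = None  # assumed marker an incremental uses to record a deleted file


def synthesize_full(last_full: dict, incrementals: list) -> dict:
    """Merge the last full with later incrementals (oldest first) into a new full.

    Nothing is read from the clients; only existing backup data is copied.
    """
    new_full = dict(last_full)  # start from a copy so the old full is left intact
    for incremental in incrementals:
        for path, contents in incremental.items():
            if contents is DELETED:
                new_full.pop(path, None)   # file no longer exists on the client
            else:
                new_full[path] = contents  # newer version replaces the older one
    return new_full


sunday_full = {"a.txt": "a1", "b.txt": "b1", "c.txt": "c1"}
monday = {"b.txt": "b2"}
tuesday = {"c.txt": DELETED, "d.txt": "d1"}

print(synthesize_full(sunday_full, [monday, tuesday]))
```

The result can then serve as the base for later incrementals, just as a conventional full would.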

TSM users may feel that TSM's concept of a backup set is very similar to a synthetic full, but it's actually quite different. Unlike synthetic backups, the contents of a TSM backup set aren't tracked in the backup database. In fact, one of the main purposes of a TSM backup set is to create an "instant archive" of backups that you wish to keep for a longer period of time than your TSM database has room for (see "Can backups be turned into archives?" below). Another purpose for the TSM backup set is to create a backup that can be used outside of TSM; a TSM backup set can be read without the aid of the TSM catalog. If TSM backup sets were kept in the TSM database and usable for standard restores, then they would be the same as a synthetic full.

Can backups be turned into archives?

IBM Corp.'s Tivoli Storage Manager (TSM) has a backup feature where backups are copied to what is officially called a "backup set." IBM occasionally also calls a backup set an "instant archive." This seems to go against the usual mantra that backups aren't archives, and simply holding onto backups longer doesn't magically turn them into archives. So are TSM backup sets truly archives?

To answer this question, let's take a look at a new feature in Symantec Corp.'s Backup Exec 2010. Backup Exec incorporates Symantec's market-leading Enterprise Vault engine, so users can create archives of their backups by copying them into this engine. But Backup Exec does more than just copy the data from one tape format to another; it actually creates an index of the content of the archived files or applications. This means that you can perform Google-like searches against these archives by searching for phrases that might appear in files or Exchange emails, and Backup Exec will extract that data for you.

CommVault Systems Inc.'s Simpana also has the ability to perform content searches against its backups. You can search for files or emails based on a particular word or phrase. Like Symantec, CommVault has a more full-featured archive product as well, but you can perform archive-like searches against Simpana backups.

Let's contrast this to what TSM is doing. A TSM backup set actually has fewer database entries than a regular TSM backup; its purpose is to "archive" older files that you no longer have room for in the TSM database. So instead of having more context than regular backups, a TSM "instant archive" actually has less. While it's now possible with some products to "turn a backup into an archive," calling a TSM backup set an "instant archive" does a disservice to the word archive.

But that's not to suggest that TSM backup sets have no value. They do allow for longer retention than what's possible in the TSM database, and they also allow for restores without having to install TSM.

Virtual server backup: Getting easier

Server virtualization has been a boon for many data centers. Far too many applications required a "dedicated server," when all they truly needed was to think they had a dedicated server. Their CPU and I/O requirements were easily met by sharing resources with the aid of a server virtualization product. But then there was backup to consider.

While most applications could be easily virtualized, backup would "not go gentle into that good night." It wanted -- needed -- the full resources of both a beefy CPU and beefy storage capable of heavy throughput. It's been said that backups are a great way to test your storage and network systems because they have to move everything from point A to point B every night. Despite potential I/O issues, most users back up their virtual machines (VMs) by simply pretending they aren't virtual. They load the backup client into the virtual machine and back it up just like a standalone server. VMware Inc. introduced VMware Consolidated Backup (VCB) to help ease the pain of VM backup and to remove the I/O issues from the ESX server, but it also increased the complexity of VM backups. It required two-step backups and two-step restores for image backups, as well as the use of a separate disk-staging area. Not surprisingly, few users implemented VCB. Users of other virtual server apps, like Microsoft Corp.'s Hyper-V, also tended to back up their VMs by pretending they were physical servers.

The backup outlook is a lot brighter for both products: VMware introduced vSphere and Microsoft rolled out Hyper-V's backup architecture (which doesn't have an "official" brand name). VMware vStorage APIs for Data Protection (VADP) replaced VCB, offering everything that VCB promised and introducing the concept of block-level incremental backups. Now users can perform an image backup without having to copy the data to a staging disk first, and they can perform an incremental backup by simply having the backup application ask the vStorage APIs what blocks have changed since the last backup. The APIs promise to make things much better for those attempting to back up VMware virtual servers. The first major backup product to fully support vStorage APIs was EMC's Avamar, followed shortly by Symantec's NetBackup. As of this writing, CommVault, EMC NetWorker and IBM TSM are all working on their integration with vStorage APIs.
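The block-level incremental idea can be modeled in a few lines. To be clear, this sketch does not use or imitate the real vStorage APIs; `TrackedDisk` and its methods are hypothetical stand-ins that only show the principle of asking the virtualization layer which blocks have changed instead of scanning the whole disk.

```python
class TrackedDisk:
    """Hypothetical virtual disk that remembers which blocks have been written."""

    def __init__(self, num_blocks: int, block_size: int = 4):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.changed = set(range(num_blocks))  # before the first backup, everything counts as changed

    def write(self, index: int, data: bytes) -> None:
        """Write one block (padded or truncated to the block size) and flag it as changed."""
        self.blocks[index] = data.ljust(self.block_size, b"\0")[:self.block_size]
        self.changed.add(index)

    def changed_blocks_since_last_backup(self) -> list:
        """What the backup application asks for instead of reading every block."""
        return sorted(self.changed)

    def mark_backed_up(self) -> None:
        self.changed.clear()


def incremental_image_backup(disk: TrackedDisk) -> dict:
    """Copy only the changed blocks, then reset the tracking."""
    delta = {i: disk.blocks[i] for i in disk.changed_blocks_since_last_backup()}
    disk.mark_backed_up()
    return delta


disk = TrackedDisk(num_blocks=8)
first = incremental_image_backup(disk)   # first run copies all 8 blocks
disk.write(3, b"data")
second = incremental_image_backup(disk)  # second run copies only block 3
print(len(first), second)
```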

Microsoft Hyper-V users simply need to make sure that their backup product knows that it's talking to a Hyper-V server. Although not quite as advanced in some ways as vStorage APIs, Hyper-V's backup architecture does a very similar job, allowing you to back up Hyper-V virtual machines without performing guest-level backup inside the virtual machine.

Hyper-V does have one advantage over VMware because it offers full Microsoft Volume Shadow Copy Service (VSS) support and VMware doesn't. Hyper-V uses VSS to quiesce its applications and notify them that the backup was successful. This allows Hyper-V users to get an application-consistent backup of any application inside a Windows VM without having to load an agent on that virtual machine. In addition, the application will know that it has been backed up and can clear its transaction logs.

VMware can quiesce applications in Windows 2003, but the actual operation it performs (VSS_COPY) doesn't notify the application that it has been backed up; therefore, you must manage the transaction logs yourself. In addition, VMware currently has no application support for Windows 2008. As of this writing, VMware is working on this limitation, but the company offered no comment on the timing of the roadmap. This limitation has created an opportunity for backup products to differentiate themselves and, so far, FalconStor Software, NetApp, PHD Virtual Technologies (esXpress), Symantec (Backup Exec) and Veeam Software all offer workarounds to address it.

BIO: W. Curtis Preston is the executive editor for and an independent backup expert.
