Managing and protecting all enterprise data


Second-generation CDP

Continuous data protection (CDP) received lots of attention but garnered few takers as a standalone product. Since then, the technology has been incorporated into data protection products and its role is now likely to expand. In addition, rapid data growth and shrinking backup windows are two of the trends that support the increased adoption of CDP.

Continuous data protection products still offer the most granular recovery points of all data protection applications.

WHEN CONTINUOUS DATA PROTECTION (CDP) emerged a few years ago, it was positioned as a product that would replace traditional backup software. CDP vendors predicted IT managers would abandon their age-old weekly full and daily incremental backups and adopt CDP, which captures every data change and can restore data and applications to any granular point in the past. But that prophecy never materialized, as the majority of CDP pioneers--vendors like Kashya, Lasso Logic Inc., Mendocino Software, Revivio and TimeSpring--were either acquired or went out of business. In addition, most IT managers continue to rely on traditional data protection vendors for backup and recovery. Does that mean CDP has failed? Or is it succeeding, but in ways most observers wouldn't have expected?

CDP has failed as a standalone product because most IT managers weren't willing to forgo their proven backup software and backup methods for a new and unproven technology. "In hindsight, it's clear that offering CDP as a standalone product was the wrong approach to get into the marketplace," says Greg Schulz, founder and senior analyst at StorageIO Group, a technology analyst and consulting firm in Stillwater, MN.

The lack of integration with existing backup software was one of the reasons CDP wasn't able to survive on its own. Those who bought into CDP had to run two completely separate backup infrastructures or take a leap of faith and commit all of their data protection to CDP, a step only a few were willing to take.

Equally damaging to CDP vendors was the success of array-based data protection for critical data. Array-based snapshots and replication are very similar to CDP, but instead of capturing every change, they take snapshots at defined intervals, which proved sufficient for most data protection needs. As the frequency of snapshots increases, the distinction between snapshots and CDP blurs. Moreover, the idea that only some of the many recovery points of CDP are application-consistent confused users, and made the concept of capturing each and every change somewhat questionable. "While all snapshots are consistent and usable for recovery, determining a good point of consistency can be difficult in CDP products," explains David Russell, VP of research at Gartner Inc., Stamford, CT.

Advanced array features and data deduplication further reduce the gap between CDP, snapshots and replication. NetApp, for instance, feels that CDP and array-based data protection compete on an equal footing. Although NetApp acquired CDP vendor Topio, it has no plans to offer a pure CDP product at this point.

"Array-based snapshots and replication are proven technologies, and features like application integration and deduplication put it in direct competition with CDP," says Chris Cummings, NetApp's senior director, data protection solutions.

But array-based data protection has some significant disadvantages vs. CDP: It's typically more expensive, more complex and only works between supported arrays (see "CDP vs. array-based data protection," below). "Software-based CDP has enabled small- and medium-sized businesses to deploy data protection solutions that could previously only be afforded by larger firms," says Lauren Whitehouse, an analyst at Enterprise Strategy Group (ESG) in Milford, MA.

Click here for a comparison of
CDP vs. array-based data protection (PDF).

Even though CDP failed as a standalone app, it has re-emerged and is succeeding as a feature, option or enabling technology in backup and disaster recovery (DR) products. Backup vendors BakBone Software Inc., CA, CommVault, EMC Corp. and Symantec Corp. all offer CDP in their products. While BakBone, CA, EMC and Symantec leverage the work of CDP pioneers through acquisitions or OEM relationships, CommVault developed its own CDP technology. So the lack of integration of early standalone products has become a non-issue. More significantly, having CDP built into backup products lets users dabble with CDP and use it to protect some apps and data, but use traditional backup methods for the rest of their data protection needs.

"Many of our customers start with CDP for a particular application, but as their comfort level and trust with CDP grows, they start using it for other apps and data," reports Zahid Ilkal, product manager at CommVault. A case in point is Matt Frehner, information systems manager at Atlanta-based Newell Rubbermaid, who has been using CommVault's Continuous Data Replicator to centralize remote-office backup and is now ready to take the next step. "We're planning to extend the use of CDP to disaster recovery of our Exchange and Microsoft SQL servers," says Frehner.

How CDP is used today
While the CDP acronym is familiar, it's interpreted in different ways. "There has been a lot of confusion about CDP and what CDP really is," says Laura Dubois, program director, storage software at IDC, Framingham, MA. "CDP is like video and snapshots are like photos," explains Whitehouse. As a result, CDP minimizes the loss of data in case of a failure. With snapshots, all data between a failure and the latest snapshot is lost.

With rapid data growth, meeting backup windows is a challenge for IT managers, a problem CDP can address. CDP eliminates the need for a designated backup window as changes are captured on an ongoing basis. Similarly, the need for more aggressive recovery time objectives (RTOs) and recovery point objectives (RPOs) can be met by CDP (see "Data protection metrics comparison," below). "There's a clear need in the data protection market for more compressed recovery points and times, which CDP helps to achieve," says Dubois.

Click here for a comparison of
data protection metrics (PDF).

Data protection for branch offices has been a pain point for most companies. The lack of IT staff in remote offices, the need to deploy tape libraries and the use of non-IT staff for tasks like tape changes have been challenging, resulting in inadequate data protection. Unlike expensive array-based replication, CDP has addressed this crucial issue relatively inexpensively with the installation of CDP agents on remote servers that replicate data changes back to a central data center. Because changes are replicated as they occur, bandwidth requirements are modest. "We're replicating data in remote offices back to our data center where it's backed up to tape," says Newell Rubbermaid's Frehner. "If a remote location loses a server, we're able to point users to the replicated data in the primary data center by simply changing their login scripts."

CDP has been very successful in the disaster recovery market. In a typical DR setup, production data and applications are mirrored to standby servers in a DR location to which users can be failed over quickly. Unlike array-based snapshots and replication, which have been primarily used for this in the past, CDP is less expensive and less complex to set up. As a result, CDP enables smaller companies to put in place a DR strategy that only larger firms could afford in the past.

"We chose CA XOsoft High Availability [formerly CA XOsoft WANSync HA] as our DR solution because it was relatively inexpensive and has built-in provision for DR testing," says Peter Haas, director of technology at the Supreme Court of Louisiana in New Orleans. Asempra Technologies Inc., CA XOsoft, Double-Take Software Inc. and InMage Systems Inc. have opted for CDP as the underlying technology to power their DR-centric data protection products. "A combination of local backups and DR with the ability for failover is the best way of ensuring 24/7 availability, and CDP is the most appropriate technology to achieve it," explains Rajeev Atluri, CTO and senior VP of engineering at InMage.

Some customers have been using CDP for operational reasons. "Although the primary reason for deploying InMage DR-Scout was disaster recovery, it's a great tool for cloning production instances for testing of patches and other changes prior to production migration," says Matt Reynolds, CIO at the San Francisco law firm of Howard Rice Nemerovski Canady Falk & Rabkin.

Besides continuous protection of files, CDP can protect a few critical apps. Almost all CDP products protect Microsoft Exchange Server and Microsoft SQL Server, but the level of recoverability varies. While some products can only recover complete Exchange storage groups and databases, others can recover single mailboxes and even single mail objects.

"Besides being able to reduce the dependency on tapes for backup and recovery, the ability to provide low-cost local disaster recovery of our Exchange servers and the ability to restore single mailboxes were the main reasons for deploying Asempra Business Continuity Server," says Derek Kruger, IT and communications supervisor for the city of Safford, AZ.

Application support beyond Exchange and SQL Server is sparser and varies by CDP vendor. Oracle databases, IBM DB2, MySQL, Active Directory and Windows SharePoint Services are among the apps supported by some vendors. The list of supported apps shows that Windows is currently the most widely supported platform for CDP. While all CDP vendors support Windows, support for Unix derivatives is mostly present in higher end CDP solutions like EMC RecoverPoint, InMage DR-Scout and Symantec Veritas NetBackup RealTime Protection.

Despite implementation differences (see "CDP product sampler," below), CDP products are architected in a similar fashion and consist of two primary components: A mechanism to capture data changes of protected systems, and a CDP repository where these changes are stored.

Click here for a sample of
CDP products (PDF).

Recording changes
With the exception of EMC RecoverPoint, which analyzes I/Os within a Fibre Channel (FC) fabric without the need to install agents on protected systems, host-side agents are by far the most prevalent method of detecting and capturing data changes. Host-side agents implement so-called filter drivers at the file-system or volume level, which are invoked by the OS whenever changes are saved. Depending on the implementation, data changes are then replicated from the protected host to a CDP repository in real-time or at defined intervals.

CDP implementations are split almost evenly between file-level and volume filter drivers, and each approach has its pros and cons. Volume filter drivers have the advantage of being file-system agnostic, which simplifies the support of multiple platforms. The majority of vendors with significant platform support beyond Windows, such as EMC RecoverPoint (which supports both fabric-based capture and a volume filter driver), InMage DR-Scout and Symantec Veritas NetBackup RealTime Protection, opted for volume filter drivers. Moreover, vendors with volume filter driver implementations tout the approach as less complex. "All we have to know is volume information, whereas products sitting on top of the file system ... need to track many more file attributes," says InMage Systems' Atluri.

On the other hand, the ability to capture whatever file attributes are required enables vendors like Asempra to implement unique features that are impossible to match by volume filter driver implementations. "The fact that we know everything about a file set enables us to virtualize data sets, which we can present to applications like Exchange Server just seconds after a failover," explains Gary Gysin, president and CEO at Asempra. In other words, Asempra Business Continuity Server can present all relevant file-system metadata to users and applications right away while the actual data is restored in the background, greatly reducing the time between a failover and when files and applications can be used.

The CDP repository
The second key component is the CDP repository, which typically stores two types of data: A replica of the protected data and a log of all changes for a defined period of time. Whenever a change is sent to the CDP repository, it's applied to the replica, synchronizing it with the protected source and making the replica usable as a production image in the case of failover. This is very similar to data protection via array-based replication. But in contrast to replication-based data protection, all replicated changes are also stored in a change log or change journal, which tracks every change.
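The two-part repository described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's implementation: the names CdpRepository, JournalEntry and apply_change are hypothetical, and changes are modeled as whole-block writes keyed by block number.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JournalEntry:
    """One captured write: where it landed, plus before and after images."""
    seq: int
    block: int
    old: Optional[bytes]  # None means the block did not exist before this write
    new: bytes


@dataclass
class CdpRepository:
    """Hypothetical repository: an up-to-date replica plus a change journal."""
    replica: dict = field(default_factory=dict)
    journal: list = field(default_factory=list)

    def apply_change(self, block: int, data: bytes) -> None:
        # Log the before-image first so the change can be reversed later,
        # then bring the replica in sync with the protected source.
        entry = JournalEntry(len(self.journal), block, self.replica.get(block), data)
        self.journal.append(entry)
        self.replica[block] = data


repo = CdpRepository()
repo.apply_change(0, b"alpha")
repo.apply_change(1, b"beta")
repo.apply_change(0, b"gamma")
print(repo.replica)
print(len(repo.journal))
```

The replica always reflects the latest state, which is what makes it usable as a failover image, while the journal retains the history that plain replication discards.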

When a file or application needs to be restored to a previous point in time, changes are reversed by traversing the change journal. Because the CDP repository stores a synchronized copy of the protected data as well as a list of changes, its size must be the size of the protected data plus the space needed for the change log. The size of the change journal depends on the number of days for which any-point-in-time recovery is required, as well as the number of changes. For frequently changing data, the change journal grows more rapidly. For instance, if 20% of the size of the replica is reserved for the change log, a database environment with a 20% daily change rate would allow a rollback of one day. "We recommend 24 to 72 hours CDP recovery and defer to other disk-based or tape-based restore methods if data beyond this period needs to be recovered," says Rick Walsworth, EMC's director of product marketing.
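The rollback and the sizing arithmetic can be made concrete with a short Python sketch. It rests on simplifying assumptions: journal entries are hypothetical (block, old, new) tuples, and the journal is assumed to consume exactly as much space as the data that changed, ignoring metadata overhead.

```python
def rollback(replica, journal, keep):
    """Return the replica as it was after the first `keep` journal entries."""
    image = dict(replica)
    # Walk the journal newest-first, undoing every change past the recovery point.
    for block, old, _new in reversed(journal[keep:]):
        if old is None:
            image.pop(block, None)  # the block did not exist before this write
        else:
            image[block] = old
    return image


def rollback_window_days(journal_fraction, daily_change_fraction):
    """Days of any-point-in-time recovery the reserved journal space can hold."""
    return journal_fraction / daily_change_fraction


journal = [(0, None, b"v1"), (1, None, b"x"), (0, b"v1", b"v2")]
replica = {0: b"v2", 1: b"x"}

print(rollback(replica, journal, keep=2))
print(rollback_window_days(0.20, 0.20))
```

With 20% of the replica size reserved and 20% of the data changing each day, the window works out to one day, matching the example in the text. Halving the daily change rate or doubling the reserved space would double the window.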

A critical aspect of any CDP evaluation is the mechanism a CDP product has in place to fail back to the production system after a failover. The failback has to be simple and provisions need to be put in place to prevent data and transaction loss. The methods vendors put in place vary in implementation, ease of use and capabilities. For instance, InMage Systems puts an agent on protected and failover servers. "By having an agent on the failover server, the failover server simply starts replicating to the CDP repository when a failover occurs," says the firm's Atluri.

Application consistency
Unlike files, which can be restored to any of the available recovery points, applications like Exchange and SQL Server require recovery points at which an application was consistent. When a CDP product detects an application-consistent event, it inserts a so-called marker into the recovery journal; these markers are then used during the recovery process to determine and present application-consistent recovery points.
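How markers narrow the set of usable recovery points can be sketched as follows. This is a minimal illustration, not a product's journal format: the Entry type, its fields and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    timestamp: float
    kind: str       # "write" for a captured change, "marker" for a consistency point
    note: str = ""


def consistent_recovery_points(journal):
    """Timestamps at which an application-consistent marker was recorded."""
    return [e.timestamp for e in journal if e.kind == "marker"]


def latest_consistent_before(journal, failure_time):
    """Most recent marker at or before failure_time, or None if there is none."""
    points = [t for t in consistent_recovery_points(journal) if t <= failure_time]
    return max(points) if points else None


journal = [
    Entry(1.0, "write"),
    Entry(2.0, "marker", "application quiesced"),
    Entry(3.0, "write"),
    Entry(4.0, "write"),
    Entry(5.0, "marker", "application quiesced"),
    Entry(6.0, "write"),
]

print(consistent_recovery_points(journal))
print(latest_consistent_before(journal, 4.5))
```

A file could be restored to any of the six entries, but an application restore would be offered only the two marked points.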

Vendors have taken two approaches to ensure application consistency. The first approach, heuristic consistency, looks at the data stream to detect events at which an app is consistent. "We track events like file-save operations, or Exchange and SQL Server start or shutdown events to determine consistent states and mark these events within the change journal," explains Bob Roudebush, Double-Take Software's director of solutions engineering. Heuristic consistency appears to be favored by products with file-level filter drivers, mostly because the file system provides the information required to determine consistent application states.

Fabric-based and volume filter driver-based products favor the second approach, proactive consistency, in which the CDP product puts the application into a consistent state through application APIs. "We call local backup APIs, such as Volume Shadow Copy [VSS] in the case of Exchange Server, and insert a bookmark in the recovery journal," explains Atluri. Unlike proactive consistency, the heuristic approach leaves room for error because consistent states are inferred rather than enforced.

Vendor assessment
Although many of the original CDP vendors have vanished, their technologies live on in other data protection products. CDP isn't a free option or feature. While the majority of backup vendors charge by the number of protected machines, a combination of a base price and capacity-based pricing seems to prevail for fabric-based and volume filter driver-based products.

EMC and Symantec added CDP to their data protection suites through acquisitions. EMC acquired Kashya to complement its Invista fabric-based storage virtualization product. Symantec has spent the last few years integrating its Revivio acquisition into NetBackup and recently rolled it out as Veritas NetBackup RealTime Protection. Both products compete at the high end of the CDP market, capturing changes at the FC I/O level, and offer broad application support.

IBM Corp. recently extended its Tivoli Storage Manager (TSM) suite with TSM FastBack through technology it acquired from FilesX. Remote-office data protection of Windows servers and applications is FastBack's primary target market.

CA entered the CDP market through its XOsoft acquisition, and offers the XOsoft product as CA XOsoft High Availability. Unlike some of the other backup software vendors, CA hasn't yet tightly integrated XOsoft with its backup app and continues to target the SMB DR market where XOsoft has done well.

Asempra and InMage Systems are among the few CDP firms that have survived on their own. Asempra Business Continuity Server and InMage DR-Scout are available from resellers and through OEM relationships. While BakBone Software and Hitachi Data Systems use the Asempra product, Pillar Data Systems and Xiotech Corp. partner with InMage Systems.

Double-Take Software and SonicWall Inc. got into CDP via acquisitions. Double-Take added CDP pioneer TimeSpring to complement its continuous replication software; SonicWall bought Lasso Logic to offer data protection appliances for SMBs with limited IT resources.

Despite its inauspicious start, CDP has been incorporated into data protection products and its role is likely to expand. Rapid data growth, highly distributed firms with branch offices lacking IT resources, the need for better RTOs and RPOs, shrinking backup windows and the need for 24/7 app availability are all trends that support the increased adoption of CDP.
