About 10 years ago a new data protection technology emerged on the market, "continuous data protection" (CDP), prompting many to assume that the "old" way of protecting data would finally fade into the sunset. As we enter 2015, the old way of doing data protection, scheduled backups, is still firmly entrenched and CDP is rarely mentioned. What happened to CDP? Have we replaced it with something better?
The value of continuous protection
Continuous data protection promised to address one specific data protection challenge: the recovery point objective (RPO). RPO is the maximum acceptable amount of time between protection events; in other words, it determines how much data needs to be re-keyed or how many transaction logs need to be replayed in the event of a failure. Both of these processes, of course, take time, and the idea behind CDP was that it could provide a very narrow RPO because data was being captured continuously.
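To make the RPO trade-off concrete, here is a minimal sketch of the arithmetic. The function name, the 500 MB per hour change rate and the three intervals are illustrative assumptions, not figures from any particular product.

```python
from datetime import timedelta

def worst_case_data_loss_mb(protection_interval: timedelta,
                            change_rate_mb_per_hour: float) -> float:
    """Data (in MB) written since the last protection event, in the worst case.

    The worst case is a failure just before the next protection event,
    so everything written during one full interval is at risk.
    """
    hours = protection_interval.total_seconds() / 3600
    return hours * change_rate_mb_per_hour

# Hypothetical application that changes 500 MB of data per hour.
nightly = worst_case_data_loss_mb(timedelta(hours=24), 500)
hourly = worst_case_data_loss_mb(timedelta(hours=1), 500)
near_continuous = worst_case_data_loss_mb(timedelta(seconds=5), 500)

print(f"Nightly backup:   {nightly:.1f} MB at risk")
print(f"Hourly snapshot:  {hourly:.1f} MB at risk")
print(f"5-second capture: {near_continuous:.2f} MB at risk")
```

Under these assumptions, the shorter the interval between protection events, the less data has to be re-keyed or replayed, which is the gap CDP set out to close.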
Where CDP went wrong
The first challenge that CDP faced was the fact that simply meeting the RPO was not enough. Companies that could justify a CDP investment also needed a very rapid recovery time objective (RTO).
RTO is the time it takes to have the protected copy of data positioned and readable so that the application can access it. Traditionally, this meant copying data from the backup device across the backup network to production storage. Moving this data into position is a very time-consuming process, especially using the technologies of a decade ago.
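A rough back-of-the-envelope sketch shows why copying data back into position dominates recovery time. The data set size and throughput figures below are assumed for illustration only, and the estimate ignores everything except the raw transfer.

```python
def restore_hours(data_set_gb: float, throughput_mb_per_s: float) -> float:
    """Estimate hours needed to copy a data set back to production storage.

    Uses decimal units (1 GB = 1000 MB) and counts transfer time only;
    mounting, log replay and application restart are not included.
    """
    seconds = (data_set_gb * 1000) / throughput_mb_per_s
    return seconds / 3600

# Hypothetical 2 TB database restored over two different links.
slow_link = restore_hours(2000, 100)    # sustained 100 MB/s
fast_link = restore_hours(2000, 1000)   # sustained 1,000 MB/s

print(f"At 100 MB/s:   {slow_link:.2f} hours")
print(f"At 1,000 MB/s: {fast_link:.2f} hours")
```

Even with a narrow RPO, a multi-hour copy like the first case would miss a tight RTO, which is why keeping the protected copy in a directly usable location matters.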
The second challenge facing CDP was the need to also provide a point-in-time reference to the data it was protecting. CDP lacked the ability to recover information as it appeared days, weeks, months or even years ago. In the 2004-05 era, snapshot technology was still in its infancy and only so many snapshots could be kept before the CDP platform itself needed to be backed up.
Also, that backup only knew of the CDP copy, not the original data set. So, recoveries had to be made in two stages, first to the CDP appliance, then to the application server.
Another problem was the application interface. To capture a quality copy of data, CDP applications needed to interface with applications like Oracle and MS-SQL so that online backups could be made. Many CDP products lacked these interfaces.
Even if they did have application modules, there was another problem: performance. Any time an interaction between an application and a data protection process occurs, there is potential for a performance penalty, something that application users can't tolerate. This was especially a problem for CDP because, by definition, the interaction had to be continuous.
Finally, there was a cost issue. While backup-to-disk pricing had come down significantly, other technologies, like snapshots, were still in their infancy, so vendors charged premium prices for them. Technologies like deduplication and compression were not typically available on CDP appliances, so the secondary copy had to be the same size as the primary.
The concept of CDP was solid; it just needed to evolve. The first step was that primary storage needed to take on more of the data protection responsibilities. Most storage systems today leverage redirect-on-write snapshot technology that allows the creation of an almost limitless number of space-efficient data copies. These copies can be replicated to local and remote storage systems for protection from failure of the primary system.
Another advantage of this approach is that the data remains in a native, usable format and more than likely on a storage system that could take on the role of production storage. This allows for very rapid RTOs to be met.
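The space efficiency of these snapshots comes from copying pointers rather than data. The toy model below sketches the redirect-on-write idea under simplifying assumptions: the class and method names are hypothetical, blocks live in an in-memory dictionary, and unreferenced blocks are never reclaimed. It is not how any particular storage system implements the feature.

```python
class RedirectOnWriteVolume:
    """Toy volume where writes never overwrite existing blocks."""

    def __init__(self):
        self._blocks = {}     # physical block id -> data
        self._live = {}       # logical block number -> physical block id
        self._snapshots = {}  # snapshot name -> frozen copy of the block map
        self._next_id = 0

    def write(self, lbn, data):
        # Redirect the write to a fresh physical block, then repoint the
        # live map. Blocks referenced by snapshots are left untouched.
        pid = self._next_id
        self._next_id += 1
        self._blocks[pid] = data
        self._live[lbn] = pid

    def snapshot(self, name):
        # A snapshot copies only the map of pointers, not the data itself.
        self._snapshots[name] = dict(self._live)

    def read(self, lbn, snapshot=None):
        block_map = self._live if snapshot is None else self._snapshots[snapshot]
        return self._blocks[block_map[lbn]]

    def physical_block_count(self):
        return len(self._blocks)


vol = RedirectOnWriteVolume()
vol.write(0, b"alpha")
vol.write(1, b"beta")
vol.snapshot("09:00")
vol.write(1, b"gamma")  # only this changed block consumes new space

print(vol.read(1))                    # current data
print(vol.read(1, snapshot="09:00"))  # data as of the snapshot
print(vol.physical_block_count())
```

Because the snapshot shares unchanged blocks with the live volume, it stays in a native, readable format and costs only as much space as the data that changes afterward.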
The challenge with this approach was that it was expensive, because all the storage had to come from a single vendor. Also, applications had to be interrupted each time a protection event occurred. As a result, most customers executed a snapshot no more than once an hour. While this was a vast improvement over traditional backup, it still might not meet the tightest of RPOs.
In the last few years, two other technologies have emerged that promise to address the above expense issue. First, there is software-defined storage (SDS), which allows the second on-site system and the DR storage system to be a mixture of hardware from various vendors yet still managed by the same storage software.
The challenge with SDS is that these products require a significant commitment by storage administrators, who need to switch from their current storage software and its method of creating snapshots and replicating data to another storage software method. The cost of retraining and rewriting scripts should be factored into this approach.
The other option is to use a copy data management product that promises to resolve the cost and, potentially, the application interface issue. These products allow the use of existing storage software tools to create snapshots and replicate data, but they add a software layer to make those snapshots more manageable and searchable. In some cases they also provide better interfaces at the application layer, allowing for more frequent snapshots without risking application performance.
In both of the above cases, there is still the issue of long-term retention. Both methods have resolved the issue of creating thousands of snapshots but not many have delivered a management capability that allows for the quick retrieval of data from within that library of snapshots. IT planners should be on the lookout for products that will make finding data within their snapshot inventories possible.
The final issue is cost. While many of these systems are very cost-competitive and can leverage data efficiency techniques, storing data forever, even as a snapshot, consumes storage capacity, and that capacity costs money. We suggest creating a secondary offline copy to disk or tape that is outside of this process so that older snapshots can be released.
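One way to picture releasing older snapshots is as a simple retention split. The sketch below is an assumption-laden illustration: the function name, the 30-day window and the sample timestamps are all hypothetical, and it only decides which snapshots are candidates for release once they have been copied to the offline disk or tape copy.

```python
from datetime import datetime, timedelta

def split_by_retention(snapshot_times, now, keep_for):
    """Split snapshot timestamps into those to keep and those to release.

    Snapshots at or after the cutoff stay on primary storage; older ones
    become candidates for release after being archived offline.
    """
    cutoff = now - keep_for
    keep = [t for t in snapshot_times if t >= cutoff]
    release = [t for t in snapshot_times if t < cutoff]
    return keep, release

# Hypothetical snapshot inventory, expressed as ages in days.
now = datetime(2015, 1, 15, 12, 0)
snapshots = [now - timedelta(days=d) for d in (0, 1, 7, 30, 90, 400)]

keep, release = split_by_retention(snapshots, now, keep_for=timedelta(days=30))

print(f"Keep on primary storage: {len(keep)} snapshots")
print(f"Archive, then release:   {len(release)} snapshots")
```

A real policy would usually be tiered (for example, keeping hourly, daily and monthly points for different periods), but the same keep-or-release decision sits at its core.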
CDP is not dead; it has just evolved. Primary storage now provides many of the capabilities of CDP, integrated directly into the storage system. Copy data management provides an external method of delivering CDP and promises to enhance CDP capabilities with better management and use of the protected data set.
About the author:
George Crump is president of Storage Switzerland, an IT analyst firm focused on storage and virtualization.