Copy data management market tips, tricks and concerns


Vendors of copy management systems tackle potential pitfalls

If you're looking to reap the benefits of a copy data management product, it's important to keep in mind the limits and possible risks involved with the technology.

Copy data management is a data reduction technology designed to reduce storage costs by maintaining a single physical copy of data and presenting virtual copies to the workloads that need them. Although the technology is not without its flaws, vendors of copy management systems have released products that allow organizations to take a fresh look at data management.

Copy management systems let IT pros approach some long-standing challenges differently. Suppose, for instance, that a developer needs a test-development environment that mimics a production data set. In the past, this would have required creating a separate copy of the data for exclusive use in development or testing. With copy data management, it becomes possible to create a virtual instance of the production environment instead. Developers can then work with production data without putting that data at risk and without making a physical copy of it.
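The virtual copy described above is typically built on some form of copy-on-write. The Python sketch below illustrates that idea under this assumption; it is not any vendor's implementation, and the `VirtualCopy` class and its methods are hypothetical. Reads fall through to the shared production blocks, while writes from the dev-test copy land in a private overlay, so production data is never modified and only changed blocks take up extra space.

```python
class VirtualCopy:
    """Hypothetical copy-on-write view over a shared production data set."""

    def __init__(self, base: dict[int, bytes]) -> None:
        self._base = base                      # shared blocks, never modified here
        self._overlay: dict[int, bytes] = {}   # blocks this copy has changed

    def read(self, block: int) -> bytes:
        # Prefer this copy's own version of the block, if it has written one.
        if block in self._overlay:
            return self._overlay[block]
        # Otherwise read straight from the shared production data.
        return self._base[block]

    def write(self, block: int, data: bytes) -> None:
        # Copy-on-write: the change is kept privately, not in the base.
        self._overlay[block] = data

    def private_blocks(self) -> int:
        # Only changed blocks consume additional space.
        return len(self._overlay)


production = {0: b"customer-a", 1: b"customer-b", 2: b"customer-c"}
dev = VirtualCopy(production)
dev.write(1, b"test-record")

print(dev.read(1))           # b'test-record'  (dev-test's private version)
print(production[1])         # b'customer-b'   (production is unchanged)
print(dev.read(0))           # b'customer-a'   (shared, never copied)
print(dev.private_blocks())  # 1
```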

As useful as copy data management may be, it is not without potential pitfalls. Performance tends to receive the most attention, but because the technology removes data redundancy, data integrity and the potential for data loss can also become issues.


Performance problems can occur with copy data management because, by definition, it eliminates parallel copies of data. Consider the earlier example, in which a virtual copy of a data set is created to support a test-development environment. Although the dev-test environment is built to give the illusion of a dedicated data copy, it actually works from the same data set as the production environment. The underlying storage therefore has to support both the production workload and the dev-test workload, plus the workloads of any additional virtual data copies created later on. As a result, the storage hardware is subjected to more read and write IOPS than it would be if the data were dedicated to a single workload.
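The arithmetic behind this point is simple: the IOPS demands of every workload attached to the shared data set add up on one set of hardware. The sketch below uses made-up workload figures and a hypothetical array rating purely for illustration; the "reporting" entry stands in for an additional virtual copy created later.

```python
def shared_storage_load(workload_iops: dict[str, int]) -> int:
    """Total IOPS the single underlying data set must serve.

    With copy data management, production and every virtual copy hit the
    same storage, so their demands simply add together.
    """
    return sum(workload_iops.values())


# Hypothetical figures for illustration only.
workloads = {"production": 5000, "dev-test": 1500, "reporting": 800}
array_capacity = 7000  # hypothetical rated IOPS of the storage array

total = shared_storage_load(workloads)
print(total)                   # 7300
print(total > array_capacity)  # True: the virtual copies push the array past its rating
```

With dedicated physical copies, the dev-test and reporting load could be placed on separate storage; with a single shared data set, the production array absorbs all of it.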


The copy data management software also adds some overhead of its own. Copy data management products are commonly built on snapshots, and as the snapshot tree grows, it can degrade performance over time. How much performance suffers depends on a number of factors, such as whether snapshots are created at the hardware or software level, how many snapshots exist, and whether the snapshots are based on changed block tracking or on differencing disks.
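To see why the number of snapshots can matter, consider a simplified chain of differencing layers, where each layer stores only the blocks changed since the layer beneath it. The sketch below is an illustrative model built on that assumption, not a description of any specific product; real implementations often use metadata indexes so they do not have to walk every layer. A read checks the newest layer first and falls back through older layers until it finds the block, so blocks that have not changed recently cost more lookups as the chain grows. The `read_block` function is hypothetical.

```python
def read_block(chain: list[dict[int, bytes]], block: int) -> tuple[bytes, int]:
    """Read a block through a snapshot chain, newest layer first.

    Each layer holds only the blocks changed since the layer below it.
    Returns the data plus how many layers were checked to find it.
    """
    for depth, layer in enumerate(reversed(chain), start=1):
        if block in layer:
            return layer[block], depth
    raise KeyError(block)


base = {0: b"A0", 1: b"B0", 2: b"C0"}
chain = [base]
# Hypothetical workload: each new snapshot layer records a change to block 0 only.
for i in range(1, 6):
    chain.append({0: f"A{i}".encode()})

print(read_block(chain, 0))  # (b'A5', 1)  recently changed: found in the top layer
print(read_block(chain, 2))  # (b'C0', 6)  unchanged: walks all 6 layers to the base
```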

Another potential disadvantage of copy data management is that it can, in theory, increase the odds of a data loss event. There are two main reasons for this. First, because copy data management builds an intricate series of data copies, the integrity of those copies, and the well-being of the workloads that depend on them, rests entirely on the copy data management software working correctly. Although most of the products available today are reputable and reliable, it is difficult to ignore the fact that the copy data management layer could become a single point of failure capable of disrupting multiple workloads.

Copy data management vs. traditional backup

Second, copy data management software could result in data loss because, in some deployments, it has replaced traditional backup.

Vendors of copy management systems each have their own approach, but generally speaking, their products take an "incremental forever" approach to data storage. In other words, each time a storage block is modified, the newly modified block is stored alongside the previous version of that block rather than overwriting it. This is what makes it possible for multiple workloads to have read/write access to the same underlying data set without interfering with one another.
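A minimal model of the "incremental forever" idea is a store in which every write appends a new block version instead of overwriting the old one. The `VersionedBlockStore` class below is a hypothetical sketch that ignores space reclamation and metadata management, both of which real products have to handle.

```python
class VersionedBlockStore:
    """Hypothetical "incremental forever" block store: writes never overwrite."""

    def __init__(self) -> None:
        # block number -> every version written to it, oldest first
        self._versions: dict[int, list[bytes]] = {}

    def write(self, block: int, data: bytes) -> int:
        # Keep the new version alongside the earlier ones.
        history = self._versions.setdefault(block, [])
        history.append(data)
        return len(history) - 1  # index of the version just written

    def read(self, block: int, version: int = -1) -> bytes:
        # The newest version by default; older versions by index.
        return self._versions[block][version]

    def version_count(self, block: int) -> int:
        return len(self._versions.get(block, []))


store = VersionedBlockStore()
store.write(7, b"original")
store.write(7, b"modified")

print(store.read(7))           # b'modified'
print(store.read(7, 0))        # b'original' (still retained)
print(store.version_count(7))  # 2
```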

Because previous versions of storage blocks are retained, copy data management can be used as a pseudo-backup system. Depending on the capabilities provided by the vendor, storage block versioning may make it possible to restore previous versions of files, folders and volumes.
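Retained versions are what make point-in-time restores possible. Assuming each block's versions are kept with a write timestamp (an illustrative layout, not a specific product's format), restoring a block as of time `t` means picking the last version written at or before `t`, which the standard-library `bisect` module can locate efficiently. The `restore_as_of` function is hypothetical.

```python
import bisect


def restore_as_of(history: list[tuple[int, bytes]], t: int) -> bytes:
    """Return the block contents that were current at time t.

    history holds (timestamp, data) pairs in write order, as an
    incremental-forever store might retain them.
    """
    times = [ts for ts, _ in history]
    i = bisect.bisect_right(times, t)  # versions written at or before t
    if i == 0:
        raise ValueError(f"no version of this block existed at t={t}")
    return history[i - 1][1]


# Hypothetical version history for one block of a file.
history = [(100, b"draft"), (200, b"edited"), (300, b"corrupted")]
print(restore_as_of(history, 250))  # b'edited'  (roll back past the bad write)
print(restore_as_of(history, 300))  # b'corrupted'
```

Restoring a whole file, folder or volume would repeat this lookup for every block it contains.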

The problem is that although some copy data management products provide backup-like capabilities, copy data management is not a true backup replacement, at least not by itself. A backup is a copy of data, stored separately from the original, that can be restored. Because copy data management does not make a secondary copy of the data, it cannot be considered a true backup product. If a storage array were to fail, there would be no way of restoring the data unless a separate backup existed.
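The distinction can be modeled in a few lines. In this hypothetical sketch, the retained block versions live on the same `StorageArray` object as the primary data, so a simulated array failure destroys them as well, while `take_backup` produces an independent secondary copy that survives. Both names are illustrative assumptions, not part of any product.

```python
class StorageArray:
    """Hypothetical array holding primary data and all retained versions."""

    def __init__(self) -> None:
        self.blocks: dict[int, list[bytes]] = {}  # block -> versions, oldest first
        self.failed = False

    def write(self, block: int, data: bytes) -> None:
        self.blocks.setdefault(block, []).append(data)

    def fail(self) -> None:
        # The versions share the array's hardware, so they are lost with it.
        self.blocks.clear()
        self.failed = True


def take_backup(array: StorageArray) -> dict[int, bytes]:
    # A true backup: a separate copy of the current data that does not
    # depend on the array (modeled here as a new, independent dict).
    return {block: versions[-1] for block, versions in array.blocks.items()}


array = StorageArray()
array.write(0, b"v1")
array.write(0, b"v2")
backup = take_backup(array)
array.fail()

print(array.blocks)  # {} : every retained version is gone
print(backup)        # {0: b'v2'} : the secondary copy can still be restored
```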

Vendors selling copy management systems have done a good job of creating products that can help to reduce storage costs by eliminating unnecessary data copies. Although enterprise-grade copy data management products do take measures to prevent data loss, it is imperative that administrators familiarize themselves with how their chosen copy data management product works. Only then does it truly become possible to ensure that data is adequately protected.
