If your job entails protecting data, providing faster access to data or simply understanding data, you are probably grappling with an extraordinary storage expansion that's most likely driven by massive copy data proliferation. You are not alone.
We at Taneja Group observe that most enterprises have between 10 and 100 copies of their primary data, created for secondary purposes such as data protection, testing, compliance and analytics. While this "copy data" accounts for the biggest chunk of enterprise disk storage capacity, the vast majority of companies haven't addressed the costs and risks associated with copy data silos and rapid storage sprawl.
Left unchecked, copy data and its supporting IT resources can grow at an incredible rate, as dedicated application environments and separate stacks of uncoordinated infrastructure often lead to duplicate data, unnecessary administrative overhead and unauthorized data access. Add to this the fact that most business groups (application development, quality assurance, finance and so on) have their own budgets, agendas and timetables, and there's a good chance you have rogue data copies, redundant data protection systems and overburdened database administrators and storage administrators. Fortunately, today there are several vendors that offer copy data management (CDM) products targeted at solving the enterprise copy data challenge.
These copy management systems focus on improving overall business and IT operations while reducing storage costs by enabling rapid, easy access to copy data and greater control over the entire copy data lifecycle (see sidebar: "Copy data management benefits"). Vendor case studies show, for example, how some organizations realize as much as a 90% reduction in IT administrative time for tasks associated with copy data thanks to CDM. For one enterprise, copy management systems cut disaster recovery testing from a very disruptive half- or full-day process to a one- to two-hour automated workflow that can be run multiple times per year.
This article examines the two different approaches to copy data management, outlines key features and benefits, and offers a perspective on how CDM is disrupting the traditional backup and disaster recovery market. The goal isn't to provide a detailed comparison of copy management systems, but rather to offer a framework for understanding this space. That way you'll be prepared to take concrete steps toward meeting your enterprise's specific copy data requirements.
Copy data management benefits
- Accelerate application release cycles, improve decision making, and increase efficiency and productivity with fast, easy and self-directed access to copy data in the appropriate format.
- Ensure compliance and mitigate security risks by having greater visibility into copy data usage.
- Lower storage administration costs through centralized control, automation and orchestration.
- Reduce storage costs by having the right number of copies of data with the right policies on the right storage.
Standalone vs. in-place CDM
For most enterprises, copy data management isn't an entirely new concept, but rather an extension of the data backup and recovery processes they already have in place. This raises the question of whether CDM is a replacement for backup or part of a new standalone market (see sidebar: "CDM, a replacement for data backup?").
CDM, a replacement for data backup?
Since copy data management is a superset that includes backup functionality, the interesting question is whether enterprises will continue to maintain separate backup and CDM products. This could certainly be argued both ways, but if you believe CDM is a natural extension of backup, it seems inevitable that organizations will eventually migrate to CDM for all their secondary data needs. It just makes sense to centralize all your copy data requirements under one roof and realize the efficiency and cost benefits, as long as you don't have to compromise. This last point should be stressed. Enterprises won't completely rely on copy management systems for critical backup and recovery functions if CDM vendors force them to make trade-offs, which means the backup and DR capabilities of standalone CDM must be on par with market-leading backup and recovery applications.
Vendor approaches to CDM tend to fall into one of two camps: standalone or in-place. The category a vendor fits into is generally dictated by its primary business or origin. For example, major storage device companies have evolved their data protection offerings to incorporate CDM functionality that supports their storage portfolio, and therefore fit into the in-place category. Standalone CDM vendors, meanwhile, have a business model that focuses solely or primarily on CDM, and have their own copy technology dedicated to this purpose.
The first CDM category, standalone, provides a dedicated environment for copy data storage and management. Actifio, Cohesity and Veritas are in this category (the Veritas product is currently available through an early adopter program and is scheduled for general release later in 2016). These types of copy management systems may be available as hardware appliances only or as both a hardware appliance and a virtual appliance that is hardware-agnostic.
In either case, a key distinguishing capability of standalone copy management systems is that they ingest a copy of production data and then use built-in copy and data virtualization technology to create additional copies -- physical or virtual -- for various use cases. One potential downside of this approach is the added cost of copy technology dedicated to CDM. Conversely, the advantages are centralized management of all copy data activities and a vendor whose sole focus is advancing CDM.
The second category, in-place CDM, includes storage companies (e.g., EMC, Hitachi and IBM) that prioritize CDM support for their specific storage devices, as well as independent software providers such as Catalogic and Commvault that deliver in-place support for a variety of storage devices. In-place vendors leverage the native snapshot and replication capabilities of the storage arrays they support for CDM purposes.
The advantage of in-place systems is that IT shops don't need to purchase separate copy technology dedicated to CDM. Adding another management layer can, however, add complexity. So providing an abstraction layer that normalizes CDM functions and ensures a uniform management experience is vital.
Core CDM features
Most copy data management vendors, whether their technology is standalone or in-place, provide a comprehensive set of core CDM functionality that includes the following:
Data movement (or copy methods) is the most fundamental aspect of any CDM product. This includes snapshots, clones, replication and archival capabilities. The copy methods used are usually dictated by tolerance for data loss, recovery time requirements and how frequently the data needs to be accessed. Storage efficiency is another very important consideration here, as many companies want the ability to generate virtual copies that don't take up additional storage space. The basic concept here is to support multiple workloads by creating virtual copies that point back to a single master copy while supporting redirect-on-write, so that only changes to the original data require additional storage space.
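The virtual-copy idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: the class names `MasterCopy` and `VirtualCopy` and the block-list model are hypothetical, and real products work at the storage layer rather than on in-memory lists.

```python
class MasterCopy:
    """Hypothetical read-only master copy, modeled as a sequence of blocks."""

    def __init__(self, blocks):
        self._blocks = tuple(blocks)  # tuple: the master is never modified

    def __len__(self):
        return len(self._blocks)

    def read(self, index):
        return self._blocks[index]


class VirtualCopy:
    """Hypothetical virtual copy that points back to a shared master.

    Writes are redirected to a private map, so only changed blocks
    consume additional space; unchanged blocks are read from the master.
    """

    def __init__(self, master):
        self._master = master
        self._redirected = {}  # block index -> data written by this copy

    def write(self, index, data):
        if not 0 <= index < len(self._master):
            raise IndexError("block index out of range")
        self._redirected[index] = data  # the master block is left untouched

    def read(self, index):
        if index in self._redirected:
            return self._redirected[index]
        return self._master.read(index)

    def extra_blocks(self):
        """Number of blocks this copy stores beyond the shared master."""
        return len(self._redirected)


master = MasterCopy([b"A", b"B", b"C", b"D"])
test_copy = VirtualCopy(master)       # e.g. a copy for a test team
analytics_copy = VirtualCopy(master)  # e.g. a copy for analytics
test_copy.write(1, b"X")              # only this change needs new space
```

In this sketch, two workloads share one master, and the test copy's single changed block is the only additional storage consumed; the analytics copy and the master still see the original data.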
Self-service (or IT as a Service), often found in data center and cloud management platforms, is now an important aspect of copy management systems. The need for this capability has been driven by programmers, testers and database administrators demanding more timely access to data and wanting the ability to get the data copies they need in a self-directed way, without IT administrator involvement. Self-service includes finding, provisioning and managing resources, which requires a catalog or marketplace, role-based access (or personalization), governance and integrated lifecycle controls -- all already a part of a vendor's CDM product or, at the very least, on its product roadmap.
Policy-driven orchestration is the automation of the copy data lifecycle, ideally through a drag-and-drop interface that makes creating and modifying data workflows (or templates) an easy and intuitive process. True automation requires spinning the entire supporting infrastructure up and down. That means a policy should cover provisioning a copy of the data and new virtual machines (VMs); setting network parameters, refresh frequency and retention period; and cleaning up copies and VMs as needed. Provisioning involves selecting a data source, a copy method (continuous, replication, archival) and a repository (or destination) that can be a physical or virtual environment and may be on premises or in the cloud.
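To make the policy elements above concrete, here is a rough Python sketch of how a policy with a source, copy method, destination, refresh frequency and retention period might drive cleanup and provisioning decisions. All names (`CopyPolicy`, `DataCopy`, `plan_actions`) and the sample values are hypothetical and chosen for illustration; actual CDM products expose this through their own interfaces.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CopyPolicy:
    """Hypothetical policy record; field names are illustrative only."""
    source: str
    copy_method: str          # "continuous", "replication" or "archival"
    destination: str          # on-premises or cloud repository
    refresh_every: timedelta  # how often a fresh copy should be provisioned
    retention: timedelta      # how long a copy is kept before cleanup


@dataclass(frozen=True)
class DataCopy:
    name: str
    created_at: datetime


def plan_actions(policy, copies, now):
    """Return the cleanup and provisioning actions the policy calls for."""
    actions = []
    live = []
    for data_copy in copies:
        if now - data_copy.created_at > policy.retention:
            actions.append(f"delete {data_copy.name}")  # past retention
        else:
            live.append(data_copy)
    newest = max((c.created_at for c in live), default=None)
    # Provision a new copy if none is live or the newest one is stale.
    if newest is None or now - newest >= policy.refresh_every:
        actions.append(
            f"provision {policy.copy_method} copy of {policy.source} "
            f"to {policy.destination}"
        )
    return actions


policy = CopyPolicy(
    source="orders-db",
    copy_method="replication",
    destination="test-cluster",
    refresh_every=timedelta(hours=24),
    retention=timedelta(days=7),
)
```

A scheduler could call `plan_actions` periodically and hand the resulting actions to whatever tooling provisions VMs and storage; that part is outside the scope of this sketch.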
Data optimization includes storage space reduction techniques such as compression and deduplication, as well as technologies that help companies better understand their data. Greater data visibility is essential to CDM because it's hard to manage data copies unless you know how many you have, where they live, what's in them, and the copy ownership and permissions. Technologies and methods for improving data transparency include discovery, cataloging (indexing), search and reporting (analytics). When evaluating data visibility functionality, important considerations are: Is the approach nondisruptive (allows use of native interfaces such as Oracle RMAN), does it provide global insight, and how intuitive and flexible are the analytics?
Ecosystem integration focuses on application, virtual environment, storage device, DevOps and cloud integration. An important aspect of application support is application-consistent snapshots that guarantee a usable backup by flushing any outstanding writes. This capability is essential when working with transactional systems like Oracle, SAP or SQL Server databases. Modern data centers are virtualized, so API integration with major hypervisors (Hyper-V, KVM and vSphere) is also essential. And since DevOps is an important CDM use case, enabling rapid development through plug-ins for DevOps tools such as Puppet and Chef allows users to directly manage and deploy environments that include a copy of the application data through the DevOps console using native commands. In addition, cloud support is becoming increasingly important as companies look to extend the management and use of data to the public cloud (Amazon Web Services, Google, Microsoft, Rackspace and so on), while maintaining control and orchestration through a centralized CDM platform.
The CDM shift
There is little doubt that the backup and recovery market is rapidly shifting to address enterprise copy data management needs. Initially, most data protection vendors watched copy management systems emerge with curiosity. But then frustration set in as CDM vendors gained traction by delivering backup and recovery products that did a better job of connecting business processes to data protection needs.
The honeymoon for standalone CDM appliances is over, however. Today, most major storage device and data protection software vendors offer competitive CDM functionality in their products, with plans to further beef up their offerings as the race for dominance in the overall CDM market continues to heat up.
About the author:
Steve Ricketts is a senior analyst at Taneja Group.