Sometimes, a failed backup isn't detected until an attempt to recover data fails. New data protection management tools can provide early warnings about data protection gaps.
Backup is a fundamental responsibility of every data center or storage manager. But as the storage environment being protected becomes more diverse, it becomes more difficult to ensure that the data on every system has been backed up successfully. Given the critical nature of most systems, managers need to know quickly if there's a problem in the data protection process. To meet this need for timely and accurate backup status information, a new class of software offerings -- known as data protection management (DPM) -- is emerging that promises not only to sound an alert the moment a backup fails but also to provide guidance on how to fix the problem.
While backup operations may be relegated to a system administrator, it's still important for CIOs and other higher-level data managers to know what's going on with their data protection process. Ultimately, if data can't be recovered, those managers are the ones who will be expected to provide an explanation for the failure.
The challenge today is that expectations are very high. Executives and business-line managers assume IT infrastructures will be 100% available, making uptime of five nines (99.999%) a thing of the past. There's also the expectation that data will be recovered instantly from almost any point in time, even when the request includes few details about what that data is.
The ROI of data protection management
- Service-level agreement (SLA)-focused data protection reports on the health of the business application, not the data protection tool.
- Reduce backup resource consumption by fine-tuning, and even eliminating, backup jobs to save backup storage capacity.
- First things first. The primary need is to identify the backup failures putting SLA attainment at risk.
Because of these expectations, a CIO needs up-to-date, almost real-time information about their shop's data protection process as a whole. Consequently, the first key deliverable of a data protection application is to provide CIO-level visibility into the backup process. Typically, this means an overview of the success or failure of each system's backup.
The problem is that in most enterprises the backup process will generate a failure of some sort almost every night. But many of these failures don't mean data is at immediate risk; rather, they could indicate that the process may be heading that way if left uncorrected. The CIO needs to know specifically when the data risk is serious enough that IT may not be able to meet its data availability commitment to users. In other words, data protection management tools should provide CIO-level reporting functions that can be aligned with the service-level agreements (SLAs) IT has established with the company's lines of business.
It's all about service levels
What qualifies as a successfully completed backup varies among applications and data sets. For some applications, success may simply be a second copy on another disk backup device; for others, it may mean data is copied to tape media. And for many others, a backup job may not be successfully completed until the data is copied to a remote location outside the data center.
These varying definitions of success are what make up the various service levels in the environment. It's important that a data center not establish a one-size-fits-all approach to backup SLAs. They should differ according to the needs of the applications and the business groups that own them; a backup management tool lets IT manage the process against each of those SLAs. Some data centers may treat all data the same and manage their backup process based on a single SLA because they don't have the ability to manage to a specific SLA.
Another service-level criterion may specify the time between successful backup events. For example, for some data sets merely getting a good secondary copy every few days may be acceptable. Other applications may need this secondary copy once a night, and a few applications may need a smaller window between successful data protection events.
Those are just two common examples of service levels, either implied or formally defined, that a data center operations team has to adhere to. Data protection management offerings have evolved beyond just reporting if individual jobs have passed or failed, and can now alert managers to which service levels are in jeopardy of not being met.
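The two criteria described above, where the copy must land and how old the last good copy may be, can be sketched as a simple check. This is only an illustration under assumed names: the `Sla` and `BackupCopy` types, the target labels and the sample dates are hypothetical and are not drawn from any DPM product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Sla:
    required_targets: frozenset  # where copies must exist, e.g. {"disk", "offsite"}
    max_age: timedelta           # longest allowed gap between good backups


@dataclass
class BackupCopy:
    target: str                  # "disk", "tape" or "offsite" (assumed labels)
    completed_at: datetime
    succeeded: bool


def sla_status(sla, copies, now):
    """Return 'met' or 'at risk' for one data set."""
    # A target counts only if it holds a successful copy that is recent enough.
    fresh_targets = {
        c.target
        for c in copies
        if c.succeeded and now - c.completed_at <= sla.max_age
    }
    return "met" if sla.required_targets <= fresh_targets else "at risk"


now = datetime(2014, 1, 15, 6, 0)
sla = Sla(frozenset({"disk", "offsite"}), timedelta(hours=24))
copies = [
    BackupCopy("disk", datetime(2014, 1, 15, 1, 0), True),
    BackupCopy("offsite", datetime(2014, 1, 13, 2, 0), True),  # older than 24 hours
]
print(sla_status(sla, copies, now))  # at risk
```

In this sample every individual job succeeded, yet the service level is still at risk because the offsite copy is too old. That is the difference between reporting on jobs and reporting on SLAs.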
Multi-platform data protection management reporting tools
Aptare Inc.'s StorageConsole Backup Manager. This backup management tool features media forecasting, job summary across backup applications, backup detail reporting, device performance reporting, billing and chargeback.
Bocada vpConnect. Bocada Inc.'s data protection management (DPM) tool consolidates virtual and physical reporting and includes support for new virtual-only backup apps, virtual machine data protection gaps, snapshot capacity utilization, trouble ticket integration, executive roll-up reports and service-level agreement-specific adherence reporting.
EMC Data Protection Advisor. EMC Corp.'s DPM app provides reporting across data protection functions, including replication. It also offers cross-platform monitoring, alerts, optimization, and capacity planning and reporting.
The ability to flag the service levels at risk is an important feature because in an enterprise backup environment there can be dozens of failures recorded each night, with only a limited time to fix them before the production day starts. A backup administrator must be able to decide which failures are important enough to address first.
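One plausible way to prioritize is to rank failed jobs by how much time remains before each one breaches its service level. The sketch below assumes that rule; the job names, dates and the `hours_until_breach` helper are hypothetical, and a real tool may weigh other factors such as application criticality.

```python
from datetime import datetime, timedelta


def hours_until_breach(last_good, max_age, now):
    """Hours left before the gap since the last good backup exceeds the SLA."""
    return (last_good + max_age - now) / timedelta(hours=1)


failures = [
    # (job name, last good backup, allowed gap between good backups)
    ("file-share", datetime(2014, 1, 14, 2, 0), timedelta(days=3)),
    ("erp-db", datetime(2014, 1, 14, 1, 0), timedelta(hours=24)),
    ("test-lab", datetime(2014, 1, 10, 1, 0), timedelta(days=7)),
]
now = datetime(2014, 1, 15, 0, 30)

# Least remaining time first: these are the failures to fix before morning.
ranked = sorted(failures, key=lambda f: hours_until_breach(f[1], f[2], now))
for name, last_good, max_age in ranked:
    print(name, round(hours_until_breach(last_good, max_age, now), 1))
```

With these sample values the ERP database, which has half an hour left, sorts ahead of two jobs that still have about two days of slack.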
More than just a single application
The overwhelming majority of enterprises run more than just one backup application. A common practice is to have one backup application that's used to protect the virtual server environment and a legacy application that protects the physical environment. Additionally, there might be a mix of backup software as a result of a merger or acquisition, especially when there were different standards for data protection in the merging companies.
Companies often also make data protection decisions along operating system lines. For example, they may choose one application to back up Windows servers, another for servers running Linux and still another for legacy Unix operating systems. Finally, they may make these decisions based on the needs of the production application itself. For example, some backup apps have unique features when it comes to protecting Oracle, Exchange, SQL Server or SharePoint applications.
While the legacy data protection applications eventually narrow the gaps in coverage or capabilities, the reality is that once an application makes it into the environment it can be very difficult to replace or remove, even if its replacement is another product that's already used by the organization.
Most data centers use a variety of applications and storage system features, like snapshots, to meet service-level commitments to different lines of business. Those business groups don't necessarily want to know how their data protection SLA is being met on an application-by-application basis. Instead, they need to know whether the data protection process as a whole is meeting their SLA. For that reason, it's important that the data protection management application combine its monitoring of multiple applications into a single report that signifies SLA attainment.
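A consolidated report of this kind can be sketched as a roll-up of per-product results into one answer per business group. The product names, data sets and record layout below are hypothetical; a real DPM tool would gather these results from each backup application's own catalog or logs.

```python
from collections import defaultdict

# (business group, data set, backup product, SLA met?) -- sample records
results = [
    ("finance", "oracle-gl", "legacy-backup", True),
    ("finance", "vm-reporting", "vm-backup", False),
    ("hr", "sharepoint", "legacy-backup", True),
    ("hr", "vm-payroll", "vm-backup", True),
]


def rollup(results):
    """Collapse per-product results into one SLA answer per business group."""
    by_group = defaultdict(list)
    for group, data_set, _product, met in results:
        by_group[group].append((data_set, met))
    report = {}
    for group, items in by_group.items():
        missed = sorted(ds for ds, met in items if not met)
        report[group] = {"sla_met": not missed, "missed": missed}
    return report


report = rollup(results)
for group, summary in report.items():
    print(group, summary)
```

The business group sees a single yes-or-no answer plus the data sets that missed. Which backup product handled each data set stays a detail for the backup team.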
More than high-level summaries
While the summary is important for CIOs and other management-level IT staff, most organizations want their data protection management tool to empower the backup team to do their jobs better and more efficiently. The backup team works in the data protection trenches, and needs a more granular level of reporting that will enable them to optimize the backup process so it can be completed faster and more consistently while using fewer resources. A data protection monitoring tool delivers just that kind of information.
More than just error codes
A data protection management tool that provides the kind of granular detail a backup administrator needs must go beyond simply reporting on what went wrong. Many of these offerings now use a knowledge base that maps each error code to a particular error, along with the specific step the backup process was performing just before the error occurred. These tools can then apply that combined knowledge to provide admins or managers with a natural language explanation of what the error actually means. Some tools can even take that information and generate potential solutions to the problem or recommend changes to the environment to prevent its recurrence.
If a data protection management product has the ability to "humanize" these error codes and provide a root-cause analysis, it will pay for itself very quickly. That kind of information could save backup administrators a few hours or more every day that they would otherwise spend chasing down error codes, and allow them to resolve the problem more quickly.
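At its simplest, such a knowledge base is a lookup from an error code to a plain-language meaning and a suggested action, combined with the step the job was performing when it failed. The codes, messages and `explain` function below are invented for illustration and do not correspond to any real backup product.

```python
# Hypothetical error codes; real products define their own.
KNOWLEDGE_BASE = {
    "E-1042": (
        "The backup server could not reach the client.",
        "Check that the client agent is running and the network path is open.",
    ),
    "E-2210": (
        "The target device ran out of space.",
        "Expire old backups or add capacity.",
    ),
}

UNKNOWN = ("Unknown error code.", "Escalate to the vendor with the job log.")


def explain(code, last_step):
    """Turn an error code plus the step that failed into a readable message."""
    meaning, action = KNOWLEDGE_BASE.get(code, UNKNOWN)
    return f"{code} during '{last_step}': {meaning} Suggested action: {action}"


print(explain("E-2210", "write to disk pool"))
```

A commercial tool would presumably draw on a much larger, vendor-maintained knowledge base, but the principle of pairing the code with its context is the same.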
More than just troubleshooting
While fixing problems is initially the primary function of a data protection management tool, helping to optimize backup storage resource utilization and performance is another very valuable benefit. The error reporting and troubleshooting assistance described above should relieve backup administrators of tedious error-checking chores and leave them with more time to focus on optimization.
Data protection management tools have the ability to report on backup space utilization. They can, for example, report on a per-application basis how much capacity is being consumed by the backup. More importantly, they can do this reporting by job, so the backup team is able to see which backup jobs use the most resources. At the same time they can indicate the retention setting of those jobs. Minor adjustments in the number of full backups retained can make a dramatic difference in the amount of backup capacity consumed.
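The effect of retention settings can be illustrated with simple arithmetic. The sketch below assumes a deliberately naive model with no deduplication or compression, in which every retained full and incremental backup consumes its whole size. The sizes and retention counts are made-up sample values.

```python
def backup_capacity_gb(full_gb, fulls_retained, incr_gb, incrs_retained):
    """Capacity held by one job under a simple, no-deduplication model."""
    return full_gb * fulls_retained + incr_gb * incrs_retained


# A 2,000 GB full and 100 GB incrementals, keeping 30 incrementals either way.
before = backup_capacity_gb(2000, 12, 100, 30)  # 12 fulls retained
after = backup_capacity_gb(2000, 8, 100, 30)    # 8 fulls retained
saved = before - after

print(f"before: {before} GB, after: {after} GB")
print(f"saved: {saved} GB ({saved / before:.0%} of the original footprint)")
```

Under these assumptions, retaining four fewer fulls frees 8,000 GB for this one job. Deduplicating backup targets would shrink that saving, so the figures show only the direction of the effect.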
Another data protection management capability is reporting on the capacity consumed per business-line app across backup systems. For example, DPM tools can show all the protection being applied to the same Oracle database. It's not uncommon to see the exact same data set protected multiple times across multiple apps. Using a data protection management solution to eliminate redundant jobs can dramatically reduce network bandwidth requirements and backup storage space requirements.
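Finding that kind of overlap amounts to grouping jobs from every backup product by the data set they protect and flagging any data set that appears more than once. The job records and product names below are hypothetical sample data.

```python
from collections import defaultdict

jobs = [
    # (backup product, job name, data set protected, size in GB)
    ("legacy-backup", "ora-nightly", "oracle-erp", 800),
    ("vm-backup", "vm-erp-host", "oracle-erp", 800),
    ("array-snapshots", "erp-snap", "oracle-erp", 120),
    ("legacy-backup", "mail-nightly", "exchange", 500),
]


def overlapping_protection(jobs):
    """Return the data sets that are protected by more than one job."""
    by_data_set = defaultdict(list)
    for product, job, data_set, size_gb in jobs:
        by_data_set[data_set].append((product, job, size_gb))
    return {ds: entries for ds, entries in by_data_set.items() if len(entries) > 1}


for data_set, entries in overlapping_protection(jobs).items():
    total_gb = sum(size for _, _, size in entries)
    print(f"{data_set}: {len(entries)} jobs, {total_gb} GB in total")
```

Not every overlap is waste, because an SLA may call for both a local snapshot and an offsite copy. The list is a starting point for review, not an automatic deletion list.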
A DPM app can also lead to better use of primary storage capacity. A few DPM apps can even include the storage consumed by hypervisor-based snapshots in their overall data protection snapshot totals. This could lead to a reduction in the number of snapshots taken or retained, and free up valuable primary storage capacity.
Bottom line on backups
Data protection monitoring and management is among those few applications that are as valuable to a CIO as they are to a system administrator. The ability to provide high-level data backup success reporting plus resource utilization gives a CIO confidence in their backup infrastructure because they know it can fulfill its mission when called upon. It also provides backup administrators with the tools they need to eliminate backup failures and improve backup performance. Data protection management should be considered a "must have" for enterprises in 2014.
About the author:
George Crump is president of Storage Switzerland, an IT analyst firm focused on storage and virtualization.