Data protection is the process of safeguarding important information from corruption, compromise or loss.
The importance of data protection increases as the amount of data created and stored continues to grow at unprecedented rates. There is also little tolerance for downtime that can make it impossible to access important information.
Consequently, a large part of a data protection strategy is ensuring that data can be restored quickly after any corruption or loss. Protecting data from compromise and ensuring data privacy are other key components of data protection.
The term data protection is used to describe both the operational backup of data and business continuity/disaster recovery (BC/DR). Data protection strategies are evolving along two lines: data availability and data management.
Data availability ensures users have the data they need to conduct business even if the data is damaged or lost.
Two key areas on the data management side are data lifecycle management and information lifecycle management. Data lifecycle management is the process of automating the movement of critical data to online and offline storage. Information lifecycle management is a comprehensive strategy for valuing, cataloging and protecting information assets from application and user errors, malware and virus attacks, machine failure, or facility outages and disruptions. More recently, data management has come to include finding ways to unlock business value from otherwise dormant copies of data for reporting, test/dev enablement, analytics and other purposes.
What is the purpose of data protection?
Storage technologies that can be used to protect data include a disk or tape backup that copies designated information to a disk-based storage array or a tape cartridge device so it can be safely stored. Mirroring can be used to create an exact replica of a website or files so they're available from more than one place. Storage snapshots can automatically generate a set of pointers to information stored on tape or disk, enabling faster data recovery, while continuous data protection (CDP) backs up all the data in an enterprise whenever a change is made.
Cloud backup is becoming more prevalent. Organizations frequently move their backup data to public clouds or clouds maintained by backup vendors. These backups can replace on-site disk and tape libraries, or they can serve as additional protected copies of data.
Backup has traditionally been the key to an effective data protection strategy. Data was periodically copied, typically each night, to a tape drive or tape library where it would sit until something went wrong with the primary data storage. That's when the backup data would be accessed and used to restore lost or damaged data.
Backups are no longer a stand-alone function. Instead, they're being combined with other data protection functions to save storage space and lower costs.
Backup and archiving, for example, have been treated as two separate functions. Backup's purpose was to restore data after a failure, while an archive provided a searchable copy of data. However, that led to redundant data sets. Today, there are products that back up, archive and index data in a single pass. This approach saves organizations time and cuts down on the amount of data in long-term storage.
The convergence of disaster recovery and backup
Another area where data protection technologies are coming together is in the merging of backup and DR capabilities. Virtualization has played a major role here, shifting the focus from copying data at a specific point in time to continuous data protection.
Historically, data backup has been about making duplicate copies of data. Disaster recovery, on the other hand, has focused on how backups are used once a disaster happens.
Snapshots and replication have made it possible to recover much faster from a disaster than in the past. When a server fails, data from a backup array is used in place of the primary storage, but only if steps are taken to prevent that backup from being modified.
Those steps involve using a snapshot of the data from the backup array to immediately create a differencing disk. The original data from the backup array is then used for read operations, and write operations are directed to the differencing disk. This approach leaves the original backup data unchanged. And while all this is happening, the failed server's storage is rebuilt and data replicated from the backup array to the failed server's newly rebuilt storage. Once the replication is complete, the contents of the differencing disk are merged onto the server's storage and users are back in business.
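The read and write redirection described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `DifferencingDisk` class, its method names and the sample blocks are all hypothetical, and real storage is modeled as simple in-memory sequences.

```python
class DifferencingDisk:
    """Overlay that leaves the backup image untouched and records writes separately."""

    def __init__(self, base_blocks):
        self._base = base_blocks   # snapshot of the backup array, never modified
        self._delta = {}           # block index -> data written after failover

    def read(self, index):
        # Serve the newest data if the block was rewritten, else the backup copy.
        return self._delta.get(index, self._base[index])

    def write(self, index, data):
        # Writes never reach the backup; they land in the differencing layer.
        self._delta[index] = data

    def merge_into(self, target_blocks):
        # Apply the accumulated writes to the rebuilt server storage.
        for index, data in self._delta.items():
            target_blocks[index] = data


backup = (b"alpha", b"bravo", b"charlie")  # tuple: immutable stand-in for the snapshot
disk = DifferencingDisk(backup)
disk.write(1, b"bravo-v2")                 # users keep working during the rebuild

rebuilt = list(backup)                     # replication from the backup array
disk.merge_into(rebuilt)                   # final merge once replication completes
```

After the merge, `rebuilt` holds the updated block while `backup` is unchanged, which is the property that makes the backup safe to run from.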
Data deduplication, also known as data dedupe, plays a key role in disk-based backup. Dedupe eliminates redundant copies of data to reduce the storage capacity required for backups. Deduplication can be built into backup software or can be a software-enabled feature in disk libraries.
Dedupe applications replace redundant data blocks with pointers to unique data copies. Subsequent backups only include data blocks that have changed since the previous backup. Deduplication began as a data protection technology and has moved into primary storage as a key feature for reducing the amount of capacity required on more expensive flash media.
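As a rough illustration of replacing redundant blocks with pointers, the sketch below splits data into fixed-size blocks and stores each unique block once, keyed by its SHA-256 digest. The function names, the tiny 4-byte block size and the sample data are assumptions chosen for readability; production systems typically use much larger or variable-size blocks.

```python
import hashlib


def dedupe(data: bytes, block_size: int = 4):
    """Split data into fixed-size blocks and keep one copy of each unique block."""
    store = {}      # digest -> unique block contents
    pointers = []   # one digest per logical block, in original order
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # store the block only the first time
        pointers.append(digest)
    return store, pointers


def restore(store, pointers) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[digest] for digest in pointers)


original = b"AAAABBBBAAAAAAAACCCC"
store, pointers = dedupe(original)
```

Here five logical blocks reduce to three stored blocks, and `restore(store, pointers)` returns the original bytes.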
CDP has come to play a key role in disaster recovery, and it enables fast restores of backup data. CDP enables organizations to roll back to the last good copy of a file or database, reducing the amount of information lost in the case of corruption or deletion of data. CDP started as a separate product category, but evolved to the point where it is now built into most replication and backup applications. CDP can also eliminate the need to keep multiple copies of data. Instead, organizations retain a single copy that's updated continuously as changes occur.
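One way to picture the rollback behavior is as a journal that records every change in order, so the state at any earlier point can be replayed. The sketch below assumes a hypothetical `ChangeJournal` class with sequence numbers standing in for timestamps; it is a simplified model of the idea, not how any particular CDP product stores changes.

```python
class ChangeJournal:
    """Records every write in order so an earlier state can be reconstructed."""

    def __init__(self):
        self._entries = []   # (sequence, key, value) in write order

    def write(self, key, value):
        sequence = len(self._entries) + 1
        self._entries.append((sequence, key, value))
        return sequence

    def state_at(self, sequence):
        # Replay the journal up to and including the requested point.
        state = {}
        for seq, key, value in self._entries:
            if seq > sequence:
                break
            state[key] = value
        return state


journal = ChangeJournal()
journal.write("report.txt", "draft 1")
last_good = journal.write("report.txt", "draft 2")
journal.write("report.txt", "@@corrupted@@")

recovered = journal.state_at(last_good)   # roll back to the last good copy
```

Because every change is captured, the only data lost is whatever was written after the chosen recovery point.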
Enterprise data protection strategies
Modern data protection for primary storage involves using a built-in system that supplements or replaces backups and protects against the following potential problems:
Media failure. The goal here is to make data available even if a storage device fails. Synchronous mirroring is one approach in which data is written to a local disk and a remote site at the same time. The write is not considered complete until a confirmation is sent from the remote site, ensuring that the two sites are always identical. Mirroring requires 100% capacity overhead.
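The acknowledgement rule for synchronous mirroring can be sketched as follows. The `Site` class and `synchronous_write` function are hypothetical, and the ordering (remote first, then local) is a simplification chosen so that an unconfirmed write never leaves the local site ahead of the remote one.

```python
class Site:
    """Hypothetical storage site that acknowledges a write only when online."""

    def __init__(self):
        self.blocks = {}
        self.online = True

    def store(self, index, data):
        if not self.online:
            return False          # no acknowledgement sent
        self.blocks[index] = data
        return True


def synchronous_write(local, remote, index, data):
    """Report the write as complete only after the remote site confirms it."""
    if not remote.store(index, data):
        return False              # unconfirmed: the write is not complete
    return local.store(index, data)


local, remote = Site(), Site()
first = synchronous_write(local, remote, 0, b"ledger-v1")

remote.online = False             # simulate a broken link to the remote site
second = synchronous_write(local, remote, 1, b"ledger-v2")
```

The second write is rejected rather than applied to one side only, so both sites still hold identical data.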
RAID protection is an alternative that requires less overhead capacity. With RAID, physical drives are combined into a logical unit that's presented as a single hard drive to the operating system. RAID enables the same data to be stored in different places on multiple disks. As a result, I/O operations overlap in a balanced way, improving performance and increasing protection.
Parity-based RAID levels must calculate parity, redundant information derived from the stored data that allows the contents of a failed drive to be reconstructed from the surviving drives, and that calculation consumes compute resources.
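A small sketch of single-parity protection, assuming byte-wise XOR parity across three equally sized data blocks (the layout used conceptually by single-parity RAID levels). The `xor_blocks` helper and the sample bytes are illustrative only.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_drives = [b"\x01\x02\x03", b"\x10\x20\x30", b"\x0f\x0e\x0d"]
parity = xor_blocks(data_drives)          # written alongside the data

# Simulate losing drive 1, then rebuild it from the survivors plus parity.
survivors = [data_drives[0], data_drives[2], parity]
rebuilt_drive = xor_blocks(survivors)
```

XOR-ing the surviving blocks with the parity block yields the missing block, which is why one extra drive's worth of capacity can protect the whole group against a single failure.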
The cost of recovering from a media failure is the time it takes to return to a protected state. Mirrored systems can return to a protected state quickly. RAID systems take longer because they must recalculate all the parity. Advanced RAID controllers don't have to read an entire drive to recover data when doing a drive rebuild; they only need to rebuild the data that is on that drive. Given that most drives run at about one-third capacity, intelligent RAID can reduce recovery times significantly.
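To put a rough number on that saving, the sketch below estimates rebuild time when the whole drive is processed versus only the portion holding data. The 12 TB capacity and 200 MB/s sustained throughput are assumed figures for illustration; the one-third utilization comes from the text above.

```python
def rebuild_hours(capacity_tb, throughput_mb_s, used_fraction=1.0):
    """Hours needed to rebuild the given fraction of a drive at a steady throughput."""
    megabytes = capacity_tb * 1_000_000 * used_fraction
    return megabytes / throughput_mb_s / 3600


full_rebuild = rebuild_hours(12, 200)                     # processes the whole drive
data_only = rebuild_hours(12, 200, used_fraction=1 / 3)   # rebuilds only stored data
saving = 1 - data_only / full_rebuild
```

Under these assumptions the rebuild drops from roughly 16.7 hours to about 5.6 hours, a reduction of about two-thirds.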
Erasure coding is an alternative to advanced RAID that's often used in scale-out storage environments. Like RAID, erasure coding uses parity-based data protection systems, writing both data and parity across a cluster of storage nodes. With erasure coding, all the nodes in the storage cluster can participate in the replacement of a failed node, so the rebuilding process doesn't get CPU-constrained and it happens faster than it might in a traditional RAID array.
Replication is another data protection alternative for scale-out storage. Data is mirrored from one node to another or to multiple nodes. Replication is simpler than erasure coding, but it consumes at least twice the capacity of the protected data.
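The capacity trade-off between replication and erasure coding can be made concrete with a little arithmetic. The sketch assumes a common erasure coding layout of k data fragments plus m parity fragments; the 4+2 scheme and the 100 TB figure are example values, not taken from the text.

```python
def raw_capacity_replication(usable_tb, copies=2):
    """Raw capacity needed when every piece of data is stored `copies` times."""
    return usable_tb * copies


def raw_capacity_erasure(usable_tb, data_fragments, parity_fragments):
    """Raw capacity needed for a data+parity fragment layout."""
    return usable_tb * (data_fragments + parity_fragments) / data_fragments


replicated = raw_capacity_replication(100)       # two full copies
erasure_coded = raw_capacity_erasure(100, 4, 2)  # assumed 4 data + 2 parity layout
```

For 100 TB of protected data, two-way replication needs 200 TB of raw capacity, while the assumed 4+2 layout needs 150 TB.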
Data corruption. When data is corrupted or accidentally deleted, snapshots can be used to set things right. Most storage systems today can track hundreds of snapshots without any significant effect on performance.
Storage systems using snapshots can work with key applications, such as Oracle and Microsoft SQL Server, to capture a clean copy of data while the snapshot is occurring. This approach enables frequent snapshots that can be stored for long periods of time.
When data becomes corrupted or is accidentally deleted, a snapshot can be mounted and the data copied back to the production volume, or the snapshot can replace the existing volume. With this method, minimal data is lost and recovery time is almost instantaneous.
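Both recovery paths, copying data back from a snapshot and replacing the volume with it, can be sketched with a toy volume whose snapshots are frozen copies of its block pointer table. The `Volume` class and its method names are hypothetical; real snapshot implementations differ considerably in how they track changed blocks.

```python
class Volume:
    """Toy volume whose snapshots are frozen copies of the block pointer table."""

    def __init__(self):
        self.blocks = {}      # block index -> data on the live volume
        self.snapshots = {}   # snapshot name -> frozen pointer table

    def write(self, index, data):
        self.blocks[index] = data

    def snapshot(self, name):
        # A shallow copy duplicates the pointers, not the block contents.
        self.snapshots[name] = dict(self.blocks)

    def restore_block(self, name, index):
        # Copy a single block back from the snapshot to the production volume.
        self.blocks[index] = self.snapshots[name][index]

    def revert(self, name):
        # Replace the whole live volume with the snapshot's view.
        self.blocks = dict(self.snapshots[name])


volume = Volume()
volume.write(0, b"orders")
volume.write(1, b"invoices")
volume.snapshot("hourly")

volume.write(0, b"garbage")          # corruption
del volume.blocks[1]                 # accidental deletion

volume.restore_block("hourly", 0)    # option 1: copy data back
after_copy_back = dict(volume.blocks)
volume.revert("hourly")              # option 2: replace the volume
```

Reverting only swaps pointer tables, which is why recovery from a snapshot can be close to instantaneous.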
Storage system failure. To protect against multiple drive failures or some other major event, data centers rely on replication technology built on top of snapshots.
With snapshot replication, only blocks of data that have changed are copied from the primary storage system to an off-site secondary storage system. Snapshot replication is also used to replicate data to on-site secondary storage that's available for recovery if the primary storage system fails.
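The changed-block idea can be sketched by comparing two consecutive snapshots and shipping only the differences. The `changed_blocks` and `replicate` functions are hypothetical, and snapshots are modeled as plain dictionaries of block index to data.

```python
def changed_blocks(previous, current):
    """Blocks that are new or different in `current` compared with `previous`."""
    return {i: data for i, data in current.items() if previous.get(i) != data}


def replicate(previous, current, secondary):
    """Bring `secondary` up to date by sending only the changed blocks."""
    delta = changed_blocks(previous, current)
    secondary.update(delta)
    for index in previous.keys() - current.keys():
        secondary.pop(index, None)   # propagate deletions as well
    return len(delta)                # number of blocks actually transferred


snap1 = {0: b"a", 1: b"b", 2: b"c"}
secondary = dict(snap1)              # secondary system already holds snap1
snap2 = {0: b"a", 1: b"B", 3: b"d"}  # block 1 changed, 2 deleted, 3 added

blocks_sent = replicate(snap1, snap2, secondary)
```

Only two blocks cross the wire, yet the secondary copy ends up identical to the latest snapshot.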
Full-on data center failure. Protection against the loss of a data center requires a full disaster recovery plan. As with the other failure scenarios, there are multiple options. Snapshot replication, where data is replicated to a secondary site, is one option. However, the cost of running a secondary site can be prohibitive.
Cloud services are another alternative. Replication and cloud backup products and services can be used to store the most recent copies of data that is most likely to be needed in the event of a major disaster, and to instantiate application images. The result is a rapid recovery in the event of a data center loss.
Data protection trends
The latest trends in data protection policy and technology include the following:
Hyper-convergence. With the advent of hyper-convergence, vendors have started offering appliances that provide backup and recovery for physical and virtual environments that are hyper-converged, non-hyper-converged and mixed. Data protection capabilities integrated into hyper-converged infrastructure are replacing a range of devices in the data center.
Cohesity, Rubrik and other vendors offer hyper-convergence for secondary storage, providing backup, disaster recovery, archiving, copy data management and other nonprimary storage functions. These products integrate software and hardware, and they can serve as a backup target for existing backup applications in the data center. They can also use the cloud as a target and provide backup for virtual environments.
Ransomware. This type of malware, which holds data hostage for an extortion fee, is a growing problem. Traditional backup methods have been used to protect data from ransomware. However, more sophisticated ransomware is adapting to and circumventing traditional backup processes.
Newer variants of the malware infiltrate an organization's data slowly over time, so the organization ends up backing up the ransomware along with the data. This situation makes it difficult, if not impossible, to roll back to a clean version of the data.
To counter this problem, vendors are working on adapting backup and recovery products and methodologies to thwart the new ransomware capabilities.
Copy data management. CDM cuts down on the number of copies of data an organization must save, reducing the overhead required to store and manage data and simplifying data protection. CDM can speed up application release cycles, increase productivity and lower administrative costs through automation and centralized control.
The next step with CDM is to add more intelligence. Companies such as Veritas Technologies are combining CDM with their intelligent data management platforms.
Disaster recovery as a service. DRaaS use is expanding as more options are offered and prices come down. It's being used for critical business systems where an increasing amount of data is being replicated rather than just backed up.
Mobile data protection
Data protection on mobile devices has its own challenges. It can be difficult to extract data from these devices. Inconsistent connectivity makes scheduling backups difficult, if not impossible. And mobile data protection is further complicated by the need to keep personal data stored on mobile devices separate from business data.
Selective file sync and share is one approach to data protection on mobile devices. While it isn't true backup, file sync-and-share products typically use replication to sync users' files to a repository in the public cloud or on an organization's network. That location must then be backed up. File sync and share does give users access to the data they need from a mobile device, while synchronizing any changes they make to the data with the original copy. However, it doesn't protect the state of the mobile device, which is needed for quick recovery.
Data protection and privacy
Data privacy laws and regulations vary from country to country and even from state to state, and there's a constant stream of new ones. China's data privacy law went into effect June 1, 2017. The European Union's General Data Protection Regulation (GDPR) goes into effect in 2018. Compliance with any one set of rules is complicated and challenging.
Coordinating among all the disparate rules and regulations is a massive task. Being out of compliance can mean steep fines and other penalties, including having to stop doing business in the country or region covered by the law or regulation.
For a global organization, experts recommend having a data protection policy that complies with the most stringent set of rules the business faces, while, at the same time, using a security and compliance framework that covers a broad set of requirements. The basics of data protection and privacy apply across the board and include:
- safeguarding data;
- getting consent from the person whose data is being collected;
- identifying the regulations that apply to the organization in question and the data it collects; and
- ensuring employees are fully trained in the nuances of data privacy and security.
EU General Data Protection Regulation
The European Union is updating its data privacy laws with a regulation that goes into effect on May 25, 2018. The GDPR replaces the EU Data Protection Directive of 1995 and focuses on making businesses more transparent. It also expands privacy rights with respect to personal data.
The GDPR covers the personal data of people in the EU regardless of where the organization collecting the data is located. It also applies to organizations established in the European Union that process personal data, whether or not the people concerned are EU citizens.
GDPR requirements include:
- Barring businesses from storing or using an individual's personally identifiable information without a lawful basis, such as that person's express consent.
- Requiring companies to notify the supervising authority within 72 hours of becoming aware of a data breach, and to notify affected people without undue delay when the breach poses a high risk to them.
- For businesses that process or monitor data on a large scale, having a data protection officer who's responsible for data governance and ensuring the company complies with GDPR.
Fines for not complying can be as much as €20 million or 4% of the previous fiscal year's worldwide turnover, whichever is larger.
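The upper limit on fines is simply the larger of two amounts, which a short calculation makes concrete. The function name and the turnover figures below are illustrative; integer arithmetic is used to avoid rounding surprises.

```python
def max_gdpr_fine(worldwide_turnover_eur: int) -> int:
    """Upper limit of the fine: the larger of EUR 20 million or 4% of turnover."""
    four_percent = worldwide_turnover_eur * 4 // 100
    return max(20_000_000, four_percent)


large_company = max_gdpr_fine(1_000_000_000)   # 4% of turnover exceeds EUR 20 million
small_company = max_gdpr_fine(100_000_000)     # EUR 20 million floor applies
```

For a company with €1 billion in turnover the ceiling is €40 million, while for one with €100 million in turnover it remains €20 million.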