Backup refers to the copying of physical or virtual files or databases to a secondary location for preservation in case of equipment failure or catastrophe. The process of backing up data is pivotal to a successful disaster recovery plan (DRP).
Enterprises back up data they deem to be vulnerable in the event of buggy software, data corruption, hardware failure, malicious hacking, user error or other unforeseen events. Backups capture and synchronize a point-in-time (PIT) snapshot that is then used to return data to its previous state.
Backup and recovery testing examines an organization's practices and technologies for data security and data replication. The goal is to ensure rapid and reliable data retrieval should the need arise. The process of retrieving backed-up data files is known as file restoration.
The terms data backup and data protection are often used interchangeably, although data protection encompasses the broader goals of business continuity (BC), data security, information lifecycle management and prevention of malware and computer viruses.
What data should be backed up and how frequently?
A backup process is applied to critical databases or related line-of-business (LOB) applications. The process is governed by predefined backup policies that specify how frequently the data is backed up and how many duplicate copies (known as replicas) are required, as well as by service-level agreements (SLAs) that stipulate how quickly data must be restored.
Best practices suggest a full data backup should be scheduled to occur at least once a week, often during weekends or off-business hours. To supplement weekly full backups, enterprises typically schedule a series of differential or incremental data backup jobs. A differential job backs up only the data that has changed since the last full backup, while an incremental job backs up only the data that has changed since the most recent backup of any type.
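The weekly-full-plus-daily-partial schedule described above can be sketched as a small function. The choice of Sunday for the full backup and the name `backup_type_for` are illustrative assumptions, not taken from any backup product.

```python
from datetime import date

# Assumption: full backups run on Sunday; every other day runs an incremental job.
FULL_BACKUP_WEEKDAY = 6  # date.weekday(): Monday is 0, Sunday is 6


def backup_type_for(day: date) -> str:
    """Return the kind of backup job to run on the given calendar day."""
    return "full" if day.weekday() == FULL_BACKUP_WEEKDAY else "incremental"
```

A real policy would also account for holidays, job duration and the backup window; this sketch only shows the basic decision.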
Backup storage media
Enterprises typically back up key data to dedicated backup disk appliances. Backup software -- either integrated in the appliances or running on a separate server -- manages the process of copying data to the disk appliances. Backup software handles features such as data deduplication that shrinks the amount of data that must be backed up. Backup software also enforces policies that govern how often specific data is backed up, how many copies are made and where backups are stored.
Before disk became the main backup medium in the early 2000s, most organizations used magnetic tape drive libraries to store backups. Tape is still used today but mainly for archived data that does not need to be quickly restored.
In the early days of disk backup, the software continued to run on separate servers and moved data to disk instead of tape. As file sizes have increased, backup vendors have introduced integrated data protection appliances to simplify the backup process. An integrated data appliance is essentially a file server outfitted with hard disk drives (HDDs) and backup software. These plug-and-play data storage devices often include automated features for monitoring disk capacity, along with expandable storage and preconfigured tape libraries.
Most disk-based backup appliances allow copies to be moved from spinning media to magnetic tape for long-term retention. Magnetic tape systems are still used because of increasing tape densities and the rise of the Linear Tape File System (LTFS).
Early disk backup systems were known as virtual tape libraries (VTLs) because they included disk that worked the same way as tape drives. That way, backup software applications developed to write data to tape could treat disk as a physical tape library. VTLs faded from popular use after backup software vendors optimized their products for disk instead of tape.
Solid-state drives (SSDs) are rarely used for data backup because of price and endurance concerns. Some storage vendors include SSDs as a caching or tiering tool for managing writes with disk-based arrays. Data is initially cached in flash storage and then written to disk. As vendors release SSDs with larger capacity than disk drives, flash drives may gain some use for backup.
Local backup vs. offline backup for primary storage
Modern primary storage systems have evolved to feature stronger native capabilities for data backup. These features include advanced RAID protection schemes, unlimited snapshots and tools for replicating snapshots to secondary backup or even tertiary off-site backup. Despite these advances, primary storage-based backup tends to be more expensive and lacks the indexing capabilities found in traditional backup products.
Local backups place data copies on external HDDs or magnetic tape systems, typically housed in or near an on-premises data center. The data is transmitted over a secure high-bandwidth network connection or corporate intranet.
One advantage of local backup is the ability to back up data behind a network firewall. Local backup is also much quicker than sending data to a remote site and provides greater control over who can access the data.
Offline or cold backup is similar to local backup, although it is most often associated with backing up a database. An offline backup incurs downtime since the backup process occurs while the database is disconnected from its network.
Backup and cloud storage
Off-site backup transmits data copies to a remote location, which can include a company's secondary data center or leased colocation facility. Increasingly, off-site data backup equates to subscription-based cloud storage as a service, which provides low-cost, scalable capacity and eliminates a customer's need to purchase and maintain backup hardware. Despite its growing popularity, choosing backup as a service (BaaS) requires users to encrypt data and take other steps to safeguard data integrity.
Cloud backup is divided into the following:
- Public cloud storage: Users ship data to a cloud services provider, which charges them a monthly subscription fee based on consumed storage. There may be additional fees for moving data into or out of the cloud, with egress (retrieval) charges being the more common of the two. Amazon Web Services (AWS), Google Cloud and Microsoft Azure are the largest public cloud providers. Smaller managed service providers (MSPs) also host backups on their clouds or manage customer backups on the large public clouds.
- Private cloud storage: Data is backed up to different servers within a company's firewall, typically between an on-premises data center and a secondary DR site. For this reason, private cloud storage is sometimes referred to as internal cloud storage.
- Hybrid cloud storage: A company uses both local and off-site storage. Enterprises customarily use public cloud storage selectively for data archiving and long-term retention. They use private storage for local backup and for faster access to their most critical data.
Most backup vendors enable local applications to be backed up to a dedicated private cloud, effectively treating cloud-based data backup as an extension of a customer's physical data center. When the process allows applications to fail over in case of a disaster and fail back later, this is known as disaster recovery as a service (DRaaS).
Cloud-to-cloud (C2C) data backup is an alternative approach that has been gaining momentum. C2C backup protects data on software as a service (SaaS) platforms, such as Salesforce or Microsoft Office 365. This data often exists only in the cloud, but the SaaS vendors often charge large fees to restore data lost due to customer error. C2C backup works by copying SaaS data to another cloud, from where it can be restored if any data is lost.
Backup storage for PCs and mobile devices
PC users can consider local backup from a computer's internal hard disk to an attached external hard drive or to removable media, such as a thumb drive.
Another alternative for consumers is to back up data from smartphones and tablets to personal cloud storage, which is available from vendors such as Box, Carbonite, Dropbox, Google Drive, Microsoft OneDrive and others. These services commonly provide a certain capacity for free, giving consumers the option to purchase additional storage as needed. Unlike enterprise cloud storage as a service, these consumer-based cloud offerings generally do not provide the level of data security businesses require.
Backup software and hardware vendors
Vendors that sell backup hardware platforms include Barracuda Networks, Cohesity, Dell EMC (Data Domain), Drobo, ExaGrid Systems, Hewlett Packard Enterprise (HPE), Hitachi Data Systems, IBM, NEC Corp., Oracle StorageTek (tape libraries), Quantum Corp., Rubrik, Spectra Logic, Unitrends and Veritas NetBackup.
Leading enterprise backup software vendors include Acronis, Arcserve, Asigra, Commvault, Datto, Druva, EMC Data Protection Suite (Avamar, Data Protection Advisor, Mozy, NetWorker and SourceOne), EMC RecoverPoint replication manager, Nakivo, Veeam Software and Veritas Technologies.
The Microsoft Windows Server operating system (OS) inherently features the Microsoft Resilient File System (ReFS) to automatically detect and repair corrupted data. While not technically data backup, Microsoft ReFS is geared to be a preventive measure for safeguarding file system data against corruption.
VMware vSphere provides a suite of backup tools for data protection, high availability (HA) and replication. The VMware vStorage API for Data Protection (VADP) enables VMware or supported third-party backup software to safely take full and incremental backups of virtual machines (VMs). VADP implements backups via hypervisor-based snapshots. As an adjunct to data backup, VMware vSphere live migration enables VMs to be moved between different platforms to minimize the impact of a DR event. VMware Virtual Volumes (VVols) also aid VM backup.
Backup types defined
Full backup captures a copy of an entire data set. Although considered to be the most reliable backup method, performing a full backup is time-consuming and requires a large number of disks or tapes. Most organizations run full backups only periodically.
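Incremental backup offers an alternative to full backups by backing up only the data that has changed since the most recent backup of any type, whether full or incremental. The drawback is that a full restore takes longer if an incremental-based data backup copy is used for recovery, because the last full backup and every subsequent incremental backup must be applied in order.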
Differential backup copies data changed since the last full backup. This enables a full restore to occur more quickly by requiring only the last full backup and the last differential backup. For example, if you create a full backup on Monday, the Tuesday backup would, at that point, be similar to an incremental backup. Wednesday's backup would then back up everything that has changed since Monday's full backup. The downside is that the progressive growth of differential backups tends to adversely affect your backup window.
Synthetic full backup is a variation of differential backup. In a synthetic full backup, the backup server produces an additional full copy, which is based on the original full backup and data gleaned from incremental copies. A synthetic backup spawns a file by combining an earlier complete copy of it with one or more incremental copies created at a later time. The assembled file is not a direct copy of any single current or previously created file, but rather is synthesized from the original file and any subsequent modifications to that file.
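The difference between incremental and differential backups comes down to the reference point used to detect change, and to how many backup sets a restore needs. The sketch below assumes the common convention that an incremental job measures change from the most recent backup of any type, while a differential job measures change from the last full backup. The function names and the use of plain numbers as timestamps are illustrative assumptions.

```python
def files_to_back_up(mtimes, last_full, last_backup, mode):
    """Select the files a backup job should copy.

    mtimes: mapping of file name -> last-modified time (any comparable number).
    last_full: time of the most recent full backup.
    last_backup: time of the most recent backup of any type.
    mode: "full", "differential" or "incremental".
    """
    if mode == "full":
        return sorted(mtimes)
    # Differential compares against the last full backup;
    # incremental compares against the last backup of any type.
    reference = last_full if mode == "differential" else last_backup
    return sorted(name for name, mtime in mtimes.items() if mtime > reference)


def restore_chain(full, partials, mode):
    """Return the backup sets needed for a full restore, oldest first."""
    if mode == "differential":
        # Only the newest differential is needed on top of the full backup.
        return [full] + partials[-1:]
    # Every incremental taken since the full backup is needed, in order.
    return [full] + partials
```

With a Monday full backup followed by Tuesday and Wednesday jobs, `restore_chain("mon", ["tue", "wed"], "differential")` needs two sets, while the incremental chain needs all three.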
Incremental-forever backups minimize the backup window, while providing faster recovery access to data. An incremental-forever backup captures the full data set and then supplements it with incremental backups from that point forward. Backing up only changed blocks is also known as delta differencing. Full backups of data sets are typically stored on the backup server, which automates the restoration.
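Synthetic full and incremental-forever schemes both depend on the backup server assembling a full copy from an earlier full backup plus later incrementals. A minimal sketch of that merge follows; representing a backup as a dictionary of path to content, and using `None` to mark a deleted path, are assumptions made for illustration.

```python
def synthesize_full(full, incrementals):
    """Merge a full backup with later incrementals into a new full copy.

    full: mapping of path -> content at the time of the full backup.
    incrementals: list of mappings, oldest first; None marks a deleted path.
    """
    merged = dict(full)  # leave the original full backup untouched
    for delta in incrementals:
        for path, content in delta.items():
            if content is None:
                merged.pop(path, None)
            else:
                merged[path] = content
    return merged
```

Because the merge happens on the backup server, the production system does not have to be read again to produce the new full copy.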
Reverse-incremental backups are changes made between two instances of a mirror. Once an initial full backup is taken, each successive incremental backup applies any changes to the existing full backup. This essentially generates a new synthetic full backup copy each time an incremental change is applied, while also providing reversion to previous full backups.
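The reverse-incremental idea can be sketched as follows: each new set of changes is applied to the full copy, and the values it overwrote are kept as a reverse delta so the previous full copy can be rebuilt. The dictionary representation, the function names and the use of `None` to mean "path absent" are illustrative assumptions.

```python
def apply_reverse_incremental(current_full, changes):
    """Apply changes to the full copy; return (new_full, reverse_delta).

    In changes, None means the path was deleted.
    In reverse_delta, None means the path did not exist before.
    """
    new_full = dict(current_full)
    reverse_delta = {}
    for path, content in changes.items():
        reverse_delta[path] = current_full.get(path)  # remember the old value
        if content is None:
            new_full.pop(path, None)
        else:
            new_full[path] = content
    return new_full, reverse_delta


def roll_back(full, reverse_delta):
    """Rebuild the previous full copy by applying a reverse delta."""
    return apply_reverse_incremental(full, reverse_delta)[0]
```

The newest full copy is always ready to restore directly; older points in time cost one reverse delta each.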
Hot backup, or dynamic backup, is applied to data that remains available to users as the update is in process. This method sidesteps user downtime and productivity loss. The risk with hot backup is that, if the data is amended while the backup is underway, the resulting backup copy may not match the final state of the data.
Techniques and technologies to complement data backup
Continuous data protection (CDP) refers to layers of associated technologies designed to enhance data protection. A CDP-based storage system backs up all enterprise data whenever a change is made. CDP tools enable multiple copies of data to be created. Many CDP systems contain a built-in engine that replicates data from a primary to a secondary backup server and/or tape-based storage. Disk-to-disk-to-tape (D2D2T) backup is a popular architecture for CDP systems.
Near-continuous CDP takes backup snapshots at set intervals, which are different from array-based vendor snapshots that are taken each time new data is written to storage.
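One way to picture true CDP is as a journal that logs every write with a timestamp, so data can be rebuilt as of any moment rather than only at snapshot intervals. The class below is a toy model under that assumption; `ChangeJournal` and its methods are hypothetical names, not a vendor API.

```python
class ChangeJournal:
    """Toy CDP journal: every write is logged with a timestamp."""

    def __init__(self):
        self.entries = []  # (timestamp, key, value), appended in time order

    def record(self, timestamp, key, value):
        self.entries.append((timestamp, key, value))

    def state_at(self, timestamp):
        """Replay the journal up to and including the given time."""
        state = {}
        for ts, key, value in self.entries:
            if ts > timestamp:
                break
            state[key] = value
        return state
```

A near-continuous system would instead keep only the states captured at each interval, so any change made between two intervals could not be recovered individually.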
Data reduction lessens your storage footprint. There are two primary methods: data compression and data deduplication. These methods can be used singly, but vendors often combine the approaches. Reducing the size of data has implications on backup windows and restoration times.
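To illustrate how the two methods combine, the sketch below deduplicates fixed-size chunks by their SHA-256 digest and compresses each unique chunk with zlib. Fixed-size chunking and the 4 KB chunk size are simplifying assumptions; commercial products often use variable-size chunking and their own formats.

```python
import hashlib
import zlib


def dedupe_and_compress(data: bytes, chunk_size: int = 4096):
    """Store each unique chunk once, compressed.

    Returns (store, recipe): store maps digest -> compressed chunk, and
    recipe is the ordered list of digests needed to rebuild the data.
    """
    store, recipe = {}, []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:  # duplicate chunks are not stored again
            store[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the chunk store and recipe."""
    return b"".join(zlib.decompress(store[digest]) for digest in recipe)
```

The restore path has to decompress and reassemble chunks, which is one reason data reduction affects restoration times as well as backup windows.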
Disk cloning involves copying the contents of a computer's hard drive, saving it as an image file and transferring it to storage media. Disk cloning can be used for system provisioning, system recovery and rebooting or returning a system to its original configuration.
Erasure coding, or forward error correction (FEC), evolved as a scalable alternative to traditional RAID systems and is most often associated with object storage. RAID stripes data writes across multiple drives, using a parity drive to ensure redundancy and resilience. Erasure coding, by contrast, breaks data into fragments and encodes them with additional redundant data. These encoded fragments are stored across different storage media, nodes or geographic locations. The associated fragments are used to reconstruct corrupted data, using a technique known as oversampling.
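The fragment-and-reconstruct idea can be shown with the simplest possible code: k data fragments plus one XOR parity fragment, which tolerates the loss of any single fragment. This is a deliberately minimal stand-in; production erasure coding typically uses schemes such as Reed-Solomon that survive multiple losses. The function names are illustrative.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int):
    """Split data into k equal-size fragments plus one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")  # zero-pad the final fragment
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def recover(fragments):
    """Rebuild the single missing fragment (marked None) from the survivors."""
    missing = fragments.index(None)
    survivors = [f for f in fragments if f is not None]
    repaired = list(fragments)
    repaired[missing] = reduce(xor_bytes, survivors)
    return repaired
```

Each fragment could live on a different drive, node or site; any one of them can be lost and rebuilt from the rest.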
Flat backup is a data protection scheme in which a direct copy of a snapshot is moved to low-cost storage without the use of traditional backup software. The original snapshot retains its native format and location; the flat backup replica gets mounted, should the original become unavailable or unusable.
Mirroring places data files on more than one computer server to ensure they remain accessible to users. In synchronous mirroring, data is written to local and remote disk simultaneously. Writes from local storage are not acknowledged until a confirmation is sent from remote storage, thus ensuring the two sites have an identical data copy. Conversely, in asynchronous mirroring, local writes are considered to be complete before confirmation is sent from the remote server.
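The difference in acknowledgment timing between the two mirroring modes can be modeled in a few lines. This is a toy, single-process sketch; the `Mirror` class and its `drain` method are hypothetical and stand in for real network replication.

```python
from collections import deque


class Mirror:
    """Toy model of a local volume mirrored to a remote volume."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.local, self.remote = {}, {}
        self.pending = deque()  # writes not yet applied remotely (async only)

    def write(self, key, value) -> str:
        self.local[key] = value
        if self.synchronous:
            # Acknowledge only after the remote copy has been updated.
            self.remote[key] = value
        else:
            # Acknowledge immediately; the remote copy catches up later.
            self.pending.append((key, value))
        return "ack"

    def drain(self):
        """Apply queued writes to the remote copy (asynchronous mode)."""
        while self.pending:
            key, value = self.pending.popleft()
            self.remote[key] = value
```

In asynchronous mode, anything still in `pending` when the local site fails is lost at the remote site, which is the tradeoff for lower write latency.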
Replication enables users to select the required number of replicas, or copies, of data needed to sustain or resume business operations. Data replication copies data from one location to another, providing an up-to-date copy to hasten DR.
Recovery-in-place, or instant recovery, enables users to temporarily run a production application directly from a backup VM instance, thus maintaining data availability while the primary VM is being restored. Mounting a physical or VM instance directly on a backup or media server can hasten system-level recovery to within minutes. Recovery from a mounted image does result in degraded performance, since backup servers are not sized for production workloads.
Storage snapshots capture a set of reference markers on disk for a given database, file or storage volume. Users refer to the markers, or pointers, to restore data from a selected point in time. Because it derives from an underlying source volume, an individual storage snapshot is an instance, not a full backup. As such, snapshots do not protect data against hardware failure.
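A pointer-based snapshot can be sketched as a frozen copy of the volume's address table rather than a copy of the data. The toy `Volume` class below uses a redirect-on-write approach, which is one of several ways arrays implement snapshots; all names here are illustrative assumptions.

```python
class Volume:
    """Toy volume whose snapshots are pointer tables, not data copies."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes, shared by live volume and snapshots
        self.live = {}       # logical address -> block id
        self.snapshots = {}  # snapshot name -> frozen pointer table
        self._next_id = 0

    def write(self, address, data):
        # New data goes to a new block; old blocks stay in place for any
        # snapshot that still points at them.
        self.blocks[self._next_id] = data
        self.live[address] = self._next_id
        self._next_id += 1

    def snapshot(self, name):
        self.snapshots[name] = dict(self.live)  # copy pointers only

    def read(self, address, snapshot=None):
        table = self.snapshots[snapshot] if snapshot else self.live
        return self.blocks[table[address]]
```

Because the live volume and every snapshot share the same `blocks` store, losing that store loses them all, which is why a snapshot alone is not protection against hardware failure.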
Snapshots are generally grouped in three categories: changed block, clones and CDP. Snapshots first appeared as a management tool within a storage array. The advent of virtualization added hypervisor-based snapshots. Snapshots may also be implemented by backup software or even via a VM.
Copy data management and file sync and share
Tangentially related to backup is copy data management (CDM). This is software that provides insight into the multiple data copies an enterprise might create. It enables discrete groups of users to work from a common data copy. Although technically not a backup technology, CDM enables companies to efficiently manage data copies by identifying superfluous or underutilized copies, thus reducing backup storage capacity and backup windows.
File sync-and-share tools protect data on mobile devices used by employees. These tools basically copy modified user files between mobile devices. While this protects the data files, it does not enable users to roll back to a particular point in time should the device fail.
How to choose the right backup option
When deciding which type of backup to use, you need to weigh several key considerations.
Enterprises commonly mix various data backup approaches, as dictated by the primacy of the data. A backup strategy should be governed by the SLAs that apply to an application, with respect to data access and availability, recovery time objectives (RTOs) and recovery point objectives (RPOs). Choice of backups is also influenced by the versatility of a backup application, which should guarantee all data is backed up and provide replication and recovery while establishing efficient backup processes.
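As a rough illustration of how RPOs and RTOs constrain the choice, the check below compares a backup plan against both objectives. It assumes the worst-case data loss equals one full interval between backups; the function name and the hour-based units are illustrative.

```python
def meets_objectives(backup_interval_hours, restore_hours, rpo_hours, rto_hours):
    """Check a backup plan against recovery objectives.

    Worst-case data loss is one full interval between backups (compared
    with the RPO); restore time is compared with the RTO.
    """
    return backup_interval_hours <= rpo_hours and restore_hours <= rto_hours
```

For example, a nightly backup (24-hour interval) cannot satisfy a four-hour RPO no matter how fast the restore is, which pushes such applications toward snapshots, replication or CDP.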
Creating a backup policy
Most businesses create a backup policy to govern the methods and types of data protection they deploy and to ensure critical business data is backed up consistently and regularly. The backup policy also creates a checklist that IT can monitor and follow since the department is responsible for protecting all of the organization's critical data.
A backup policy should include a schedule of backups. The policies are documented so others can follow them to back up and recover data if the main backup administrator is unavailable.
Data retention policies are also often part of a backup policy, especially for companies in regulated industries. Preset data retention rules can lead to automated deletion or migration of data to different media after it has been kept for a specific period. Data retention rules can also be set for individual users, departments and file types.
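A preset retention rule of the kind described above can be sketched as a filter over backup dates. The function name and the single fixed retention period are assumptions; real policies usually vary the period by data type, department or regulation.

```python
from datetime import date, timedelta


def expired_backups(backups, today, retention_days):
    """Return the names of backups older than the retention period.

    backups: mapping of backup name -> date the backup was taken.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(name for name, taken in backups.items() if taken < cutoff)
```

The returned list could then feed an automated deletion job or a migration to cheaper media.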
A backup policy should call for capturing an initial full data backup, along with a series of differential or incremental backups in between full backups. At least two full backup copies should be maintained, with at least one located off-site.
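The copy rule in the preceding paragraph (at least two full copies, at least one off-site) is easy to verify automatically. The sketch below assumes each backup copy is described by a dictionary with hypothetical `type` and `offsite` keys.

```python
def satisfies_copy_rule(copies):
    """Check for at least two full copies, at least one of them off-site.

    copies: list of dicts with keys "type" (str) and "offsite" (bool).
    """
    fulls = [c for c in copies if c["type"] == "full"]
    return len(fulls) >= 2 and any(c["offsite"] for c in fulls)
```

A check like this could run as part of the monitoring checklist that IT follows.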
Backup policies need to focus on recovery, often more so than the actual backup, because backed-up data is not much use if it cannot be recovered when needed. And recovery is key to DR.
Backup policies used to deal mainly with getting data to and from tape. But now, most data is backed up to disk, and public clouds are often used as backup targets. The process of moving data to and from disk, cloud and tape is different for each target, and the policy should reflect those differences. Backup processes can also vary depending on application -- for instance, a database may require different treatment than a file server.