Backups are a necessary, if tedious, component of storing data. We back up data regularly -- daily or even multiple times per day -- knowing we can restore it when the inevitable happens. But how often do we restore? When it comes to data backup and restoration, what is the ratio of backup volume to restore volume?
Because data loss is a common concern, organizations likely transfer hundreds of times more data to backup than they ever restore. Surely there must be some way to get more business value from this effort. Alternatively, is there a way to reduce the effort needed to achieve the current business value?
Layers of backups
The usual approach to data backup and restoration is to back up an entire computer. However, this may not be the most efficient way to protect applications and their data.
Many applications in our data centers have their own recovery strategies. If we use these built-in features, then we may not need so many backups.
Microsoft Windows file shares have a previous version functionality that enables user self-service restores for deleted files. Database applications use logs to enable point-in-time recovery from a less recent backup. If we are aware of these layers of protection, we can adjust our backups to be more efficient and, potentially, less frequent.
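To make the database case concrete, here is a minimal sketch of point-in-time recovery: start from an older backup and replay logged changes up to the moment you want. The `LogEntry` type, the `restore_to_point` function and the key/value data model are hypothetical and chosen for illustration; real database engines use their own log formats and recovery tools.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogEntry:
    """One logged change (hypothetical format, not any real database's log)."""
    timestamp: int          # when the change was committed
    key: str                # which record changed
    value: Optional[str]    # new value, or None if the record was deleted


def restore_to_point(backup: Dict[str, str],
                     log: List[LogEntry],
                     target_time: int) -> Dict[str, str]:
    """Rebuild the data as of target_time from an older backup plus the log."""
    state = dict(backup)  # copy so the backup itself is never modified
    for entry in sorted(log, key=lambda e: e.timestamp):
        if entry.timestamp > target_time:
            break  # ignore changes made after the recovery point
        if entry.value is None:
            state.pop(entry.key, None)
        else:
            state[entry.key] = entry.value
    return state


# Backup taken at time 0, changes logged at times 10, 20 and 30.
backup = {"a": "1", "b": "2"}
log = [LogEntry(10, "a", "3"), LogEntry(20, "b", None), LogEntry(30, "c", "9")]
print(restore_to_point(backup, log, 20))  # {'a': '3'}
```

As long as the log is retained, a single older backup can serve any recovery point after it, which is why the backup itself may not need to run as often.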
How often and how much do you back up?
The more frequently you back up, the more granular your restore; however, you also transfer more data and require more space for these additional backups.
Most modern data backup applications do not create full backups every time; they make one full data copy and follow that with incremental backups to minimize the data transferred. The efficiency of moving from full copies to incremental has enabled more frequent backups, possibly without regard for the value of these backups. If we understand the business value of the application, we may be able to reduce the frequency of the backups based on the business risk of data loss.
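The saving from incremental backups can be sketched in a few lines. The example below assumes a simple file-level scheme that compares content hashes against the previous backup's manifest; the `manifest` and `incremental_changes` names are hypothetical, and commercial products often track changes at the block level instead.

```python
import hashlib
from typing import Dict


def manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each path to a SHA-256 digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest()
            for path, data in files.items()}


def incremental_changes(previous: Dict[str, str],
                        current_files: Dict[str, bytes]) -> Dict[str, bytes]:
    """Return only the files that are new or changed since the last manifest."""
    current = manifest(current_files)
    return {path: current_files[path]
            for path, digest in current.items()
            if previous.get(path) != digest}


# Hypothetical data: the state of a small file share on two consecutive days.
monday = {"a.txt": b"hello", "b.txt": b"world"}
tuesday = {"a.txt": b"hello", "b.txt": b"world!", "c.txt": b"new"}

changed = incremental_changes(manifest(monday), tuesday)
full_bytes = sum(len(data) for data in tuesday.values())
incremental_bytes = sum(len(data) for data in changed.values())
print(sorted(changed))                 # ['b.txt', 'c.txt']
print(full_bytes, incremental_bytes)   # 14 9
```

Each incremental run is cheap, which is what makes very frequent backups feasible. The code says nothing about whether each of those backups is worth keeping; that remains a question of business risk.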
Backup vs. archive
One thing to consider is the difference between backup and archive. Backups are about returning data to a recent point in the past, and the restored data from a backup still has current business value. The need for backups is driven by the business risk of losing data. Data backup and restoration is a relatively frequent activity that needs to happen fast, as business operations are delayed until the restore is complete.
Archives are used to see the state of the business at some distant point in time. The restored data in an archive is no longer directly relevant to the business. The need for archives is driven by regulatory compliance. Restores from the archive are far less common and can have longer lead times, as immediate business operation does not depend on the restore.
The result of these different requirements is that backups are stored on disks and archives are most often stored on tape or on cloud-based object storage. Data in archives is trapped, while backup data is more available to deliver immediate business value.
Scanning your backups
It is also worth identifying which storage characteristics are beneficial for data backup and restoration. Backups are generally sequential and write-intensive. Restores are sequential and read-intensive.
Backup storage is usually optimized for storing a lot of data and for sequential access. Production primary storage is usually smaller and more optimized for random access. If we run periodic scanning and other sequential access tasks against the backup storage, we offload that work from the primary storage. The result is better performance of the primary storage.
One example of scanning is locating personally identifiable information that is stored with specific compliance requirements, and checking that we aren't storing credit card numbers on systems that are not compliant with the Payment Card Industry Data Security Standard.
Remediation still needs to happen on the primary storage, but we can offload the scanning to secondary backup storage.
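A scan of this kind might look like the sketch below, which searches text restored from a backup copy for candidate card numbers and keeps only those that pass the Luhn checksum used by payment card numbers. The regular expression, the function names and the sample text are assumptions made for illustration; a production scanner would need to handle more formats and file types.

```python
import re
from typing import List

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        digit = int(char)
        if index % 2 == 1:      # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return digit strings in text that look like valid card numbers."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


# Hypothetical content read from a backup copy, not from production storage.
sample = "order 42, card 4111-1111-1111-1111, ref 1234 5678 9012 3456"
print(find_card_numbers(sample))  # ['4111111111111111']
```

The Luhn check filters out most random digit runs, such as the reference number in the sample, so the findings passed back for remediation on primary storage contain fewer false positives.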
DevOps from backups
Over the last few years, a new generation of data backup and restoration products has appeared that uses solid-state drives as well as hard disk drives. This hybrid backup storage offers excellent random-access performance for the data held on solid-state.
The result is that these backup stores can be used for test and development activities. It may be as simple as standing up a copy of the production environment to test a new software version before deploying it to production. It may be integrated into a continuous integration and deployment pipeline so that new software versions developed in-house are tested with an exact and up-to-date copy of production before deployment. A full copy of production is a great place to do functional testing in a DevOps environment.