When backing up large files, administrators once struggled with media capacity limitations and the inability of backup applications to support large file sizes. Today, such limitations are far less of an issue, but backup admins encounter other challenges associated with backing up large files.
Although media capacity and backup application compatibility issues have mostly disappeared, they can still cause problems for one-off backups, especially those written to some types of removable media.
For example, an administrator recently attempted to copy a file of roughly 2 GB to DVD for safekeeping before modifying it. In doing so, he discovered that the software he was using did not support backing up files larger than 2 GB.
Fortunately, such large file backup issues are mostly confined to one-off situations because most organizations do not rely on DVDs as their primary backup media. Enterprises generally write data to disk- or tape-based backups, which have long supported multi-gigabyte files.
Although multi-gigabyte files have become the norm, there are still issues associated with protecting them, and they primarily center on backup and recovery performance.
There isn't one single approach to protecting large files that is appropriate for every organization. Organizations must consider their own unique business needs and then develop a solution that is designed to meet those needs. Administrators can begin assessing their needs by answering a few questions including:
- Are there a lot of large files or just a few that need to be protected?
- Are the large files frequently modified, or are they relatively static?
- Are large files being created on a regular basis?
- What is the recovery time objective for large files?
- Are backups being created locally, or are they being written to the cloud?
Your answers to these questions will help you to formulate an approach to the long-term protection of such files.
For instance, if you find that the organization is frequently creating or modifying large files, then it might be appropriate to deploy a high-performance, high-capacity backup target that is used specifically for protecting oversized data.
On the other hand, if you find that the creation of large files is somewhat rare and that the large files that you do have are static, then writing copies of those files to an offline archive might be more appropriate.
A backup target's location affects its performance. If backups are being written to a local backup target, then backing up large files will probably have a negligible impact on the backup process. Most modern backup solutions offer sufficient bandwidth that backups of multi-gigabyte files occur very quickly. However, if the backups are written to the cloud or to a remote data center, then backup performance can be an issue.
Internet bandwidth is finite, and depending on the amount of bandwidth that has been allocated to the backup process, backing up large files can be time-consuming and may delay the backup of other data.
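To get a feel for how much the bandwidth allocation matters, a rough estimate is easy to compute. The following is a minimal sketch rather than a sizing tool: the function name, its parameters and the example figures are hypothetical, and the calculation ignores protocol overhead, compression and competing traffic.

```python
def transfer_hours(file_size_gb, link_mbps, backup_share=1.0):
    """Estimate the hours needed to move a file across a network link.

    file_size_gb -- file size in decimal gigabytes (10**9 bytes)
    link_mbps    -- total link speed in megabits per second
    backup_share -- fraction of the link allocated to backups (0 < share <= 1)
    """
    if link_mbps <= 0 or not 0 < backup_share <= 1:
        raise ValueError("link speed must be positive and share must be in (0, 1]")
    total_bits = file_size_gb * 8 * 10**9          # bytes to bits
    usable_bits_per_sec = link_mbps * 10**6 * backup_share
    return total_bits / usable_bits_per_sec / 3600  # seconds to hours


# Hypothetical example: a 50 GB file over a 100 Mbps link,
# with half of the link reserved for backup traffic.
print(round(transfer_hours(50, 100, 0.5), 1))  # about 2.2 hours
```

Halving the backup allocation doubles the estimate, which is why a single oversized file can hold up everything queued behind it.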
One solution to this problem is to use backup seeding, in which the initial full backup is copied to physical media and shipped to the provider or remote site rather than uploaded. That way, you can avoid copying a huge file across the Internet. Once the remote backup has been seeded, block-level changes can be replicated to the backup target so that the backup is kept in sync without having to replicate the entire file every time a modification to the file is made.
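The idea behind block-level replication can be sketched in a few lines. This is an illustration only, not how any particular backup product works: it assumes fixed-size blocks, an arbitrarily chosen 4 MB block size, and SHA-256 hashes for comparison, and the function names are hypothetical. It also does not handle a file that shrinks.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size; real products vary


def block_hashes(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and return one hash per block."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(old_hashes, new_hashes):
    """Return the indexes of blocks that are new or differ from the seeded copy."""
    changed = []
    for index, digest in enumerate(new_hashes):
        if index >= len(old_hashes) or old_hashes[index] != digest:
            changed.append(index)
    return changed


# Tiny demonstration using 4-byte blocks so the result is easy to follow.
seeded = b"AAAABBBBCCCCDDDD"
modified = b"AAAABxBBCCCCDDDDEE"  # one byte edited, two bytes appended
print(changed_blocks(block_hashes(seeded, 4), block_hashes(modified, 4)))  # [1, 4]
```

Only the blocks listed in the result would need to cross the wire, which is what keeps the remote copy in sync without resending the whole file.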
Of course, in some organizations, certain types of files are rarely if ever modified. Take high-resolution video for example. Video edits might be made, but edits are almost always written to a separate file so as to avoid overwriting original source material. If your organization adheres to such a practice, then there might not be much benefit to seeding since seeding assumes that the data will eventually be modified. A better approach might be to write static data to tape or other removable media for long-term archiving.
If something were to happen to a large file, it is important to be able to get it back quickly.
At first glance, the obvious solution would be to take advantage of deduplication to reduce the number of storage blocks that need to be transferred during the recovery operation. Although this approach works in some situations, it is not always effective. Some types of data simply deduplicate better than others. Some of the largest files tend to be video files, and many video formats use native compression, which leaves few duplicate blocks for deduplication to find.
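The difference is easy to demonstrate with a toy example. The sketch below assumes simple fixed-size block deduplication with a 4 KB block, and it uses random bytes as a stand-in for compressed video, since well-compressed data looks statistically similar to random data. The function name is hypothetical, and real deduplication engines are considerably more sophisticated.

```python
import hashlib
import os


def dedup_ratio(data, block_size=4096):
    """Return total blocks divided by unique blocks (1.0 means no savings)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return 1.0
    unique = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks) / len(unique)


repetitive = b"\x00" * 4096 * 256     # 256 identical blocks
random_like = os.urandom(4096 * 256)  # stand-in for compressed video

print(dedup_ratio(repetitive))   # 256.0: every block is a duplicate
print(dedup_ratio(random_like))  # 1.0: no two blocks match
```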
That being the case, the most effective thing you can do to ensure the fast recovery of large files is to make sure you have an on-premises backup replica or a copy of the data saved on tape.
Although it is always a good idea to replicate your backups to a remote data center, a copy of the backups should ideally be kept in-house. Not only are restores from a local device faster than restores from the cloud, but having a local backup copy also allows you to perform recoveries even if the remote backup is inaccessible.
If, for some reason, you are forced to restore a large file from the cloud, then you may be able to temporarily adjust your bandwidth throttling policy to facilitate a faster restoration. Most cloud backup products allow administrators to use QoS or a similar mechanism to limit Internet bandwidth consumption. That way, other Internet traffic does not get choked out by the backup process. In the event of a large, high-priority restore, however, it may be in your best interest to temporarily loosen that policy so the restoration is completed more quickly.
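Conceptually, a throttling policy is a cap on bytes per second that can be raised or lifted for an urgent restore. The sketch below illustrates that idea with a simple paced copy loop. It is not the mechanism used by any specific cloud backup product or QoS implementation, and the function name and parameters are hypothetical.

```python
import io
import time


def throttled_copy(src, dst, max_bytes_per_sec=None, chunk_size=64 * 1024,
                   sleep=time.sleep, clock=time.monotonic):
    """Copy src to dst, pausing as needed to stay under max_bytes_per_sec.

    Passing None for max_bytes_per_sec removes the cap entirely.
    The sleep and clock arguments exist only so the pacing can be tested.
    """
    start = clock()
    sent = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        sent += len(chunk)
        if max_bytes_per_sec is not None:
            # Time the transfer should have taken so far at the capped rate.
            expected = sent / max_bytes_per_sec
            elapsed = clock() - start
            if expected > elapsed:
                sleep(expected - elapsed)
    return sent


# Normal policy: cap the restore at 1 MB per second.
restored = io.BytesIO()
throttled_copy(io.BytesIO(b"x" * 200_000), restored, max_bytes_per_sec=1_000_000)

# High-priority restore: temporarily lift the cap.
urgent = io.BytesIO()
throttled_copy(io.BytesIO(b"x" * 200_000), urgent, max_bytes_per_sec=None)
```

In this sketch, loosening the policy is a matter of passing a higher limit, or none at all, for the duration of the restore and then returning to the normal value.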
Backing up and protecting large files has always been something of a challenge for administrators. Although capacity was once the biggest limiting factor, most of the challenges associated with backing up or restoring large files today center on performance.