Can you explain the difference between a backup and an archive?
They are different processes with different objectives. By the most generic definition, an archive is just a stored set of organized data. The goal of an archive is to store this organized, intelligible dataset for a long period of time and in a form that enables granular data retrieval. The improved resiliency of tape is ideal for this application, by the way. Accelerated resiliency tests performed and documented by FujiFilm on its next-generation BaFe tape have demonstrated rock-solid data retention for more than 30 years.
Disk, by contrast, has a failure rate considerably higher than disk makers claim, as documented by studies performed by both Carnegie Mellon University and Google a couple of years ago.
Backups are copies of data designed for short-term storage and for frequent replacement or update. The idea is to have a mostly current copy of data on inexpensive offline media. That gives the data three things: an “air gap” that insulates it from the corruption events that affect disk-based data; maximum “portability,” so the tape copy can be moved off site and out of harm’s way; and “target agnosticism,” so that data can be restored to ANY disk array if the original data or storage platform is lost. With most disk-to-disk replication solutions, replication can only be done between two identical rigs. Moreover, mirrors usually need to be “broken” (the replication process stopped) so you can check the deltas, or differences, between the primary data repository and the mirror repository. This is almost never done because of the hassle involved, so a lot of faulty disk-to-disk replication processes go undetected until you need your data back.
That, it would seem to me, is the wrong time to discover that data has been moved around on your array and you have been mirroring blank space in your efforts to protect your data.
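To make the idea of "checking the deltas" concrete, here is a minimal sketch in Python. It assumes both the primary copy and the replica can be read as ordinary directory trees, which is a simplification: array-level mirrors typically work on blocks, not files, and may not be mountable while replication is running. The function names (file_digest, tree_digests, compare_copies) are hypothetical and chosen only for illustration; they are not part of any replication product.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its content digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_copies(primary_root, replica_root):
    """Report the deltas between a primary tree and its replica.

    Assumption: both trees are quiescent (not being written to) while
    the comparison runs, which is the file-level analogue of "breaking"
    the mirror before checking it.
    """
    primary = tree_digests(primary_root)
    replica = tree_digests(replica_root)
    shared = set(primary) & set(replica)
    return {
        "missing_on_replica": sorted(set(primary) - set(replica)),
        "extra_on_replica": sorted(set(replica) - set(primary)),
        "content_differs": sorted(
            name for name in shared if primary[name] != replica[name]
        ),
    }
```

If all three lists in the report come back empty, the two copies matched at the moment of the check. A replica that has been mirroring blank space would show up as a long "missing_on_replica" list well before anyone needed to restore from it.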
This was first published in March 2012