Remote site backup can be a significant issue for organizations with small branch offices that need data protection, but don’t necessarily have trained IT staff on site. Centralized backup is one way to address these challenges. Rachel Dines, an analyst with Forrester Research in Cambridge, Mass., discussed some of the options for protecting remote site data with SearchDataBackup.com Assistant Editor John Hilliard. Listen to our podcast, or read the transcript of the interview below.
What are the challenges associated with centralized backup of remote site data?
There are several challenges with this. First of all, you have to figure out how you’re going to transport the data. And traditionally, a lot of companies have been shipping tape. I’m not going to bash tape too much, but there are a lot of issues with shipping tape. One is you have to worry about those tapes getting lost. But in addition, you have to have tape libraries and tape drives at your remote site, and have to have someone who knows how to load them up and get the backups done before you ship the tape. A lot of remote sites don’t have a lot of folks with good IT skills… trying to get some solution that is as hands-off as possible is really important. But I would actually argue that backing up remote sites is easy when compared to trying to restore data to remote sites.
The backups can be done… but restores are the tricky part. What’s the point of backups if you can’t restore them, right?
What are the challenges associated with restoring data from a centralized backup to a remote site?
There’s definitely a way to work around the restores. The thing is, a lot of the advanced ways of doing the remote office backups, you’re not keeping a local copy of the backup. When you try to do the restore, you have to push a lot of data over what is usually not a very thick pipe, and restores can take a long time.
So for data sets for offices where you are not keeping the backups locally, I see people doing things like burning a DVD and shipping the data that way, or putting it on removable media of some other type. It’s not the most elegant solution, but in a pinch, if you’re not storing backups locally, that is one of the workarounds you can use for restores.
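To put rough numbers on the "not a very thick pipe" problem, here is a minimal sketch that estimates how long a full restore would take over a WAN link. The function name, the 80% link-efficiency default, and the example figures are assumptions chosen for illustration; none of them come from the interview.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move data_gb gigabytes over a WAN link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction of that speed actually usable
    for the restore (protocol overhead, other traffic on the link).
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("sizes and speeds must be positive, efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000          # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical branch office: 500 GB to restore over a 10 Mbps link.
    hours = transfer_hours(500, 10)
    print(f"{hours:.1f} hours ({hours / 24:.1f} days)")
```

Under those assumed figures the restore would take roughly 139 hours, close to six days, which is why shipping removable media can be the faster option in a pinch.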
What is incremental forever backup, and how does it help address remote office backup challenges?
Incremental forever backup is a concept that has been around for a while now. Essentially, what it does is [it creates] a full backup, but after that, you never take another full backup again. Every backup from then on is incremental, so only changed blocks will be backed up. And what the backup software will have to do then is take those backups and turn them into what we sometimes call a synthetic full backup. So any of the incremental backups are still fully recoverable points in time, but they have to be synthetically put together by the backup software to create a full backup.
The main benefit of this is you’re backing up a lot less data. There’s less data being sent over the wire. So this is definitely a technique that I see people using for remote branch offices. If you’re going to be trying to do backups over the WAN, and not have any kind of local media that you’re backing up to at your branch office, you can definitely use incremental forever [backups] to reduce the amount of WAN traffic. But remember, for the restore, when you’re trying to pull everything back at once, that’s when it gets a little bit tricky.
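The idea can be sketched in a few lines: one full backup, then incrementals that carry only changed blocks, which the software merges into a synthetic full when a restore is needed. This is a toy model, not how any particular backup product works. The 4-byte block size, the function names, and the dictionary layout are all assumptions, and real software typically detects changes with change tracking or hashes rather than by comparing against a retained previous copy.

```python
BLOCK_SIZE = 4  # unrealistically small, so the example is easy to follow


def split_blocks(data: bytes) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def full_backup(data: bytes) -> dict:
    """The one and only full backup: every block, indexed by position."""
    return {"length": len(data), "blocks": dict(enumerate(split_blocks(data)))}


def incremental_backup(previous: bytes, current: bytes) -> dict:
    """Capture only the blocks that are new or differ from the previous state."""
    old, new = split_blocks(previous), split_blocks(current)
    changed = {i: block for i, block in enumerate(new)
               if i >= len(old) or old[i] != block}
    return {"length": len(current), "blocks": changed}


def synthetic_full(full: dict, incrementals: list[dict]) -> bytes:
    """Merge the full backup with incrementals, oldest first, into one image."""
    blocks = dict(full["blocks"])
    length = full["length"]
    for inc in incrementals:
        blocks.update(inc["blocks"])   # newer blocks overwrite older ones
        length = inc["length"]
    block_count = -(-length // BLOCK_SIZE)  # ceiling division
    return b"".join(blocks[i] for i in range(block_count))[:length]


if __name__ == "__main__":
    day0 = b"AAAABBBBCCCC"
    day1 = b"AAAAXXXXCCCC"      # one block changed
    day2 = b"AAAAXXXXCCCCDD"    # two bytes appended

    full = full_backup(day0)
    inc1 = incremental_backup(day0, day1)
    inc2 = incremental_backup(day1, day2)

    print(sum(len(b) for b in inc1["blocks"].values()), "bytes sent on day 1")
    print(synthetic_full(full, [inc1, inc2]) == day2)
```

Each incremental here moves only a few bytes, yet any point in time can be rebuilt by merging the full backup with the incrementals up to that point. The restore still has to deliver the whole merged image, which is the tricky part described above.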
What about the challenges associated with making the initial backup – how are those solved?
You don’t want to ignore the secondary issues of staffing and security – you need to have that data be encrypted. But the number one challenge for remote site backup when you’re backing up over the WAN is definitely bandwidth. And even with incremental forever [backups], that initial backup can be painful. But after that, every other backup should be very quick. And I see people, in order to make that backup a little bit easier to deal with, they will seed or do something similar… to make that initial backup be local, and do all the incrementals over the WAN remotely.
Can you explain what source-based dedupe is, and how it fits into the picture?
That’s one of the technologies I see people using to make remote backup easier. So instead of – or in conjunction with – incremental backup, they use deduplication at the source side. Deduplication is a pretty common concept – the idea of only storing single instances of blocks, and in [place of the duplicates], putting pointers to the original copies so you don’t have to store multiple copies.
Source-side deduplication [is processed] at the backup agent, so there is some amount of overhead, not a huge amount, on whatever application you’re backing up. But the major benefit you get is the amount of WAN bandwidth and traffic is much, much smaller. So, incremental forever is a good tool, but I’d say source-side deduplication is much more powerful for WAN backups.
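A minimal sketch of that trade-off follows: the agent at the branch hashes each chunk (the overhead on the source) and sends a chunk over the WAN only if the central store does not already hold it. The `BackupTarget` class, the fixed 4-byte chunks, and the use of SHA-256 are assumptions made for illustration. Commercial products differ in how they chunk and index data, and in practice the "do you already have this?" check is itself a small network exchange.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, so the example is easy to follow


class BackupTarget:
    """Hypothetical central store at the data center, keyed by chunk hash."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}
        self.bytes_received = 0   # stands in for WAN traffic

    def has(self, digest: str) -> bool:
        return digest in self.chunks

    def put(self, digest: str, chunk: bytes) -> None:
        self.chunks[digest] = chunk
        self.bytes_received += len(chunk)


def backup_with_source_dedupe(data: bytes, target: BackupTarget) -> list[str]:
    """Run at the branch: hash locally, send only chunks the target lacks.

    Returns the list of chunk hashes (the pointers) needed to rebuild data.
    """
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if not target.has(digest):
            target.put(digest, chunk)     # only unseen chunks cross the WAN
        recipe.append(digest)
    return recipe


def restore(recipe: list[str], target: BackupTarget) -> bytes:
    """Follow the pointers to reassemble the original data."""
    return b"".join(target.chunks[digest] for digest in recipe)


if __name__ == "__main__":
    target = BackupTarget()
    data = b"AAAABBBBAAAACCCC"            # 16 bytes, one repeated chunk
    recipe = backup_with_source_dedupe(data, target)
    print(target.bytes_received, "of", len(data), "bytes sent")
    print(restore(recipe, target) == data)
```

In this toy run only 12 of the 16 bytes are sent, because the repeated chunk is stored once and referenced twice. A later backup that shares chunks with earlier ones sends even less.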
What other technologies should people consider?
The main other one that comes to mind is WAN optimization. People think of WAN optimization technology as big, honking appliances they have to put out there. But there are a lot of virtual editions and lighter-weight branch office solutions out there that I do see people using quite a bit, in conjunction with remote backup.
Another option that is a little bit out there is to not back up at all – and instead, do some kind of replication to bring the data from your branches back to your main office. So a lot of times, something lightweight like server-based replication could be used. That’s a little bit more of a radical approach, but definitely a possibility. Once you replicate everything [from] your branch to the data center, you can back it up there.
And the last one is of course cloud… going to the cloud – some kind of backup as a service or cloud-based backup – can be a good solution for remote offices, instead of having to send it all the way back to your data center. I’m seeing more and more companies looking at sending it to the cloud… it does make sense in certain cases.