

Can I use backed-up data for analytics?

While analytics software and backup software appear to serve different purposes, there are a few ways the two can work together.

When users ask me if it is realistic to use backed-up data for a data analytics project, I first need to characterize what data analytics means in the context of this conversation. I use the SearchDataManagement definition that states:

Data analytics (DA) is the science of examining raw data with the purpose of drawing conclusions about that information. Data analytics is used in many industries to allow companies and [organizations] to make better business decisions and in the sciences to verify or disprove existing models or theories.

On the surface, data backup software and data analytics seem completely unrelated. After all, backup software is designed to protect data, while data analytics exists as a mechanism for deriving business information from data. Even so, both technologies deal with raw data.

The idea of using backed-up data for data analytics is not as crazy as it might seem. Many backup vendors offer an instant recovery feature that allows businesses to run a virtual machine from backup storage until a backup can be restored. Some vendors take this concept a bit further and allow companies to use backup storage as a virtual development/test lab. Veeam, for example, has a Virtual Lab feature that could be used to make a database available for performing data analytics without fear of disrupting production data.
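Outside any particular product, the underlying principle (run analytics against a copy, never against the protected data itself) can be sketched in a few lines. The example below is a minimal sketch that assumes the backup contains a SQLite database file; the function name, file name and query are hypothetical and are not part of Veeam or any other backup product. It copies the backed-up file to a scratch directory and opens that copy in SQLite's read-only URI mode, so an analytics query cannot alter the backup set.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path
from urllib.parse import quote


def query_backup_copy(backup_db: Path, sql: str) -> list[tuple]:
    """Run an analytics query against a *copy* of a backed-up SQLite database.

    Hypothetical helper for illustration: the backup file is only read
    (copied); the copy is opened read-only, so neither the backup nor the
    scratch copy can be modified by the query.
    """
    with tempfile.TemporaryDirectory() as scratch:
        working_copy = Path(scratch) / backup_db.name
        shutil.copy2(backup_db, working_copy)  # the original backup stays untouched
        # SQLite URI filename; mode=ro rejects any write attempt
        uri = f"file:{quote(working_copy.as_posix())}?mode=ro"
        conn = sqlite3.connect(uri, uri=True)
        try:
            return conn.execute(sql).fetchall()
        finally:
            conn.close()  # close before the scratch directory is deleted
```

Copying first costs disk space and time for large databases, but it keeps the analytics workload completely isolated from the file you may later need for a restore.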

Although newer backup applications may include features that are conducive to performing data analytics, backup vendors are not in the data analytics business. If backup vendors want to offer data analytics capabilities, they might partner with an analytics company to deliver an analytics offering based on backed-up data, but I just don't see vendors developing such a product by themselves.


Join the conversation

How have you used backed-up data for data analytics?
I'd want to ensure (and I don't know how you could do that) that poking at the backed-up data to run analytics on it wouldn't end up corrupting the backup in some way, in case you needed to restore the files at some point.
It would be an excellent idea to run analytics on backed-up data, and this may very well be the future. However, performing the analytics or testing them is likely to disrupt the database. If that can be avoided, or if we could run analytics on images of the backups, that would help.
Data analytics is one aspect of what can be done with backup data. Any kind of reporting is another example. As it seems, one important restriction is that the backup data itself must not be changed in the process, especially in the case of incremental backups.
This is a great method for finding out more. If you aren't digging through existing and current data stores while people are accessing them, you can actually do much more with the data and probably get better results from your analytics. It's a good plan.
Backup data is potentially a great source for analytics and other apps like file sharing, but obviously these applications would use copies of the relevant data, leaving the backup set intact and available for recoveries.
Doesn't sound crazy at all. Very frequently, we restore databases from backups taken from production and use those databases for some very heavy testing. It works out really well.

I'm not so sure about analytics, because our business owners usually want extremely up-to-date data, so it would depend on how old your backed-up data was.
A bigger issue than copy management is dealing with the typically proprietary formats that backup apps use. In order to use backup data for anything (sync 'n' share, big data analysis, etc.), you first have to be able to extract it and put it in a usable form. Some backup apps use "standard" formats, like Linux tar, but many don't, so we'll need some middleware development to be able to get at and use that backup data.
Great point, Rich, on the proprietary format of backup data. Newer backup and recovery apps allow you to bring up virtual copies of your protected servers (both physical and virtual), which allows you not only to access the data in its original format but also to run the applications within the servers. In this case, the use scenarios are limited only by your RPO. You could, for example, run your daily report using last night's backup.
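As the thread points out, the practical hurdle is getting data out of the backup format. When a backup application does write a standard format such as tar, Python's standard `tarfile` module can stream a single file straight out of the archive without restoring anything to disk. The sketch below assumes a hypothetical nightly archive that contains a CSV export; the function name, member path and column names are illustrative, not taken from any backup product. It totals one column per key, in the spirit of running a daily report against last night's backup. Proprietary formats would still need the vendor's own tooling or the kind of middleware described above.

```python
import csv
import io
import tarfile
from collections import Counter
from pathlib import Path


def daily_report_from_tar(backup_archive: Path, member_name: str,
                          key: str, value: str) -> dict[str, float]:
    """Sum the `value` column per `key` for a CSV stored inside a tar backup.

    Hypothetical helper for illustration: the archive is opened read-only
    and the CSV member is streamed from it, so the backup set is not modified.
    """
    totals: Counter = Counter()
    # "r:*" opens the archive for reading and auto-detects gzip/bz2/xz compression
    with tarfile.open(backup_archive, mode="r:*") as archive:
        raw = archive.extractfile(member_name)  # raises KeyError if the member is missing
        if raw is None:
            raise ValueError(f"{member_name} is not a regular file in {backup_archive}")
        with io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
            for row in csv.DictReader(text):
                totals[row[key]] += float(row[value])
    return dict(totals)
```

For example, `daily_report_from_tar(Path("nightly.tar.gz"), "exports/sales.csv", "region", "amount")` would return one total per region found in that (hypothetical) export.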