Microsoft SharePoint is gaining in popularity as a corporate collaboration tool; it's great for office efficiency, but tough on backups.
Microsoft Office SharePoint Server is an interesting suite of applications. I've heard from a number of users who extol its collaboration methods, but from a backup perspective, SharePoint is somewhat analogous to VMware and other server virtualization technologies. SharePoint may be solving the world's problems, but how do you back this thing up? (Interestingly, the backup solutions for SharePoint and VMware are eerily similar; more on that later.)
The challenge with backing up SharePoint is that it's not just one application, but a suite of applications that work together. Each SharePoint portal consists of one or more Web servers, application servers, query servers and index servers, all of which store their data in multiple SQL Server databases (at minimum, one content database and one configuration database). In a very small environment these can all be placed on a single physical server, but they're typically configured across multiple servers to provide some scalability.
A look inside SharePoint
SharePoint's configuration database obviously stores the configuration of SharePoint itself, including such things as:
- Internet Information Services settings, including IP addresses and Secure Sockets Layer (SSL) certificates
- Service accounts that are used to run various services such as search
- Search, connection, workflow, email, antivirus and logging settings
- Recycle bin settings, such as whether to have a multilevel recycle bin to protect against accidental deletion
Closely related to the configuration database is the administration database. Both of these databases are extremely important, which is why it's so surprising that most of Microsoft's built-in backup methods don't support restoring them. Oddly enough, they support backing them up, but restoring them isn't supported (see "What's up with SharePoint's configuration database?", below).
|What's up with SharePoint's configuration database?|
None of the native backup and recovery tools for SharePoint supports the full cycle of backing up and then restoring the configuration and administration databases of a live system. The reason Microsoft Corp. gives for this is that these databases must be restored to the same point in time at which the other databases were backed up, or the results could be unpredictable. Therefore, while the native tools often allow you to back up the whole site, they can't (or shouldn't) be used to restore these databases, because these tools have no facility for ensuring that they're being restored to the same point in time. For more information on this oddity, please consult this article.
The only supported way to solve this problem with native tools is to restore from a backup of a fully stopped farm. The procedure in this article, which describes how to move all the databases from one server to another, can be used to back up and recover the entire site. That works as long as you're not using single sign-on (SSO) databases; if you are, you have to handle them separately using this SSO procedure.
And then, of course, there's the configuration and customization information that's not stored in the database at all. Those files need to be backed up and restored at the same time as the others. No wonder Microsoft says, "It's not supported." They even go so far as to tell you that you should document your configuration changes so that you can redo them, because there's a good chance that you won't be able to recover the configuration and administration databases.
In addition, some of SharePoint's customization is stored in files in the file system, not in a database at all. This means you have to back up both databases and file system data to fully back up SharePoint.
The content database is where all of SharePoint's collaborative content is stored. This includes Microsoft Office documents (e.g., Word, PowerPoint, Excel) and any communication related to those documents. One of the interesting things about how the SharePoint content database works is that as users share documents and store multiple versions of the same document in SharePoint, they significantly increase the amount of storage needed for their database.
Consider what you would do without SharePoint. You would put the file on a file share and turn on Track Changes. When you finished working on the file, you would email your co-workers and ask them to take a look at it. They would review the document, make their changes and save them to that same document. Track Changes keeps a record of all the edits and additions, and no one has to make a separate copy of the document. SharePoint, by contrast, stores every version of the document, and it doesn't deduplicate them. This is an important point: If you thought Exchange data deduplicated well, you're going to love the ratios you get from SharePoint. (While we're focused here on backing up SharePoint, its versioning process also makes it a good candidate for primary storage deduplication.)
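To put rough numbers on the versioning point, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every version is kept as a complete copy and that an idealized deduplicating store keeps one full copy plus only the changed portion of each later version. The document size, version count and change rate are hypothetical; real deduplication ratios depend on the product and the data.

```python
def full_copy_storage_mb(doc_size_mb: float, versions: int) -> float:
    """Storage used if every version is kept as a complete copy (assumption)."""
    if versions < 1:
        raise ValueError("versions must be at least 1")
    return doc_size_mb * versions


def idealized_dedup_ratio(doc_size_mb: float, versions: int, changed_fraction: float) -> float:
    """Logical-to-physical ratio for an idealized dedup store that keeps one
    full copy plus only the changed portion of each later version."""
    logical = full_copy_storage_mb(doc_size_mb, versions)
    physical = doc_size_mb + doc_size_mb * changed_fraction * (versions - 1)
    return logical / physical


# Hypothetical example: a 2 MB document saved as 10 versions, each changing 5% of it.
print(full_copy_storage_mb(2, 10))                    # 20 (MB stored as full copies)
print(round(idealized_dedup_ratio(2, 10, 0.05), 1))   # 6.9 (roughly 6.9:1)
```

The more versions users keep and the smaller each edit is, the higher the ratio climbs, which is why versioned collaboration data tends to deduplicate so well.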
When planning your SharePoint backup and recovery system, you'll want to be familiar with the content databases, configuration databases and any other databases that are part of your SharePoint configuration. You also need to think about what you want to recover, because the various backup and recovery options work at different levels, and some of them allow you to recover at a finer level of granularity than others. A good place to start is this Microsoft TechNet article, which explains in detail the capabilities of the backup and recovery options described here. The article focuses on Microsoft tools like Data Protection Manager (DPM), but discusses other options as well.
Native backup and recovery options
The following is a summary of the backup and recovery options that are available free of charge with any SharePoint installation.
SharePoint Central Administration. This is a GUI option available when running SharePoint Central Administration. While it can back up the entire site, it has three very big limitations: It doesn't have scheduling capabilities; it can't be used to restore the configuration or administration databases; and it can't back up site collections.
SharePoint stsadm.exe Command Line. The command-line utility (stsadm.exe) is very similar to the Central Administration option, but because it runs from the command line, it can be combined with Windows Scheduled Tasks to schedule backups. It still can't be used to restore the administration or configuration databases. Unlike the Central Administration option, it can back up site collections, but Microsoft warns against doing so: It says site collection backups can affect performance and should only be performed while the site collection is locked. Microsoft also notes that these backups can be particularly slow for site collections larger than 15 GB, which is a very modest size for a site collection. In addition, the utility doesn't seem to like backups that run longer than 17 hours, as it automatically restarts them after 17 hours. Given these issues, Microsoft recommends that if you need site collection backups, you move that site collection to its own database and use database backup tools.
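As a rough sketch of the scheduling approach, the Python below only builds command lines; it doesn't run anything. It assumes stsadm.exe is on the PATH of the account that runs the task, and the backup share, task name and start time are hypothetical placeholders. The stsadm `-o backup` options shown (`-directory` and `-backupmethod` for a farm backup; `-url`, `-filename` and `-overwrite` for a site collection) and the `schtasks /create` switches are the standard ones, but check them against the documentation for your version before relying on them.

```python
import subprocess


def stsadm_farm_backup(backup_dir: str, method: str = "full") -> list[str]:
    """Farm-level backup to a directory or UNC share (full or differential)."""
    if method not in ("full", "differential"):
        raise ValueError("method must be 'full' or 'differential'")
    return ["stsadm.exe", "-o", "backup", "-directory", backup_dir, "-backupmethod", method]


def stsadm_site_collection_backup(url: str, filename: str) -> list[str]:
    """Site collection backup; Microsoft advises locking the site collection first."""
    return ["stsadm.exe", "-o", "backup", "-url", url, "-filename", filename, "-overwrite"]


def schtasks_daily(task_name: str, command: list[str], start_time: str = "02:00") -> list[str]:
    """Windows Scheduled Tasks entry that runs `command` once a day."""
    # schtasks expects the task's command as a single string, so join and quote it.
    return ["schtasks", "/create", "/tn", task_name,
            "/tr", subprocess.list2cmdline(command),
            "/sc", "daily", "/st", start_time]


# Hypothetical share and task name.
job = schtasks_daily("SharePoint farm backup", stsadm_farm_backup(r"\\backupsrv\sharepoint"))
print(subprocess.list2cmdline(job))
```

Keeping the commands as argument lists until the last moment avoids most quoting mistakes; on the SharePoint server you could pass `job` to `subprocess.run` to register the task.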
SQL Server Backup. Because SharePoint stores most of its information in SQL Server, you can use SQL Server backup tools to back up most of that information, including the configuration and administration databases. You can also use those backups to restore the databases, but restoring the configuration and administration databases this way isn't supported. Since Microsoft's objection is the synchronization issue, it would seem that this should work just fine as long as everything you restore comes from the same point in time. The key is to ensure that no configuration changes are made during your backup window. However, if this is the method you choose, you'll still be missing any customization information stored in the file system.
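The "no configuration changes during the backup window" rule can be expressed as a simple check. This sketch assumes you can obtain start and end times for each database backup, plus a list of times when configuration changes were made (for example, from a change log you keep yourself). The database names and times are examples only and will differ in your farm. Treating the window as running from the earliest start to the latest end is a simplification, not a guarantee of consistency.

```python
from datetime import datetime


def window_is_quiet(backups: dict[str, tuple[datetime, datetime]],
                    config_changes: list[datetime]) -> bool:
    """True if no configuration change falls inside the overall backup window."""
    window_start = min(start for start, _ in backups.values())
    window_end = max(end for _, end in backups.values())
    return not any(window_start <= change <= window_end for change in config_changes)


def at(hour: int, minute: int = 0) -> datetime:
    """Hypothetical date used only for this example."""
    return datetime(2009, 3, 1, hour, minute)


# Example database names and backup times (hypothetical).
backups = {
    "SharePoint_Config": (at(1, 0), at(1, 10)),
    "SharePoint_AdminContent": (at(1, 10), at(1, 15)),
    "WSS_Content": (at(1, 15), at(2, 30)),
}
print(window_is_quiet(backups, [at(3, 0)]))  # True: change made after the window
print(window_is_quiet(backups, [at(2, 0)]))  # False: change made mid-backup
```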
Because SQL Server backup tools can be run from the command line, you can schedule them to run at convenient times using Scheduled Tasks. This approach does, however, require that you manually reattach your databases to the appropriate Web application after a recovery.
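A minimal sketch of the scripted approach: generate one T-SQL `BACKUP DATABASE` statement per SharePoint database, all stamped with the same label so the files are easy to keep together as a set, and run the script with `sqlcmd` (`-S` server, `-E` Windows authentication, `-b` to return an error code when an error occurs, `-i` input file). The server name, staging folder, stamp and database names are hypothetical, and the resulting command could be registered with Scheduled Tasks like any other.

```python
def backup_script(databases: list[str], target_dir: str, stamp: str) -> str:
    """One BACKUP DATABASE statement per database, sharing a common stamp."""
    statements = []
    for db in databases:
        path = f"{target_dir}\\{db}_{stamp}.bak"
        # Escape ']' inside bracketed identifiers and "'" inside string literals.
        quoted_db = db.replace("]", "]]")
        quoted_path = path.replace("'", "''")
        statements.append(
            f"BACKUP DATABASE [{quoted_db}] TO DISK = N'{quoted_path}' WITH INIT;")
    return "\n".join(statements)


def sqlcmd_command(server: str, script_file: str) -> list[str]:
    """Run the script with Windows authentication; exit nonzero if an error occurs."""
    return ["sqlcmd", "-S", server, "-E", "-b", "-i", script_file]


script = backup_script(["SharePoint_Config", "WSS_Content"], r"D:\SPBackup", "20090301_0100")
print(script)
print(sqlcmd_command(r"SQLSRV01\SHAREPOINT", r"D:\SPBackup\backup.sql"))
```

A shared stamp doesn't make the backups consistent by itself; it just makes it easy to see which files belong to the same run when you restore.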
What this approach can't do is back up the search database, and for an odd reason: The search indexes themselves aren't stored in SQL Server. Because a database-only backup leaves you no way to synchronize the search database with those indexes, this approach isn't a viable option for search.
Windows Server 2008 Backup. The native backup and recovery system for Windows Server 2008 can be used to back up all those things that aren't in the databases (such as the configuration and customization files), but it can't be used to back up the databases themselves.
It seems that the native tools have as many limitations as they have benefits, but it's possible to create a "workable" solution if all you have are the native tools -- especially if you can do a regular shutdown of your farm. If you do a shutdown, you could do a SQL Server backup of all of the databases to a file system that's then backed up using the Windows Server 2008 backup system, along with the directories where customization and configuration information is stored.
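One way to see why SQL Server backup plus Windows Server 2008 Backup is the workable native combination is to encode the capabilities described above as sets and search for the smallest combination that covers everything. The capability labels are shorthand for this sketch, not an official feature matrix: "restore_config_dbs" for SQL Server backups means possible but not supported by Microsoft, and anything not described above is simply left out (including the search-index caveat).

```python
from itertools import combinations

# Capabilities as described in this article (shorthand labels, not an official matrix).
NATIVE_TOOLS = {
    "Central Administration": {"back_up_dbs"},
    "stsadm.exe": {"back_up_dbs", "scheduling", "site_collections"},
    "SQL Server Backup": {"back_up_dbs", "restore_config_dbs", "scheduling"},
    "Windows Server 2008 Backup": {"file_system"},
}


def smallest_covering_sets(tools: dict[str, set[str]],
                           required: set[str]) -> list[tuple[str, ...]]:
    """All tool combinations of the smallest size whose capabilities cover `required`."""
    names = list(tools)
    for size in range(1, len(names) + 1):
        hits = [combo for combo in combinations(names, size)
                if required <= set().union(*(tools[name] for name in combo))]
        if hits:
            return hits
    return []


needs = {"back_up_dbs", "restore_config_dbs", "file_system", "scheduling"}
print(smallest_covering_sets(NATIVE_TOOLS, needs))
# [('SQL Server Backup', 'Windows Server 2008 Backup')]
```

Adding "site_collections" to the requirements pushes the answer to three tools, which matches the article's point that the native options each fill only part of the picture.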
Third-party SharePoint backup options
Obviously, to properly back up SharePoint, you need to back up all databases and some files in the file system, and you need to guarantee that these various backups are synchronized. A good recovery system would also allow you to restore the entire system, all configuration and customization data, as well as all content. In addition, it should be able to restore any of the above to various points in time, including the ability to recover individual pieces of content, such as a document.
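The point-in-time requirement boils down to choosing the most recent backup set that is both complete and no later than the time you want to return to. This is a minimal sketch, assuming each backup set is recorded with a single capture time and the list of components it contains; the component names and dates are hypothetical, and document-level (granular) recovery isn't modeled.

```python
from datetime import datetime
from typing import Optional

# Components a synchronized SharePoint backup set should contain (illustrative labels).
REQUIRED = frozenset({"config_db", "admin_db", "content_db", "file_system"})


def pick_recovery_point(backup_sets: list[tuple[datetime, frozenset[str]]],
                        target: datetime) -> Optional[datetime]:
    """Latest capture time at or before `target` whose set is complete, or None."""
    usable = [taken for taken, parts in backup_sets
              if taken <= target and REQUIRED <= parts]
    return max(usable, default=None)


sets = [
    (datetime(2009, 3, 1, 2), REQUIRED),
    (datetime(2009, 3, 2, 2), REQUIRED - {"file_system"}),  # incomplete set
    (datetime(2009, 3, 3, 2), REQUIRED),
]
print(pick_recovery_point(sets, datetime(2009, 3, 2, 12)))  # 2009-03-01 02:00:00
```

Skipping the incomplete March 2 set is the whole point: restoring databases without the matching file-system data would reintroduce the synchronization problem.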
It seems the only way to do all of that reliably is to invest in a commercial backup product, and it's likely that the backup application you're using now can handle the chore. Every major backup package has an agent for SharePoint.
Backup products with SharePoint agents
The capabilities of each agent vary from one backup application to another, but they all have the same basic functionality. They're add-on agents to your backup software, much like a SQL Server or Exchange agent, that know how to talk to the SharePoint backup API. A well-written agent should need to be told only the name of the main SharePoint server and be able to figure out everything else from there: the name(s) of any SharePoint farms associated with that server, their configuration, administration and content databases, and any configuration data stored in the file system, all of which it should back up. All of this data is backed up directly to your backup system's preferred storage, be it disk, virtual tape library (VTL) or tape. Your backup application may actually be doing multiple types of backups under the covers (SQL Server, file system, etc.), but it should appear as one backup that works (or doesn't work) as a whole.
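The "one backup that works or doesn't work as a whole" behavior can be sketched as a job that runs each underlying piece and reports a single status. This illustrates the idea under hypothetical names (`SubBackup`, `run_as_one_backup`); it isn't any vendor's agent, and real agents also handle discovery, cataloging and retries.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SubBackup:
    name: str
    run: Callable[[], bool]  # returns True on success


def run_as_one_backup(parts: list[SubBackup]) -> tuple[bool, dict[str, bool]]:
    """Run every sub-backup and report a single pass/fail for the whole set."""
    results: dict[str, bool] = {}
    for part in parts:
        try:
            results[part.name] = bool(part.run())
        except Exception:
            # An exception in any piece counts as a failure of that piece.
            results[part.name] = False
    return all(results.values()), results


ok, detail = run_as_one_backup([
    SubBackup("SQL Server databases", lambda: True),
    SubBackup("file system customizations", lambda: False),
])
print(ok, detail)
# False {'SQL Server databases': True, 'file system customizations': False}
```

Reporting the per-piece detail alongside the single status keeps troubleshooting possible without letting a partial backup masquerade as a good one.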
In addition to backup agents available for your favorite backup software package, there are products, such as AvePoint Inc.'s DocAve Backup and Recovery, Idera's SharePoint Backup and Quest Software Inc.'s Recovery Manager for SharePoint, that are "point solutions" designed just for SharePoint. These products are analogous to backup apps like PHD Virtual Technologies' esXpress, Veeam Software's Backup & Replication and Vizioncore Inc.'s vRanger Pro that are point backup solutions just for VMware. These are designed for firms that have a SharePoint installation, but are using a backup product that doesn't have a SharePoint agent; companies that don't like the capabilities of the agent; or organizations that can't afford the agent. These products tend to do everything you need a SharePoint backup product to do (they may even have more functionality than the agent offered by your backup app due to their specialized nature), but they don't integrate with your backup application. This typically means that their backups will be stored on disk; so if you want those backups to be put on your deduplication system or tape, you'll need to back them up with your other backup product.
A lot of commercial solutions use Microsoft's Volume Shadow Copy Service (VSS) to solve the synchronization problem. That is, they use the SharePoint VSS writer to quiesce SharePoint and the Windows VSS Writer to quiesce the system before backing up everything. That way everything that's backed up is synchronized to the same point.
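The freeze, snapshot, thaw pattern behind this approach can be simulated in a few lines. VSS itself coordinates writers through its own freeze and thaw events; the classes below are a toy model with hypothetical names, meant only to show why freezing every writer before capturing anything yields a single, consistent point in time, and why thawing belongs in a `finally` block.

```python
class ToyWriter:
    """Stand-in for an application that accepts writes until it is frozen."""

    def __init__(self, name: str):
        self.name = name
        self.version = 0
        self.frozen = False

    def write(self) -> None:
        if self.frozen:
            raise RuntimeError(f"{self.name} is frozen")
        self.version += 1

    def freeze(self) -> None:
        self.frozen = True

    def thaw(self) -> None:
        self.frozen = False


def consistent_snapshot(writers: list[ToyWriter]) -> dict[str, int]:
    """Freeze all writers, capture their state together, then always thaw."""
    try:
        for writer in writers:
            writer.freeze()
        return {writer.name: writer.version for writer in writers}
    finally:
        for writer in writers:
            writer.thaw()


config = ToyWriter("config")
content = ToyWriter("content")
config.write()
content.write()
content.write()
print(consistent_snapshot([config, content]))  # {'config': 1, 'content': 2}
```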
BIO: W. Curtis Preston is an executive editor in TechTarget's Storage Media Group and an independent backup expert. Curtis has worked extensively with data deduplication and other data-reduction systems.