Docker is a wildly successful Linux-based open source project. It virtualizes applications within the Linux operating system (OS) without requiring a hypervisor. Using the Linux kernel's resource isolation features (namespaces and control groups), each application is abstracted and, in effect, fooled into believing it is the only application on the OS. In other words, the Linux application is placed in a Docker data container that shares the host's Linux kernel and all the capabilities of the OS while remaining isolated from other applications.
Docker containers offer the mobility and isolation of virtual machines (VMs), but with a small fraction of their overhead. Before delving deeper into Docker data protection issues, it is necessary to clarify the difference between a Docker image and a Docker data container. A Docker image is a read-only template that packages the OS user-space files (but not the kernel, which is shared with the host) together with one or more applications. A Docker container, the focus of this tip, is a running instance derived from an image.
Docker data container protection is simply not as mature or sophisticated today as hypervisor VM data protection. There is no equivalent to VMware vSphere Storage APIs-Data Protection, Microsoft Hyper-V Volume Shadow Copy Service integration or even the Kernel-based Virtual Machine (KVM) snapshot API. This makes protecting Docker containers more challenging. The good news is that several methods are available to accomplish it.
Method 1: Docker built-in backup and recovery
Backing up a Docker data container with Docker's built-in tools means saving the container's current state as a Docker image. The running container is briefly paused (quiesced) while its state is committed as a new Docker image with a different name, typically a time-based variant of the original image's name. To deploy that backup image on a different Docker host, either push it to a private Docker registry or save it as a .tar file that can be moved to any Docker host.
How a Docker container is recovered depends on how its backup image was stored. If the image was pushed to a private Docker registry, use the Docker "run" command to start a new instance of the container from it. If the image was saved as a .tar file, first load the .tar backup file into the Docker host's local image store with the Docker "load" command, then use "run" to start the new instance of the container.
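The built-in backup and recovery steps lend themselves to scripting. The following Python sketch only builds, and optionally runs, the standard Docker CLI commands (commit, then push or save for backup; load if needed, then run for recovery). The container, repository and registry names are hypothetical placeholders, and it defaults to a dry run that just prints the commands. Two caveats worth knowing: "docker commit" pauses the container by default while it writes the image, and it does not capture data stored in volumes mounted into the container, so volume data needs separate protection.

```python
import datetime
import shlex
import subprocess
from typing import List, Optional


def backup_commands(container: str, repo: str,
                    tar_path: Optional[str] = None,
                    registry: Optional[str] = None,
                    now: Optional[datetime.datetime] = None) -> List[List[str]]:
    """Build the Docker CLI commands that back up a running container as an image.

    container: name or ID of the container to back up
    repo: repository name for the backup image (hypothetical, e.g. "web1-backup")
    tar_path: if given, also save the image to this .tar file
    registry: if given, push the image to this private registry host (hypothetical)
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    tag = now.strftime("%Y%m%d-%H%M%S")  # time-based tag for each backup
    image = f"{registry}/{repo}:{tag}" if registry else f"{repo}:{tag}"
    # "docker commit" pauses the container by default while the image is written
    cmds = [["docker", "commit", container, image]]
    if registry:
        cmds.append(["docker", "push", image])
    if tar_path:
        cmds.append(["docker", "save", "-o", tar_path, image])
    return cmds


def restore_commands(image: str, name: str,
                     tar_path: Optional[str] = None) -> List[List[str]]:
    """Build the commands that start a new container from a backup image."""
    cmds: List[List[str]] = []
    if tar_path:
        # Load the .tar backup into the host's local image store first
        cmds.append(["docker", "load", "-i", tar_path])
    # "docker run" pulls the image from the registry if it is not already local
    cmds.append(["docker", "run", "-d", "--name", name, image])
    return cmds


def execute(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; actually run it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, execute(backup_commands("web1", "web1-backup", tar_path="/backups/web1.tar")) prints the commit and save commands without touching the host; passing dry_run=False runs them. Scheduling such a script (with cron, for instance) is what turns the manual process into a repeatable one, and it is also what must be documented and tested.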
Built-in Docker backup and recovery is anything but automated. It requires either running every backup and recovery manually or writing scripts, and most IT administrators write scripts. Although scripting is not especially difficult, scripts tend to break when software changes, infrastructure is altered and so on. To prevent backup and recovery failures, scripts must be documented and undergo quality assurance (QA) both upfront and whenever the ecosystem changes. When QA finds problems and the scripts are patched or rewritten to correct them, the documentation must be updated as well. Although this may seem like an onerous process, it is crucial to ensuring that Docker data container protection works as intended.
Many IT administrators do not want the hassle of documenting, testing, troubleshooting, patching and updating scripts, nor do they want to run all of the backups and recoveries manually. They prefer self-contained, supported, third-party Docker data protection.
Method 2: Traditional, file-based backup and recovery
Traditional file backup has long been popular because it can back up many different types of servers, OSes, file systems, endpoints, hypervisors, and databases or other structured applications. Because the technology works on Linux (as well as Windows), it also works with Docker data containers.
Traditional, file-based backup and recovery requires an OS or file-system agent; an agent for structured applications such as relational databases, email and so on; and a backup (i.e., media) server. The file-system agent has administrator privileges, scans the file system and backs it up. The structured application agent quiesces the application briefly while it is backed up. Docker containers appear to the file-system agent as just another set of files to be backed up. When the applications in the containers are structured, a structured application agent must also be installed as part of the Docker image and running container.
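To make the file-system agent's job concrete, here is a minimal Python sketch of the scan step behind an incremental file backup: it records each file's size and modification time and compares the result with the previous scan. It illustrates the idea under simplifying assumptions rather than how any particular product works. On a default Linux installation, Docker keeps image and container data under /var/lib/docker, which is the kind of directory such an agent would walk. The sketch only detects changed files; it does nothing to quiesce a structured application, which is why the separate application agent is still required.

```python
import os
from typing import Dict, List, Tuple

# path -> (size in bytes, modification time in nanoseconds)
Catalog = Dict[str, Tuple[int, int]]


def scan(root: str) -> Catalog:
    """Walk root and record size and mtime for every file, as a file-system agent might."""
    catalog: Catalog = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # the file was deleted while the scan was running
            catalog[path] = (st.st_size, st.st_mtime_ns)
    return catalog


def changed_since(previous: Catalog, current: Catalog) -> Tuple[List[str], List[str]]:
    """Return (new or modified paths to back up, paths deleted since the last scan)."""
    changed = sorted(p for p, meta in current.items() if previous.get(p) != meta)
    deleted = sorted(p for p in previous if p not in current)
    return changed, deleted
```

Comparing an empty previous catalog with the first scan yields every file (a full backup); each later comparison yields only what changed, which is what the agent would copy to the media server in an incremental pass.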
There are several issues with this method. Agents are pieces of software that require installation, management, patches, fixes, updates and more. Many agents are disruptive: some require an OS reboot, which disrupts every container on the host, while others require an application restart for every installation, patch, fix or update. As a result, this work must be scheduled when the disruption has the least impact on operations, usually late at night or over the weekend. This is why so many IT organizations have moved to the agentless backups offered via hypervisor APIs. As previously mentioned, there is currently no equivalent API for Docker containers. If an IT manager elects not to use an agent for the structured applications running in Docker containers, backups are only crash-consistent rather than application-consistent.
Crash-consistent vs. application-consistent
Crash-consistent is like shutting off the machine in the middle of ongoing operations: writes that were in flight may be only partly on disk, so there is a real risk that the application's data will be corrupted. Application-consistent means the application is quiesced or shut down in an orderly manner, so all in-flight operations complete in the correct order and pending writes are flushed before the copy is taken. This allows the application to be recovered with the assurance that its data is not corrupted.
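The difference can be illustrated with a toy model. In this Python sketch, a hypothetical application moves a value between two records using two separate writes, so its data is valid only once both writes have landed. A copy taken between the writes (crash-consistent) captures a broken state, while a copy taken after the in-flight operation completes (application-consistent) does not. Real crash-consistent copies are not always corrupt; whether they are depends on what happened to be in flight at that instant, which is precisely the risk.

```python
import copy
from typing import Dict, Iterator


class ToyApp:
    """Hypothetical app whose data invariant is that records "a" and "b" total 100."""

    def __init__(self) -> None:
        self.disk: Dict[str, int] = {"a": 100, "b": 0}
        self.in_flight = False

    def transfer(self, amount: int) -> Iterator[None]:
        """Move amount from "a" to "b" using two separate disk writes."""
        self.in_flight = True
        self.disk["a"] -= amount
        yield  # first write done, second still pending
        self.disk["b"] += amount
        self.in_flight = False


def consistent(disk: Dict[str, int]) -> bool:
    return disk["a"] + disk["b"] == 100


def crash_consistent_copy(app: ToyApp, amount: int) -> Dict[str, int]:
    """Copy the disk at an arbitrary moment, here between the two writes."""
    steps = app.transfer(amount)
    next(steps)
    snapshot = copy.deepcopy(app.disk)
    for _ in steps:  # the application keeps running after the copy
        pass
    return snapshot


def application_consistent_copy(app: ToyApp, amount: int) -> Dict[str, int]:
    """Quiesce first: let the in-flight operation finish, then copy."""
    for _ in app.transfer(amount):
        pass
    assert not app.in_flight
    return copy.deepcopy(app.disk)
```

Here crash_consistent_copy(ToyApp(), 30) returns {"a": 70, "b": 0}, which violates the invariant, whereas application_consistent_copy(ToyApp(), 30) returns {"a": 70, "b": 30}.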
Method 3: Storage snapshots
Storage snapshots are simple to use, and most consume little additional capacity at first, because they record only the blocks that change afterward, unless an actual copy of the data is required. In that case, the snapshot is copied and replicated to another volume, storage system or even cloud storage. The number of snapshots available per volume, LUN or file system (assume one per Docker data container) varies by storage system vendor: some support many snapshots, others only a few. One relatively new startup, Reduxio, can even timestamp every write and deliver a virtual snapshot as of any of those writes, providing a capability similar to continuous snapshotting. Using these capabilities requires the snapshotting storage system to also be the primary storage for the Docker images and containers.
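Why a snapshot consumes almost no capacity at first can be seen in a copy-on-write design, one common way storage systems implement snapshots (others use redirect-on-write). In this toy Python sketch, which does not model any particular vendor's system, taking a snapshot copies nothing; only when a block is later overwritten are its old contents preserved for the snapshots that still need them.

```python
from typing import Dict, List


class CowVolume:
    """Toy block volume with copy-on-write snapshots (illustrative only)."""

    def __init__(self, nblocks: int) -> None:
        self.blocks: List[bytes] = [b"\0"] * nblocks
        # Each snapshot stores only the original contents of blocks changed since it was taken
        self.snapshots: List[Dict[int, bytes]] = []

    def snapshot(self) -> int:
        self.snapshots.append({})  # no data is copied at snapshot time
        return len(self.snapshots) - 1

    def write(self, idx: int, data: bytes) -> None:
        for saved in self.snapshots:
            if idx not in saved:
                # First overwrite since this snapshot: preserve the old block
                saved[idx] = self.blocks[idx]
        self.blocks[idx] = data

    def read_snapshot(self, snap_id: int, idx: int) -> bytes:
        saved = self.snapshots[snap_id]
        # Blocks unchanged since the snapshot are read from the live volume
        return saved.get(idx, self.blocks[idx])

    def snapshot_overhead(self) -> int:
        """Number of preserved blocks across all snapshots (the extra capacity used)."""
        return sum(len(saved) for saved in self.snapshots)
```

After one snapshot, overwriting a single block three times adds exactly one block of overhead, because only the first overwrite preserves the original contents. Capacity grows with the amount of changed data, not with the number of snapshots taken.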
The critical issue with storage snapshots is that they are crash-consistent rather than application-consistent, the exception being Reduxio. In addition, many storage systems have hard limits on the number of snapshots retained per volume, LUN or file system. Older snapshots must then be replicated elsewhere, which consumes more storage and may require additional storage systems.
Storage snapshots are often combined with backup applications to provide application consistency. The structured application backup agents are installed alongside the structured application. The agent first quiesces the structured application; the media backup server then tells the storage system to take the snapshot and, once it completes, tells the agent to resume the structured application. The result is an application-consistent snapshot, an approach currently available from Commvault, EMC, Hewlett Packard Enterprise, Veritas and several others. However, this methodology still carries the burden of managing structured application agents.
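The coordination sequence performed by the media backup server can be sketched in Python. The agent interface (quiesce and resume) and the take_snapshot callable are hypothetical stand-ins, not any vendor's API. The try/finally block reflects one design choice worth keeping in any real script: applications are resumed even if the snapshot fails, so a storage error never leaves them frozen.

```python
from typing import Callable, List, Protocol


class AppAgent(Protocol):
    """Hypothetical interface of a structured application backup agent."""

    def quiesce(self) -> None: ...

    def resume(self) -> None: ...


def app_consistent_snapshot(agents: List[AppAgent],
                            take_snapshot: Callable[[], str]) -> str:
    """Quiesce every application, take one storage snapshot, then resume them."""
    quiesced: List[AppAgent] = []
    try:
        for agent in agents:
            agent.quiesce()
            quiesced.append(agent)
        return take_snapshot()  # e.g. a request to the storage system; returns a snapshot ID
    finally:
        # Resume in reverse order, and only the applications that were actually quiesced
        for agent in reversed(quiesced):
            agent.resume()
```

Keeping the quiesce window as short as possible matters, because the structured application stops accepting or completing work while it is quiesced.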
One variant of this combination is copy data management, offered by Actifio, Cohesity and Rubrik. Copy data management combines file backup and storage snapshotting in a single system, with no external media backup server. These offerings tend to focus on hypervisor APIs. Only one, Actifio, has OS-based agents, so it can back up Docker containers and images. None of them, however, offers structured application consistency at this time.
Method 4: Agentless cloud-based backup and recovery
As of this writing, only Asigra-powered cloud-based backup and recovery services provide Docker data container and Docker image backup. These services provide an on-site physical or virtual appliance that backs up the OS and all Docker containers without an agent. The most recent backups are kept locally on the appliance for faster recoveries, and all backups are also kept in the cloud backup and recovery service provider's data center. After the first full-volume backup, all subsequent backups are incremental forever, yet each one can be recovered as a virtual full volume in a single pass. All backup data sets in the cloud are deduplicated and compressed.
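Incremental-forever backup with deduplication can be illustrated with a content-addressed chunk store. This Python sketch is a simplified model, not Asigra's implementation: it splits files into fixed-size chunks (an assumption; products often use variable-size chunking), stores each unique chunk once in compressed form, and records a manifest per backup. Each backup therefore adds only new chunks, yet any recovery point can be restored as a complete set of files in one pass. A real agent would also skip unchanged files by metadata instead of rereading everything.

```python
import hashlib
import zlib
from typing import Dict, List

CHUNK_SIZE = 4096  # fixed-size chunking (an assumption for illustration)


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept once, compressed."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}              # SHA-256 hex -> compressed chunk
        self.backups: List[Dict[str, List[str]]] = []   # per backup: file name -> chunk IDs

    def backup(self, files: Dict[str, bytes]) -> int:
        """Record a full recovery point; store only unseen chunks. Returns new chunk count."""
        manifest: Dict[str, List[str]] = {}
        new_chunks = 0
        for name, data in files.items():
            ids = []
            for offset in range(0, len(data), CHUNK_SIZE):
                chunk = data[offset:offset + CHUNK_SIZE]
                cid = hashlib.sha256(chunk).hexdigest()
                if cid not in self.chunks:
                    self.chunks[cid] = zlib.compress(chunk)
                    new_chunks += 1
                ids.append(cid)
            manifest[name] = ids
        self.backups.append(manifest)
        return new_chunks

    def restore(self, backup_id: int) -> Dict[str, bytes]:
        """One-pass restore: rebuild every file of a recovery point from its manifest."""
        return {name: b"".join(zlib.decompress(self.chunks[cid]) for cid in ids)
                for name, ids in self.backups[backup_id].items()}
```

Backing up a file whose last 4 KB changed stores just one new chunk, while restore(0) still returns the original version intact.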
The downside to this method is that it must be licensed through a cloud service provider rather than directly from the software developer, even though a private license is available.
How Docker data container storage works