By David Fox
Archiware P5 Backup has two special backup strategies designed to assist in situations where taking a ‘full’ backup is not feasible. In this blog article, we’ll look at both Synthetic and Progressive backup methods in some detail.
Backup basics
Let us begin by explaining the concept of ‘full’ and ‘incremental’ backups in order that these more complex backup techniques can be understood with some context.
We take backups of our important data so that it is protected from accidents and disasters. On a regular basis, probably every day, the backup task should run and update the stored backup with anything that has changed since the last time it ran. Traditionally this process begins by first backing up everything; this is called a full backup. A full backup can take a long time and also causes inconvenience, since the performance of the source storage is impacted while every file is read and added to the backup. We therefore want to take full backups only as often as necessary.
Once we have our full backup, incremental backups are used. An incremental backup compares the current set of files on the source storage with what has already been saved, and saves only the differences. Incremental backups are therefore quicker and more convenient. The backup index has information about every file saved so far, so we compare this with the live filesystem to arrive at the set of files needing to be saved.
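As a rough illustration of that comparison, here is a minimal Python sketch. It is not P5's implementation: the `FileState` record, the use of size and modification time to detect changes, and the function name are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    """Hypothetical index record: what we knew about a file when it was saved."""
    size: int
    mtime: float


def select_for_incremental(index, live):
    """Return the paths in `live` that are new or changed relative to `index`.

    index: path -> FileState as recorded by earlier backups
    live:  path -> FileState as found on the source storage right now
    """
    return sorted(path for path, state in live.items() if index.get(path) != state)
```

A file that is unchanged since the last run has the same state in both mappings and is skipped; anything new or modified is picked up.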
However, if we commit to only taking a single full backup, and then revert to incremental backups forever, we also commit to our backup requiring more storage space over time, since nothing is ever deleted and we add data from our incremental backups every day. All backups must be kept because a file needed for restore could be in any one of them.
The traditional way to avoid requiring ever-increasing storage is to periodically repeat the full backup task. By doing this, we create a break in our backup. We begin a new ‘backup cycle’. Once a second full backup has been performed, it is possible to re-use all the storage allocated to previous backups for future backups, since we have a new backup cycle from which we can restore in the event of a problem. Variations of this strategy are commonly used in commercial environments.
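The recycling rule can be sketched in a few lines of Python. This is a simplified model, assuming only the newest backup cycle is kept; real schedules often keep more than one cycle, and the labels and function name here are invented for the example.

```python
def recyclable(backups):
    """Return labels of backups that belong to cycles before the newest full.

    backups: list of (label, kind) tuples, oldest first,
             where kind is "full" or "incr".
    """
    # Position of the most recent full backup; 0 means nothing can be recycled.
    last_full = max(
        (i for i, (_, kind) in enumerate(backups) if kind == "full"),
        default=0,
    )
    return [label for label, _ in backups[:last_full]]
```

Everything written before the most recent full backup belongs to an earlier cycle, so in this model its storage is free to be re-used.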
Avoiding full backups
Both Synthetic and Progressive backup methods are implemented in P5 for cases where a full backup is not feasible. They’re useful in specific cases, as described below.
The Synthetic backup is an older approach that synthesizes a full backup onto new backup storage media, by clever use of the backup index. It’s useful when the source storage is relatively slow, resulting in traditional full backups being slow to complete. The process queries the backup index for the most recent state of the filesystems being backed up. We then take this list of files, read them from the existing backup storage and write them out to new backup storage, thereby creating the equivalent of a full backup without needing to read from the slow source storage at all.
This reading/writing happens in parallel. Therefore, in the case of LTO tape, the process requires two physical LTO tape drives, so one tape can be read while another is written. P5 streams the data from one tape to another, requiring only a small memory buffer in the middle. The new full backup behaves just the same as a traditionally-taken backup. Further incremental backups can be run ‘on top’ of the full backup in the usual way.
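The idea can be sketched in Python as follows. This is a toy model under stated assumptions, not P5's code: the index is reduced to a mapping from each path to the volume holding its latest version (or `None` for files since deleted from the source), and volumes are plain dictionaries.

```python
def synthesize_full(index, old_volumes):
    """Build a new volume holding the latest saved version of every live file.

    index:       path -> name of the volume with the most recent version,
                 or None if the file no longer exists on the source
    old_volumes: volume name -> {path: data}
    """
    new_volume = {}
    for path, volume in index.items():
        if volume is None:
            continue  # deleted on the source, so not part of the full backup
        # Read from existing backup storage, never from the source storage.
        new_volume[path] = old_volumes[volume][path]
    return new_volume
```

Note that the source storage does not appear anywhere in the function: every byte of the new full backup comes from earlier backups.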
Progressive backups
This is a newer approach that eliminates the full backup entirely. It’s useful when writing to backup storage that’s slow to access (e.g. cloud storage) or where particularly large data-sets are being protected. It works by saving additional data when each incremental backup runs to ensure full backups are not needed. In order for this to work, two parameters are required:
- A ‘retention period’ for the backup: how long a backed-up file should remain in the backup once it is no longer the current version of the file on the source storage. The backup will always have current files from the source storage, but how far back should changed and deleted files be kept in the backup?
- A ‘backup window’ defined each day when it is convenient for the backup to run. Since the backups do additional work (see below) beyond just saving what has changed on a given day, we need to allow additional time for them to run each day so all necessary tasks can be completed over time. These backup windows have a start time and a duration. At the end of the window, the backup stops gracefully and continues at the following run.
To understand what additional data needs to be saved during incremental backups, one needs to appreciate that backup data is saved into discrete volumes (in the case of LTO tape) or ‘chunks’ (in the case of disk and cloud storage configured as a ‘container’ within P5). It is these volumes or chunks that must be recycled and re-used in their entirety in order for the oldest data that is no longer needed to be removed from the backup.
P5 takes the retention period set in the backup configuration, and attempts to remove the oldest storage volumes or chunks from the backup. Some of the data stored here will still be needed in the backup, since the files still exist on the storage being saved. That data is read from the older volumes/chunks and written to the current one, as part of the ongoing incremental backup.
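The following Python sketch shows one way to picture this step. It is an illustration only: the chunk layout, the `superseded_on` field and both function names are assumptions for the example, and the backup window is left out for brevity.

```python
from datetime import date, timedelta


def still_needed(superseded_on, today, retention):
    """Current versions are always kept; older ones only within retention."""
    return superseded_on is None or today - superseded_on <= retention


def recycle_oldest_chunk(chunks, today, retention):
    """Retire chunks[0], carrying needed entries forward into chunks[-1].

    chunks: list of chunks, oldest first. Each chunk is a list of
            (path, data, superseded_on) entries, where superseded_on is
            None while that version is still current on the source.
    Returns the remaining chunks; the oldest one is free for re-use.
    """
    if len(chunks) < 2:
        return chunks  # nothing older than the chunk being written
    oldest, current = chunks[0], chunks[-1]
    for entry in oldest:
        if still_needed(entry[2], today, retention):
            current.append(entry)  # re-saved as part of today's incremental
    return chunks[1:]
```

Only the entries that are still needed get copied forward, so the extra work per run depends on how much of the oldest chunk is still current rather than on the total size of the backup.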
Progressive backup is required when using P5 to back up to either disk or cloud storage, where the storage is configured using ‘containers’, the default since version X. Progressive backup can also be used when backing up to LTO tape.
Conclusion
Full backups are often not viable in commercial environments, either because the source storage is slow to read from or because the backup storage is slow to write to. P5 Backup includes advanced backup techniques that provide options for administrators looking for flexibility. For backup to LTO tape or VTL, we recommend sticking with the ‘traditional’ technique of repeating full backups at intervals. This allows for recycling of older tape/VTL volumes over time, to make way for new data.
In environments where the source storage is slow to read from, a synthetic backup can be used with LTO tape to create a full backup without needing to read from the source storage at all.
When writing to cloud and disk using the P5 container storage format, progressive backups can be taken, avoiding the need to take repeating full backups by saving additional data as part of the incremental backups.
Synthetic and progressive backup methods make P5 Backup particularly strong and flexible software for protecting your data.