One of the things that makes CodeGuard a leader in the website backup space is our incremental and differential storage engine. This technology allows us to back up and restore websites quickly while using as little storage space as possible. For all of its advantages, this system does have one drawback: because each backup builds incrementally on the one before it, removing old backups is a challenging task. Customers have requested the ability to remove old backups from their accounts, and until now the solution has been a manual one. We hope to make this process much easier with our new Backup Retention functionality. The new Backup Retention option gives customers the ability to keep only the backups they need and get rid of the rest, or to keep every single backup for as long as they want.
Backup Retention allows customers to reduce their storage footprint by automatically removing old backups from our system. CodeGuard customers can now visit their Settings Page to adjust their Backup Retention period. You can choose to keep either 90 days of backups or all backups. If you select 90 days, we will periodically remove old backups so that only backups from the last 90 days show up. If you would like to keep more than 90 days of backups, just select “Keep all backups”. More options are on the way.
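To make the 90-day option concrete, here is a minimal Ruby sketch of how a retention window could decide which backups to keep. The policy names keep_90_days and keep_all and the partition_backups helper are hypothetical illustrations, not CodeGuard's actual code. The sketch only shows the selection rule, not the removal itself.

```ruby
# Hypothetical retention policies matching the two Settings Page options.
RETENTION_WINDOWS = {
  keep_90_days: 90 * 24 * 60 * 60, # 90 days, in seconds
  keep_all: nil                    # nil means backups never expire
}.freeze

# Splits backup timestamps into [to_keep, to_remove] for the given policy.
def partition_backups(timestamps, policy, now: Time.now)
  window = RETENTION_WINDOWS.fetch(policy)
  return [timestamps, []] if window.nil?

  cutoff = now - window
  timestamps.partition { |t| t >= cutoff }
end

now = Time.utc(2024, 6, 1)
backups = [Time.utc(2024, 1, 10), Time.utc(2024, 4, 1), Time.utc(2024, 5, 30)]
keep, remove = partition_backups(backups, :keep_90_days, now: now)
# keep holds the April 1 and May 30 backups; remove holds January 10.
```

Because removal runs periodically, a backup slightly older than the window may still be listed until the next pass.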
What’s so hard about removing old backups, anyway?
Although removing old backups may seem like an easy task, creating the new Backup Retention feature took a lot of time and effort. The difficulty comes from the way our incremental backup process works. Traditional backup processes copy all data from the source to the destination each time a backup is performed. CodeGuard uses an incremental backup process that copies only the changed data from the source to the destination for each backup. Both traditional and incremental solutions make a full backup initially. The difference is that traditional processes continue to take full backups, whereas incremental processes store only the changes in subsequent backups. This means that incremental backups take up much less space, but it also means that removing old backups is difficult: since each new backup is based on the previous one, removing an old backup affects every backup that comes after it.
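The dependency problem is easy to see in a toy model. This hypothetical Ruby sketch is not how CodeGuard actually stores data. Here the first backup holds the full site, and each later backup stores only the files that changed or were deleted. Restoring a backup means replaying every delta up to it. Simply deleting the oldest delta therefore corrupts everything after it. The safe way is to fold the oldest backup into the next one, which becomes the new full base.

```ruby
# Toy model: a site is a Hash of path => content. Each backup stores only the
# files that changed (or were deleted) since the previous backup.
Delta = Struct.new(:changed, :deleted)

def make_delta(before, after)
  changed = after.reject { |path, body| before[path] == body }
  Delta.new(changed, before.keys - after.keys)
end

# Rebuilds backup number n by replaying every delta from the first one.
def restore(chain, n)
  chain.first(n + 1).reduce({}) do |site, delta|
    site.reject { |path, _| delta.deleted.include?(path) }.merge(delta.changed)
  end
end

# Safely drops the oldest backup by folding it into the next one, which
# becomes the new full base.
def drop_oldest(chain)
  [Delta.new(restore(chain, 1), [])] + chain.drop(2)
end

v1 = { "index.html" => "hello", "about.html" => "about us" }
v2 = { "index.html" => "hello world", "about.html" => "about us" }
v3 = { "index.html" => "hello world" }
chain = [make_delta({}, v1), make_delta(v1, v2), make_delta(v2, v3)]

restore(chain.drop(1), 0)      # missing "about.html": the chain is broken
restore(drop_oldest(chain), 0) # equals v2, as expected
```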
CodeGuard’s incremental backup solution is based on a version control tool called Git. Git tracks changes made to a group of files, called a “repository”. It is an incredibly powerful tool originally built to help developers track code changes. It’s not just for keeping track of code, though: Git also makes a great backup utility for storing any kind of file. In fact, the popular backup program bup is based on Git. There is, however, a minor downside to using Git for backups. Developers frown upon rewriting the file history that Git keeps track of, so the creators of Git did not make removing old backups, or “commits,” easy. In addition, each commit is closely tied to the other commits in the repository, which makes it very difficult to untangle one backup from its neighboring commits.
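Why are commits so tightly tied together? Each Git commit ID is a SHA-1 hash that covers its parent's ID. Changing or removing an old commit therefore changes the ID of every commit after it. The sketch below is a simplified model of that chaining. Real Git also hashes author and committer lines, timestamps, and an object header, which are omitted here. The tree IDs are placeholder strings.

```ruby
require "digest/sha1"

# Simplified Git-style commit ID: a SHA-1 over the tree ID, the parent's
# commit ID, and the message. Real Git also hashes author and committer
# lines and an object header; they are left out to keep the idea visible.
def commit_id(tree, parent, message)
  lines = ["tree #{tree}"]
  lines << "parent #{parent}" if parent
  Digest::SHA1.hexdigest((lines + ["", message]).join("\n"))
end

# Builds a linear history from tree IDs and returns commit IDs, oldest first.
def chain_commit_ids(trees)
  parent = nil
  trees.each_with_index.map do |tree, i|
    parent = commit_id(tree, parent, "backup #{i}")
  end
end

original = chain_commit_ids(%w[t0 t1 t2 t3])
edited   = chain_commit_ids(%w[t0-edited t1 t2 t3]) # change only the oldest backup
original.zip(edited).count { |a, b| a == b }         # => 0, every ID has changed
```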
To solve this problem, we created a new tool, which we call git-tail, to handle the complex logic needed to combine incremental backups and remove files that are no longer needed. It uses the “git filter-branch”, “git update-ref”, and “git gc” commands, among others, to rewrite the Git commit history, remove old files and commits, and compress what’s left so that it takes up as little space as possible. Here’s a snippet of the code from our open-source git-tail repository hosted on GitHub:
This Ruby code replaces the head of the commits we want to get rid of with a new commit object. It then rewrites the commit history, removes the old commit, and removes any references to the old commit.
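The approach can be modeled without touching a real repository. In this hypothetical sketch, each commit records a full tree of content-addressed blobs, as Git does. Truncating the history works in three steps:

1. Turn the oldest kept commit into a parentless root with the same tree.
2. Re-parent every later commit, which gives each of them a new ID.
3. Report which blobs are no longer reachable.

In plain Git, one way to do the same is to create a root commit with “git commit-tree”, rewrite history with “git filter-branch”, delete leftover refs with “git update-ref -d”, and prune unreachable objects with “git gc”. This is an illustration of the idea, not git-tail's implementation.

```ruby
require "digest/sha1"

# Toy Git-like store: each commit records a full tree (path => blob SHA) and
# its parent's SHA. Identical file contents share one content-addressed blob.
Commit = Struct.new(:tree, :parent_sha, :message)

def blob_sha(content)
  Digest::SHA1.hexdigest(content)
end

def commit_sha(commit)
  Digest::SHA1.hexdigest([commit.tree.sort.inspect, commit.parent_sha.to_s, commit.message].join("\n"))
end

# Builds a linear history (oldest first) from snapshots of path => content.
def build_snapshot_history(snapshots)
  parent_sha = nil
  snapshots.each_with_index.map do |files, i|
    commit = Commit.new(files.transform_values { |body| blob_sha(body) }, parent_sha, "backup #{i}")
    parent_sha = commit_sha(commit)
    commit
  end
end

# Keeps the newest `keep` commits: the oldest kept commit becomes a parentless
# root with the same tree, and each later commit is re-parented (so its SHA
# changes). Also returns blob SHAs that are no longer reachable, i.e. what a
# garbage collection pass would be free to delete.
def truncate_history(history, keep)
  parent_sha = nil
  rewritten = history.last(keep).map do |old|
    commit = Commit.new(old.tree, parent_sha, old.message)
    parent_sha = commit_sha(commit)
    commit
  end
  live = rewritten.flat_map { |c| c.tree.values }
  unreachable = history.flat_map { |c| c.tree.values }.uniq - live
  [rewritten, unreachable]
end

history = build_snapshot_history([
  { "index.html" => "v1" },
  { "index.html" => "v2" },
  { "index.html" => "v2", "logo.png" => "png bytes" }
])
rewritten, unreachable = truncate_history(history, 2)
rewritten.first.parent_sha      # => nil, the new root commit
unreachable == [blob_sha("v1")] # => true, only the oldest index.html is garbage
```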
One at a time, please
What happens if a CodeGuard customer tries to restore a backup while their backups are being removed? This is a problem we had to address while creating the Backup Retention functionality. If a restore were allowed to run at the same time as a removal, the files might not be restored correctly. Resource contention and synchronization are common problems for developers. To handle this issue, CodeGuard has locking mechanisms in place that allow only one process to use a resource at a time. In the scenario above, the code that handles backup removal requests a lock before it begins. This lock prevents any other process from accessing the customer’s backups while the removal task is running. Once the job is done, it releases the lock, and the restore process can begin accessing the customer’s backups.
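Here is a minimal Ruby sketch of per-customer locking. It assumes a single process, where a Mutex is enough. The BackupLocks class and its with_lock method are hypothetical. If removal and restore run as separate processes, as described above, a production version would need a lock shared between processes, such as a database row or a file lock, rather than a Ruby Mutex.

```ruby
# Hypothetical in-process lock registry with one Mutex per customer. Separate
# worker processes would need a shared lock (e.g. a database row or a file
# lock) instead, since a Ruby Mutex only coordinates threads in one process.
class BackupLocks
  def initialize
    @guard = Mutex.new
    @locks = {}
  end

  # Runs the block while holding the customer's lock. Any other caller for
  # the same customer waits until the block returns and the lock is released.
  def with_lock(customer_id, &block)
    lock = @guard.synchronize { @locks[customer_id] ||= Mutex.new }
    lock.synchronize(&block)
  end
end

locks = BackupLocks.new
log = []
removal_started = Queue.new

removal = Thread.new do
  locks.with_lock("customer-42") do
    removal_started << true
    log << :removal_started
    sleep 0.1 # stand-in for pruning old backups
    log << :removal_finished
  end
end

removal_started.pop # wait until the removal job actually holds the lock
locks.with_lock("customer-42") { log << :restore } # blocks until removal ends
removal.join
log # => [:removal_started, :removal_finished, :restore]
```

Locks are kept per customer so that one customer's cleanup never delays another customer's restore.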
Now that you understand how it works, it’s time to try it out! With our new Backup Retention feature, it’s now easier than ever to manage the storage size of your CodeGuard account!