As some of our customers may have already noticed, we at CodeGuard have been hard at work adding a new product line – Microsoft SQL Server Database backups. Today, we wanted to share some of the technical aspects around the creation of this new functionality.
Until recently, we have only provided MySQL database backups. Like many other cloud-based companies, much of the work we do is performed on the Linux operating system. While that works well for many of the systems we back up, including SFTP, FTP, MySQL, and WordPress, backing up Microsoft SQL Server is currently best done with a Windows-based solution. Fortunately, these servers, like other parts of our infrastructure, run in Amazon’s Elastic Compute Cloud, where we can create and remove new Windows Servers in just a few minutes. With some custom code written to execute the backups on this group of servers, we are able to provide the same type of daily backups to our Microsoft SQL Server clients that we do to clients running MySQL.
There are multiple ways to back up Microsoft SQL Server, including creating a backup through the SQL Server administrative interface, using a tool such as MyLittleBackup, or copying the Windows drive where the Microsoft SQL Server database stores its files. Most of these methods store the database as a binary blob in a proprietary format, which makes the kind of differential backup that CodeGuard offers impossible. That, in turn, means preserving daily and point-in-time backups requires ever more space, and storage would quickly fill up or become prohibitively expensive.
One commonly used tool, Microsoft SQL Server Management Studio (Microsoft SSMS), does offer the ability to export the data in a plain-text format called TSQL (Transact-SQL). In a small menu a few levels deep, if one ticks the proper set of check boxes, one can generate what is called a “script” for the database. The process is error-prone, but the resulting script can be used to reload the same data, restoring that database to a specific point in time. However, because we need to perform this backup for our customers every day, we felt there must be a better way than jumping through these hoops.
Our solution uses the same underlying Microsoft APIs that SSMS uses, but we wrap them in a service to make them more robust and more responsive to some of the diverse methods developers and hosting providers use when deploying Microsoft SQL Server. Using our process, we generate a TSQL script, just like SSMS, and save it to our differential and incremental storage system. If we do this daily, each additional backup only requires us to store the changes that occurred on that day (usually much smaller than the rest of the database).
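To illustrate why a text-based dump makes differential storage possible, here is a toy sketch in Go (not our actual storage engine, which also has to track deletions and ordering): because each day's backup is plain TSQL, storing the delta can be as simple as keeping only the lines that did not appear in the previous day's script.

```go
package main

import (
	"fmt"
	"strings"
)

// changedLines returns the lines present in today's dump but absent from
// yesterday's. A binary-blob backup offers no such line structure to diff,
// which is why the TSQL text format matters for incremental storage.
func changedLines(yesterday, today string) []string {
	seen := make(map[string]bool)
	for _, line := range strings.Split(yesterday, "\n") {
		seen[line] = true
	}
	var diff []string
	for _, line := range strings.Split(today, "\n") {
		if !seen[line] {
			diff = append(diff, line)
		}
	}
	return diff
}

func main() {
	day1 := "CREATE TABLE users (id INT);\nINSERT INTO users VALUES (1);"
	day2 := "CREATE TABLE users (id INT);\nINSERT INTO users VALUES (1);\nINSERT INTO users VALUES (2);"
	// Only the new INSERT needs to be stored for day two.
	for _, line := range changedLines(day1, day2) {
		fmt.Println(line) // prints: INSERT INTO users VALUES (2);
	}
}
```

In practice a row that changes in place produces both a removed and an added line, so a real differential store records both sides; the point is that a textual dump makes that comparison possible at all.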
Some readers may be wondering about the specifics of the method we use to import these Microsoft SQL Server files into our backup storage system. There were a number of options available, but the path we ultimately chose was to build a standalone tool that would appear, to the existing CodeGuard backup system, to operate just like the MySQL-specific “mysqldump” tool we already use to back up MySQL databases. In this way, we could reuse much of the backup process that already existed in our core system, applying tried and tested code to the problem at hand.
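The key to this drop-in approach is matching mysqldump's command-line contract: the backup system invokes the tool with familiar flags and reads the dump from standard output. The sketch below shows what that argument parsing might look like in Go; the flag names mirror mysqldump's conventions, and the actual option set of our internal tool is hypothetical here.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseArgs accepts a mysqldump-style invocation such as:
//   mssqldump -h db1.example.com -u backupuser salesdb
// Flag names are illustrative, not our tool's real interface.
func parseArgs(args []string) (host, user, db string, err error) {
	fs := flag.NewFlagSet("mssqldump", flag.ContinueOnError)
	fs.StringVar(&host, "h", "localhost", "database server host")
	fs.StringVar(&user, "u", "", "database user name")
	if err = fs.Parse(args); err != nil {
		return
	}
	if fs.NArg() != 1 {
		err = fmt.Errorf("expected exactly one database name, got %d", fs.NArg())
		return
	}
	db = fs.Arg(0)
	return
}

func main() {
	host, user, db, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	// A real tool would now dispatch the backup job and stream the
	// resulting TSQL to stdout, just as mysqldump streams SQL.
	fmt.Printf("-- would dump database %q on %s as user %s\n", db, host, user)
}
```

Because the calling convention and output channel match, the existing backup pipeline does not need to know whether it is talking to mysqldump or to this wrapper.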
However, no mysqldump-like tool that we found runs on Linux and connects to Microsoft SQL Server in the same way. To make the two appear the same, we wrote a small program in Go, one of the programming languages we use regularly here at CodeGuard, to run on our Linux servers. Like mysqldump, the tool can be called in a predictable way from the Linux command line, and it produces a TSQL file very similar in appearance to the files created by mysqldump.

The mechanism used to generate the file, however, is quite different. We leverage Amazon Simple Queue Service (SQS) and Amazon Auto Scaling groups to start and manage Windows Server EC2 instances. When the Linux Go tool runs, it sends a message to the MSSQL SQS queue we have created. There, an EC2 instance running Windows Server will be waiting to take the job we have dispatched over SQS and generate the TSQL file. Based on the number of jobs submitted, the Auto Scaling group runs more or fewer Windows Servers, balancing throughput against cost. When a job completes on the Windows side, the resulting file is uploaded to Amazon S3 and a message is sent back to the Go tool; we use S3 as an intermediary because the file is much larger than the maximum size of an SQS message. At the end of this part of the process, the Go tool receives the returned message, downloads the backup from S3, and feeds it to the backup system just as mysqldump would.
Similar to our MySQL option, we allow our customers (in fact, we encourage them!) to request zip files containing the TSQL scripts. These can be downloaded conveniently from our system, and owners can then load them locally into tools like SSMS whenever needed for data recovery, migration, auditing, or testing. The process to add these databases is usually straightforward and is described in our Support Center.
For those of you out there running Microsoft SQL Servers, please give it a try! Feel free to let us know how your experience went by emailing: [email protected]. We appreciate the feedback!