CodeGuard Database Backup Load Testing
Version: 1.1 / Date: 25 April, 2013
The goal of this test is to determine what impact the CodeGuard service has on a host server during a MySQL database backup. The results contained in this document were gathered from a series of four tests paralleling the tests previously done on FTP and SFTP backups by Jonathan Manuzak (see other document). They are by no means conclusive, and real-world results will vary based on the composition of the websites being backed up and the hardware and software running the underlying host server.
From the perspective of the CodeGuard backup service, there is a single phase that occurs on a remote server: running the ‘mysqldump’ command. This command uses the MySQL client, either connecting directly or tunneled over SSH, to extract the database tables to a flat-file format while simultaneously transferring that file from the remote server to the CodeGuard service. Currently ‘mysqldump’ commands are run with the options ‘--quick’, ‘--single-transaction’, and ‘--skip-extended-insert’. These options, respectively, retrieve rows one at a time, attempt to ensure a consistent state for some types of tables, and avoid multiple-row INSERT statements. Executing a mysqldump with different options may result in different performance characteristics.
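Concretely, a dump with these options looks approximately like the following. The host, user, and database names are placeholders for illustration, not CodeGuard's actual configuration:

```shell
# Placeholder host, user, and database names -- not CodeGuard's actual configuration.
# --quick:                retrieve rows one at a time instead of buffering the full result set
# --single-transaction:   dump transactional tables (e.g. InnoDB) from a consistent snapshot
# --skip-extended-insert: emit one INSERT statement per row, which keeps dumps diff-friendly
mysqldump --quick --single-transaction --skip-extended-insert \
  --host=db.example.com --user=backup_user --password \
  example_database > example_database.sql
```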
Pull: All database backups are of this type. The entire database is downloaded via the ‘mysqldump’ command, committed to a git repository, and uploaded to Amazon S3. Unlike website file backups, incremental downloads are not supported by mysqldump, so the entire database is downloaded from the remote server on each run; the backup is nonetheless incremental in storage, since each dump is saved as a commit in a git repository.
Test Database
Only one database was used for this suite of tests. In cases where concurrent backups were taking place, the same database was downloaded simultaneously. While a somewhat contrived test setup, this approximates, to some degree, other processes accessing database tables concurrently with backups.
Size: 2723 MB
Row Count: 856262 (21 tables)
Type: Real MySQL database from a language-learning website.
Server: The host used for testing was a RackSpace CloudServer.
OS: CentOS 6.3
MySQL Server: 5.1.67
Memory: 512MB
CPU Cores: 1
Database Backup Testing Results
The graphs below illustrate the results of each test. Following each are notes discussing the findings.
Metrics and Definitions
- CPU usage: System: Percentage of CPU usage by system processes.
- CPU usage: User: Percentage of CPU usage by user processes.
- % Memory Used: Percentage of system memory used.
- eth0 in: Network transfer in from the public network connection in KB/s.
- eth0 out: Network transfer out to the public network connection in KB/s.
- Server Load (Last 5 Minutes): The numeric representation of the load on the system over the last five minutes. This is a unitless amalgamation of different metrics but, for this system with a single-core processor, loads below 1.0 are acceptable. Loads above 1.0 indicate that processes are waiting for CPU access. More information can be found here: http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages
- I/O Average Queue Size: Average number of I/O requests queued for the device, derived from the weighted time (in milliseconds) spent doing I/O. This provides a measure of both I/O completion time and any backlog that may be accumulating.
- I/O Wait: Time in milliseconds spent waiting to perform I/O operations.
- I/O Reads / second: Number of file system reads per second.
- I/O Writes / second: Number of file system writes per second.
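As a concrete illustration of the load-average metric, on a Linux host the same numbers Scout reports can be read directly from /proc/loadavg:

```shell
# Print the 1-, 5-, and 15-minute load averages from /proc/loadavg (Linux).
# On a single-core server such as the test host, a 5-minute value above 1.0
# means processes are waiting for CPU time.
awk '{ printf "1m=%s 5m=%s 15m=%s\n", $1, $2, $3 }' /proc/loadavg
```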
1. One Backup – MySQL Direct Connection
Figure 1a. Server load graph. View on ScoutApp
Figure 1b. Server I/O graph. View on ScoutApp
MySQL Dump Start/End: 6:55AM/7:22AM
As shown in Figures 1a and 1b, mysqldump is neither CPU-intensive nor memory-intensive for a single database backup. Server load remains below 0.1. I/O is affected more significantly, as mysqldump is a read-intensive operation: reads peak at 55/s, with latency topping 7 ms. Ethernet output (eth0 out) averages between 1500 and 2300 KB/s. The entire process completes in 27 minutes.
2. Five Concurrent Backups – MySQL Direct Connection
Figure 2a. Server load graph. View on ScoutApp
Figure 2b. Server I/O graph. View on ScoutApp
MySQL Dump Start/End: 10:28AM/1:33PM
Concurrent backups show a similar pattern; the main impact of concurrency is that all backups take more time. Latency peaks at 13 ms, higher than for the single backup, but peak reads/s are actually lower, likely due to inefficiencies introduced by multiple processes competing for I/O access and causing additional disk seeking. Network output again varies between 1500 and 2300 KB/s, with a sharp break half an hour into the backup that lasts for 5 minutes; I do not yet have a good explanation for this pause. Network output also lags the start of the backups by about 20 minutes, meaning that transfer of the mysqldump file to CodeGuard is not occurring during this time.
The entire backup process completes in 3:05 (hours:minutes). This is 1.37 times the length of five consecutive 27-minute backups (135 minutes), giving an idea of the performance impact of competition for the shared database resource. However, server performance does not appear to be significantly impacted.
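The slowdown factor quoted above works out as follows:

```shell
# Sequential baseline: five 27-minute backups run back to back.
seq_minutes=$((5 * 27))          # 135 minutes
# Observed concurrent wall-clock time: 3 hours 5 minutes.
conc_minutes=$((3 * 60 + 5))     # 185 minutes
# Slowdown factor attributable to contention for the shared database.
awk -v c="$conc_minutes" -v s="$seq_minutes" 'BEGIN { printf "%.2f\n", c / s }'
# prints 1.37
```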
3. One Backup – MySQL Tunneled over SSH
Figure 3a. Server load graph. View on ScoutApp
Figure 3b. Server I/O graph. View on ScoutApp
MySQL Dump Start/End: 9:28AM/9:53AM
A single backup tunneled over SSH closely mirrors the performance of a MySQL direct backup on all metrics. Completion time is very slightly faster at 25 minutes, although this was not tested for reproducibility. See the discussion of the single MySQL direct backup for more information on the overall metrics.
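For reference, tunneling a dump over SSH can be done roughly as follows. This is a sketch of the general technique with placeholder names, not necessarily CodeGuard's exact mechanism:

```shell
# Placeholder host and user names; one way to tunnel mysqldump over SSH.
# Forward local port 3307 to the MySQL port (3306) on the remote host,
# keeping the ssh process in the background (-f) with no remote command (-N).
ssh -f -N -L 3307:127.0.0.1:3306 backup_user@db.example.com

# Run the dump through the tunnel; --protocol=TCP forces a TCP connection
# to the forwarded port instead of the local MySQL socket.
mysqldump --quick --single-transaction --skip-extended-insert \
  --host=127.0.0.1 --port=3307 --protocol=TCP --user=backup_user --password \
  example_database > example_database.sql
```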
4. Five Concurrent Backups – MySQL tunneled over SSH
Figure 4a. Server load graph. View on ScoutApp
Figure 4b. Server I/O graph. View on ScoutApp
MySQL Dump Start/End: 1:47PM/4:37PM
Multiple concurrent MySQL backups tunneled over SSH perform slightly better than concurrent MySQL direct backups. Notably, the pause in the process seen during the direct backups does not occur. Peak I/O wait time is 6.3 ms, vs. 13.2 ms for MySQL direct. Average network transfer speeds are lower, possibly due to compression on the SSH tunnel as well as the absence of a break in the process. Despite the lower network speeds, the backups still completed slightly faster, at 2:55 vs. 3:05 for MySQL direct. As in the MySQL direct test, the concurrent backups completed in 1.4x the time of five sequential 25-minute backups.
These tests of database backup performance produced solid high-level information on the metrics that predict the performance of a mysqldump, and on the load it places on a server. However, given the simulated nature of the test system, the results must be viewed with some skepticism.
Given the above caveat, analysis of the test results shows that:
- MySQL database backups are almost entirely I/O bound with very little CPU or memory usage.
- MySQL direct and SSH tunnel behavior is almost identical for single backups.
- Concurrent MySQL backups tunneled over SSH are slightly faster than MySQL direct backups and appear to have better sustained performance and lower impact on remote server performance.
- There is a slight penalty to running concurrent backups of the same database vs. sequential backups.
Further questions not fully addressed by this testing:
- What is the real world (i.e. user) impact of the elevated load and I/O wait times?
- Would using dedicated hardware change the outcome?
- What is the impact of backups on real-world MySQL database performance? Does backing up certain tables cause significant slowdown in database performance that would impact user experience?
- Would different mysqldump options produce different performance profiles, and are there ways to optimize the process further by compressing the data, either over the SSH tunnel or within MySQL?
- It is possible that network bandwidth is the limiting factor in some phases of the backup. It is unknown what the impact on I/O performance would be in a system with a higher ratio of network bandwidth to I/O bandwidth.
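On the compression question above, two variants that could be benchmarked are sketched below. Names are placeholders, and neither variant was tested here:

```shell
# Variant 1: enable compression in the MySQL client/server protocol (--compress).
mysqldump --quick --single-transaction --skip-extended-insert --compress \
  --host=db.example.com --user=backup_user --password \
  example_database > example_database.sql

# Variant 2: compress the SSH tunnel itself (-C) when dumping over SSH.
ssh -C -f -N -L 3307:127.0.0.1:3306 backup_user@db.example.com
```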
- Randall McPherson, Sr. Engineer