How to Monitor MySQL Replication?

Simple hacks to master MySQL replication manager

Setting up MySQL replication is a good practice. Is it good enough for you? The answer is ‘NO’. You need to continually monitor your slaves to ensure that they continue to work seamlessly. Here’s an overview of the Slave variables to be monitored and the tool that will enable you to monitor them with ease.

Top variables to monitor on your slaves

Replication is best monitored by checking the following variables:

  1. SLAVE_RUNNING: This is a global status variable and its value can be checked using SHOW GLOBAL STATUS like ‘slave_running’. It can either be ‘ON’ or ‘OFF’.
  2. If slave_running is ‘ON’, then the slave is up and working fine, which means both the SQL thread and the IO thread are running. If either the SQL thread or the IO thread is not running then this variable would be ‘OFF’. Use can use SHOW SLAVE STATUS and try to determine whether there was an error or if the slave was stopped manually. Checking the last 100 lines of the slave’s error log should give a clue. Looking at Last_Error_Number and Last_Error_Message for specific error information will help you fix your slave.
  3. The following variables are a part of SHOW SLAVE STATUS.
  4. SLAVE_IO_RUNNING: It tells you if the Slave’s IO thread is able to connect to its master and if it is running fine. The possible values for this variable can be ‘YES’, ‘NO’ or ‘CONNECTING’.
  5. If this variable reads ‘NO’ then you will have to check the Last_Error_Number and Last_Error_Message and fix your slave. As of MySQL 5.1.20, these columns are aliases for Last_SQL_Errno and Last_SQL_Error. Before 5.1.20, they indicate the error number and the error message returned by the most recently executed statement. An error number of 0 and message of the empty string mean “no error.”
  6. SLAVE_SQL_RUNNING: It tells if the Slave’s SQL thread has started and is working fine. The possible values of this variable are ‘Yes’ or ‘No’.
  7. If this variable reads ‘No’, then the IO thread was caused to stop. You will have to check theLast_SQL_Errno and Last_SQL_Err for more information on the cause. An error number of 0 and message of the empty string mean “no error.” The Last_SQL_Error appears in the slave’s error log.
  8. SECONDS_BEHIND_MASTER: As the name suggests, this field tells how late your slave is. In other words, it tells the time in seconds that the Slave’s SQL thread lags while processing Master’s binary log. A continuous increase in this value is not a very good sign as it means that the slave is not able to catch up with its master. There is no threshold value for this variable to compare against, to determine if the value is high or low. It completely depends on your application, network speed etc. NOTE: Although “seconds_behind_master” is the best option available for determining slave lag available in all MySQL versions, it has been criticised for not always being accurate.
  9. MySQL 5.5 version has a status variable MASTER_HEARTBEAT_PERIOD which when set, will send beat packages to the Slave. After the loss of a beat, the Slave IO thread will disconnect and try connecting again. Various solutions for adding a ‘heartbeat’ mechanism have been proposed and patches and plugins are available for MySQL < 5.5. If you have added such ‘heartbeat’ mechanism you should monitor that as well.

MONyog – MySQL Monitor and Advisor:

A “MySQL DBA in a box” by Webyog is an agentless GUI based tool that helps MySQL DBAs manage MySQL servers. Replication monitoring and managing include-‘Replication’ tab that gives a topological view of all the Masters and their Slaves along with the SHOW SLAVE STATUS and SHOW MASTER STATUS.

  1. Auto-registering of slaves, given the Master details.
  2. Notification alerts through emails/SNMP traps that are sent out in the event of replication failure.

You can try an unrestricted copy of MONyog – MySQL Monitor and Advisor for 14 days.



Add yours
  1. 2
    Gregory Haase

    I’m surprised you haven’t mentioned nagios, which I thought was pretty commonly used for enterprise monitoring. If you have any kind of on-call rotation setup, then triggering your alerts from all servers and services via the same mechanism is key.

    I’m quite fond of the check_mysql_health script, which among other things such as uptime gives me:

    slave-lag (Seconds behind master)
    slave-io-running (Slave io running: Yes)
    slave-sql-running (Slave sql running: Yes)

  2. 3
    Shalmali Aradhya

    @Joel: Percona Toolkit as I mentioned is free! These tools not only help you monitor but also manage MySQL.

    @Gregory: Yes! But note that Nagios is a generic monitoring tool which can do much more than MySQL replication monitoring. Our blog lists various options specific to MySQL monitoring.

  3. 6
    Tim O'Donoghue

    How accurate is MONyog’s “seconds behind master” measure and what is that based on? My DBA tells me that “it isn’t that accurate” but I’m not sure whether to believe him on that; I think he is just trying to wriggle our from under the microscope I have put on our MySQL replication lag and the potential impact on our DR RPO 😉

  4. 9

    Yes .. default settings are WARNING = 450 seconds and CRITICAL = 600 seconds.
    You can customize this as you customize any counter (it is in the ‘replication’ group).

    This is not the best place for this discussion. If you need more help please use our Forums at In the Forums we can attach screenshot etc. what is not possible with a comment here.

  5. 12

    We are aware of this. However the problem only arises if you don’t have 1) a mouse with a scroll wheel 2) or a touch screen 3) or a touchpad supporting multi-finger gestures. In practice this means the problem only arises if you are using a laptop with Windows 7 or older (or possibly a laptop with some Linux variant). However even here you can scroill with the arrow-up and arrow-down kyes on the keyboard if you think the scrollbar on right side of the page is too narrow to ‘catch’.

    We use WordPress and a specific plugin for same, for our blog, and the narrownesss of the scrollbar is due to this – it is their design, not ours. It probably is because a wide scrollbar is waste of display space on small devices – phones and tablets. And it is getting quite common like that. I have seen it hundreds of places. Modern GUI design will modyly consider recent systems and hardware (phones, tablets, Win8+ (and Mac systems of same generations) where the scrollbar is irrelevant.

    Again: use arrow-keys for scrolling.

+ Leave a Comment