Fix mysql slave replication without mysqldump

By | 2018 年 3 月 27 日

The problem.

We have a double masters mysql settings as the high availability solution.
2 mysql hosts, say mysql1 and mysql2, both as masters, replication to each other.
mysql1 has the VIP and writable. mysql2 is readonly.

But, mysql1 just stoped working. After reboot mysql1 host, the mysql instance is back online. However, the VIP now is pointing to no host, and mysql1 stopped replication from mysql2.

Luckily that mysql2 is replicating from mysql1 well, and after mysql1 went down, no data came into either host. So mysql1 is holding same data as mysql2.

So, The problem is we need to fix the replication, to restart the replication from mysql2 to mysql1.

The detail

First, tried to start the slave on mysql1, by: start slave; But it failed with error message indicate that relay log position is incorrect.

When checking the slave status by show slave status;, it shows:

mysql> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: mysql2
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: master-bin.003739
          Read_Master_Log_Pos: 1013001671
               Relay_Log_File: slave-relay-bin.001221
                Relay_Log_Pos: 404311959
        Relay_Master_Log_File: master-bin.003214
             Slave_IO_Running: No
            Slave_SQL_Running: No
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 404311812
              Relay_Log_Space: 0
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 0
1 row in set (0.00 sec)

The Master_Log_File seems correct, but position (Read_Master_Log_Pos) and relay log records (Relay_Log_File and Relay_Log_Pos) are incorrect.

The solution

Based on the fact that mysql1 was the writable instance master before it went down, so we have the assumption that both instances have the same data. This is the foundation of the solution.

First, we need to clear the wrong slave status in mysql1. This is simply run:

RESET SLAVE;

Which will clean the relay log files in the disk, reset the relay logs and master logs record in mysql.

Then, we need to set the mysql1 slave replicator to the correct position, which should be the last replicated position from mysql2.

So just go to mysql2 console, run:

SHOW MASTER STATUS;

It will give a result like:

mysql> show master status;
+-------------------+----------+--------------+------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.003739 |      749 |              |                  |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

That gives the current master bin log file name and position. In our situation, we can just use this position to continue the replication in mysql1.

Back to mysql1 console, run:

CHANGE MASTER TO
    MASTER_LOG_FILE='master-bin.003739',
    MASTER_LOG_POS=749;

Now the mysql1 slave is ready to replicate data from mysql2 from the position gave above.

Then, we can start the slave by:

START SLAVE;

This time, it ran without any error. And check slave status:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: mysql2
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: master-bin.003739
          Read_Master_Log_Pos: 749
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 254
        Relay_Master_Log_File: master-bin.003739
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 825
              Relay_Log_Space: 410
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 169231620
1 row in set (0.00 sec)

Indicates that the slave now is running (Slave_IO_Running, Slave_SQL_Running) with 0 second slave delay. That means mysql1 now is replicating data from mysql2.

Notes

According to the operations, the bin logs are indexed with a filename / position pair as cursor. This is applied to both master bin logs, relay logs.

The replication on master side, is generating master bin log files in the file system, and by SHOW MASTER STATUS, you can get the current head of master bin log file name and position.

On the slave side, it requesting data from master by offering the position cursor (master bin file name / position pair) to master and get back the db modification data after that position in master db. Then slave will replay the modifications and records them as slave relay log files in the filesystem. In the meanwhile update the information within mysql, which can be checked in SHOW SLAVE STATUS. The replay progress is recorded in the relay log file name and relay log position.

The RESET SLAVE will clean relay log files from the file system and also reset the related information in mysql.

The CHANGE MASTER can provide master information and cursor information to the replication slave, and after START SLAVE, the slave will start replicating data from that position.

发表评论

电子邮件地址不会被公开。 必填项已用*标注