[Mysql] mysql-mgr集群节点主节点与两个从节点网络异常,造成的集群异常问题


本文总阅读量

1、mysql-mgr集群节点主节点(A)与两个从节点网络异常,主节点(A)发生切换,从节点(B)切换为主,报错如下:

1
2
3
2018-08-20T05:38:16.071653Z 0 [Warning] Plugin group_replication reported: 'Member with address wallet-mysql-2:3306 has become unreachable.'
2018-08-20T05:38:16.071730Z 0 [Warning] Plugin group_replication reported: 'Member with address wallet-mysql-3:3306 has become unreachable.'
2018-08-20T05:38:16.071744Z 0 [ERROR] Plugin group_replication reported: 'This server is not able to reach a majority of members in the group. This server will now block all updates. The server will remain blocked until contact with the majority is restored. It is possible to use group_replication_force_members to force a new group membership.'

2、对故障节点A操作,使其重新加入集群

A节点:

1
2
mysql -u root -p
start group_replication;

查看报错日志:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
2018-08-20T07:29:40.796952Z 12 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2018-08-20T07:29:40.803975Z 26 [Note] Error reading relay log event for channel 'group_replication_recovery': slave SQL thread was killed
2018-08-20T07:29:40.855935Z 12 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='wallet-mysql-2', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2018-08-20T07:29:40.906041Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 4/10'
2018-08-20T07:29:40.949597Z 12 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='wallet-mysql-3', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2018-08-20T07:29:41.001446Z 12 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor 7dd813bf-8e3d-11e8-8d92-000d3aa1c31e at wallet-mysql-3 port: 3306.'
2018-08-20T07:29:41.001738Z 29 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2018-08-20T07:29:41.005538Z 29 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'repl@wallet-mysql-3:3306',replication started in log 'FIRST' at position 4
2018-08-20T07:29:41.016839Z 30 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log './relay-log-group_replication_recovery.000001' position: 4
2018-08-20T07:29:41.028408Z 29 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
2018-08-20T07:29:41.028442Z 29 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
2018-08-20T07:29:41.028449Z 29 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
2018-08-20T07:29:41.028493Z 12 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2018-08-20T07:29:41.028790Z 30 [Note] Error reading relay log event for channel 'group_replication_recovery': slave SQL thread was killed
2018-08-20T07:29:41.085012Z 12 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='wallet-mysql-3', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.

出现该问题的原因是现在的新的主节点B,已经清除了binglog,A节点需要恢复的日志已经找不到了,因此需要对A进行数据恢复。

3、在B节点mysqldump一份最新数据,命令如下:

1
/data/mysql/bin/mysqldump --all-databases --set-gtid-purged=ON --single-transaction -uroot -p***** > /data/backup/all.sql

4、将dump的数据库拷贝到A节点,对节点A进行恢复:

1
mysql -u root -p******* < /data/backup/all.sql

执行提示报错

1
2
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1840 (HY000) at line 24: @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.

解决方法:

1
2
3
4
5
6
7
8
9
10
提示只有@@GLOBAL.GTID_EXECUTED为空是,才能设置@@GLOBAL.GTID_PURGED的值,清空GLOBAL.GTID_EXECUTED的值
root@db 07:53: [(none)]> reset master;
ERROR 3190 (HY000): RESET MASTER is not allowed because Group Replication is running.
root@db 07:53: [(none)]> stop group_replication;
Query OK, 0 rows affected (1.01 sec)
root@db 07:53: [(none)]> reset master;
Query OK, 0 rows affected (0.27 sec)
如果数据库为只读的状态还需要将数据库改为读写模式:
root@db 07:54: [performance_schema]> set global read_only=0;
Query OK, 0 rows affected (0.00 sec)

5、重新恢复数据

1
2
mysql -u root -p******* < /data/backup/all.sql
mysql: [Warning] Using a password on the command line interface can be insecure.

恢复完成

6、重新将该节点加入集群

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
mysql -u root -p
root@db 07:56: [performance_schema]> select * from replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+-----------+-------------+-------------+--------------+
| group_replication_applier | | | NULL | OFFLINE |
+---------------------------+-----------+-------------+-------------+--------------+
1 row in set (0.00 sec)

root@db 07:56: [performance_schema]> start group_replication;
Query OK, 0 rows affected, 1 warning (3.03 sec)
root@db 07:57: [performance_schema]> select * from replication_group_members;
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| group_replication_applier | *** | wallet-mysql-2 | 3306 | ONLINE |
| group_replication_applier | *** | wallet-mysql-1 | 3306 | RECOVERING |
| group_replication_applier | *** | wallet-mysql-3 | 3306 | ONLINE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
3 rows in set (0.00 sec)

此时节点A正在与主节点同步数据,待同步完成之后再查看及节点之间的信息:

1
2
3
4
5
6
7
8
9
root@db 07:57:  [performance_schema]> select * from replication_group_members;
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
| group_replication_applier | *** | wallet-mysql-2 | 3306 | ONLINE |
| group_replication_applier | *** | wallet-mysql-1 | 3306 | ONLINE |
| group_replication_applier | *** | wallet-mysql-3 | 3306 | ONLINE |
+---------------------------+--------------------------------------+----------------+-------------+--------------+
3 rows in set (0.00 sec)

此时集群恢复完成。

目录
  1. 1. 1、mysql-mgr集群节点主节点(A)与两个从节点网络异常,主节点(A)发生切换,从节点(B)切换为主,报错如下:
  2. 2. 2、对故障节点A操作,使其重新加入集群
  3. 3. 3、在B节点mysqldump一份最新数据,命令如下:
  4. 4. 4、将dump的数据库拷贝到A节点,对节点A进行恢复:
  5. 5. 5、重新恢复数据
  6. 6. 6、重新将该节点加入集群

Proudly powered by Hexo and Theme by Lap
本站访客数人次
© 2020 zeven0707's blog