[MySQL] Old incarnation found while trying to add node


0. Cluster environment

mysql-mgr single-primary cluster:
mw-mysql-1: primary (node 1)
mw-mysql-2: secondary (node 2)
mw-mysql-3: secondary (node 3)

1. A Zabbix alert reported that one node of the mysql-mgr cluster was down. Querying the group membership returned the following:

select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 77039511-8e42-11e8-b6d4-000d3aa1a189 | mw-mysql-1 | 3306 | ONLINE |
| group_replication_applier | 81143c99-8e3d-11e8-a501-000d3aa1b575 | mw-mysql-2 | 3306 | ONLINE |
| group_replication_applier | 8bab920f-8e3d-11e8-a045-000d3aa09b70 | mw-mysql-3 | 3306 | UNREACHABLE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+

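Because two of the three nodes were still ONLINE, the cluster appeared to be available. (I found out later that relying on this information alone was a mistake.)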
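
The reasoning above (two of three members ONLINE, so the group still has a majority) can be sketched as a small check. This is only an illustration: the function names `parse_member_states` and `majority_online` are made up for this example, and the parser assumes the ASCII table layout shown above. As the rest of this post shows, a passing majority check is necessary but does not prove that the group can actually serve writes.

```python
def parse_member_states(table_text):
    """Map MEMBER_HOST to MEMBER_STATE from the ASCII table printed by the mysql client.

    Assumes the five-column layout of performance_schema.replication_group_members
    shown in this post (hypothetical helper, not part of any MySQL tooling).
    """
    states = {}
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Border lines split into a single cell; the header row starts with CHANNEL_NAME.
        if len(cells) == 5 and cells[0] != "CHANNEL_NAME":
            states[cells[2]] = cells[4]
    return states


def majority_online(states):
    """Return True when strictly more than half of the listed members are ONLINE."""
    online = sum(1 for state in states.values() if state == "ONLINE")
    return online > len(states) // 2
```

For the output in step 1 this returns True, which is exactly the misleading signal described here, so a check like this is better combined with a real write test on the primary.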

2. Node 3 had become UNREACHABLE in the group. Its error log showed:

 waited (count) when Workers occupied = 590 waited when Workers occupied = 11412795300
2018-10-27T06:56:10.116159Z 0 [Warning] Plugin group_replication reported: 'The member with address mw-mysql-2:3306 has already sent the stable set. Therefore discarding the second message.'
xdr_bytes: out of memory
xdr_bytes: out of memory
xdr_bytes: out of memory
2018-10-27T06:58:03.284079Z 0 [Note] Plugin group_replication reported: 'dispatch_op /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/rapid/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:3810 die_op executed_msg={53f6a873 47559528 0} delivered_msg={53f6a873 47559528 0} p->synode={53f6a873 47559501 0} p->delivered_msg={53f6a873 47559525 0} p->max_synode={53f6a873 47559528 1} '
2018-10-27T06:58:03.285626Z 0 [Note] Plugin group_replication reported: 'dispatch_op /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/rapid/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:3810 die_op executed_msg={53f6a873 47559528 0} delivered_msg={53f6a873 47559528 0} p->synode={53f6a873 47559501 0} p->delivered_msg={53f6a873 47559525 0} p->max_synode={53f6a873 47559528 2} '

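3. The group replication process on node 3 had gone down because the node ran out of memory. After restarting MySQL, I tried to add node 3 back to the group, but the join failed: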

2018-10-27T07:49:40.493156Z 33 [Note] Plugin group_replication reported: 'Group communication SSL configuration: group_replication_ssl_mode: "DISABLED"'
2018-10-27T07:49:40.493724Z 33 [Note] Plugin group_replication reported: '[GCS] Added automatically IP ranges 10.1.150.12/24,10.1.150.16/24,127.0.0.1/8 to the whitelist'
2018-10-27T07:49:40.494212Z 33 [Note] Plugin group_replication reported: '[GCS] Translated 'mw-mysql-3' to 10.1.150.16'
2018-10-27T07:49:40.494374Z 33 [Warning] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the whitelist. It is mandatory that it is added.'
2018-10-27T07:49:40.494426Z 33 [Note] Plugin group_replication reported: '[GCS] SSL was not enabled'
2018-10-27T07:49:40.494453Z 33 [Note] Plugin group_replication reported: 'Initialized group communication with configuration: group_replication_group_name: "9275d4e4-8e42-11e8-b217-000d3aa1a189"; group_replication_local_address: "mw-mysql-3:24901"; group_replication_group_seeds: "mw-mysql-1:24901,mw-mysql-2:24901,mw-mysql-3:24901"; group_replication_bootstrap_group: false; group_replication_poll_spin_loops: 0; group_replication_compression_threshold: 1000000; group_replication_ip_whitelist: "AUTOMATIC"'
2018-10-27T07:49:40.494471Z 33 [Note] Plugin group_replication reported: '[GCS] Configured number of attempts to join: 0'
2018-10-27T07:49:40.494476Z 33 [Note] Plugin group_replication reported: '[GCS] Configured time between attempts to join: 5 seconds'
2018-10-27T07:49:40.494526Z 33 [Note] Plugin group_replication reported: 'Member configuration: member_id: 3306101; member_uuid: "8bab920f-8e3d-11e8-a045-000d3aa09b70"; single-primary mode: "true"; group_replication_auto_increment_increment: 1; '
2018-10-27T07:49:40.494937Z 94 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2018-10-27T07:49:40.545004Z 97 [Note] Slave SQL thread for channel 'group_replication_applier' initialized, starting replication in log 'FIRST' at position 0, relay log './relay-log-group_replication_applier.000508' position: 4
2018-10-27T07:49:40.545034Z 33 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!'
2018-10-27T07:49:40.545059Z 33 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 1'
2018-10-27T07:49:40.545063Z 33 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 3306101'
2018-10-27T07:49:40.545486Z 0 [Note] Plugin group_replication reported: 'XCom protocol version: 3'
2018-10-27T07:49:40.545513Z 0 [Note] Plugin group_replication reported: 'XCom initialized and ready to accept incoming connections on port 24901'
2018-10-27T07:49:40.762663Z 0 [Warning] Plugin group_replication reported: 'read failed'
2018-10-27T07:49:40.780216Z 0 [ERROR] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 24901'

4. Since the log on node 3 contained no useful error message, I checked node 1 for other errors and found one rather unusual entry:

[Note] Plugin group_replication reported: 'Old incarnation found while trying to add node mw-mysql-3:24901 15406269616484810.'

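The official documentation says nothing about this message. According to https://dba.stackexchange.com/questions/214779/how-to-delete-previous-incarnation-in-mysql-w-group-replication, once you hit this problem the only fix is to restart the cluster.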
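
Because the decisive message appeared on node 1 rather than on the node that failed to join, it may help to scan the error log of every member for the messages seen in this incident. The following is a minimal sketch: the marker strings are copied from the logs quoted in this post, while the labels and the function name `scan_error_log` are invented for the example.

```python
# Substrings taken from the log excerpts in this post, mapped to made-up labels.
INCIDENT_MARKERS = {
    "xdr_bytes: out of memory": "out_of_memory",
    "die_op": "xcom_die_op",
    "The member was unable to join the group": "join_failed",
    "Old incarnation found while trying to add node": "old_incarnation",
}


def scan_error_log(lines):
    """Return (line_number, label) pairs for every marker found in the given log lines."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for marker, label in INCIDENT_MARKERS.items():
            if marker in line:
                hits.append((number, label))
    return hits
```

Running it against the logs of all three nodes, not just the one that raised the alert, would have surfaced the "Old incarnation" entry on node 1 earlier.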

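5. On node 2, error.log had stopped updating entirely. When a group member is healthy, a statistics line like the following is normally written about every 120 s: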

[Note] Multi-threaded slave statistics for channel 'group_replication_applier': seconds elapsed = 131; events assigned = 3687425; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 65388998400 waited (count) when Workers occupied = 25728 waited when Workers occupied = 124142987600
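
The observation in this step, that a healthy member logs this statistics line roughly every 120 s, suggests a simple staleness check. The sketch below assumes each statistics line starts with a timestamp in the same format as the other error log entries quoted in this post; the 300 s threshold and the function names are arbitrary choices for the example, not MySQL settings.

```python
from datetime import datetime, timedelta, timezone

STATS_MARKER = "Multi-threaded slave statistics"


def last_stats_time(log_lines):
    """Return the UTC timestamp of the last statistics line, or None if there is none."""
    latest = None
    for line in log_lines:
        if STATS_MARKER not in line:
            continue
        stamp = line.split(" ", 1)[0]
        try:
            parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
        except ValueError:
            # Line has no leading timestamp in the expected format; skip it.
            continue
        latest = parsed.replace(tzinfo=timezone.utc)
    return latest


def stats_are_stale(log_lines, now, max_age=timedelta(seconds=300)):
    """Return True when no statistics line was logged within max_age before now.

    max_age defaults to 300 s, an assumed margin over the ~120 s interval.
    """
    latest = last_stats_time(log_lines)
    return latest is None or now - latest > max_age
```

A check like this would flag node 2 here even though its MEMBER_STATE was still reported as ONLINE.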

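6. On node 1, application connections looked normal, but DML statements hung indefinitely and never returned; restarting the application made no difference. Although the group status looked normal when queried, the cluster could no longer serve requests. In the end I stopped all the database instances and restarted the whole cluster, after which the cluster recovered and served requests normally. Querying the group membership again showed: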

select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 77039511-8e42-11e8-b6d4-000d3aa1a189 | mw-mysql-1 | 3306 | ONLINE |
| group_replication_applier | 81143c99-8e3d-11e8-a501-000d3aa1b575 | mw-mysql-2 | 3306 | ONLINE |
| group_replication_applier | 8bab920f-8e3d-11e8-a045-000d3aa09b70 | mw-mysql-3 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
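
This post does not record the exact commands used for the restart. As a rough sketch of how a group is usually re-formed once all instances are back up, the example below issues the standard Group Replication statements through DB-API style cursors. Everything around the SQL is an assumption: `reform_group`, the `cursors_by_host` mapping, and the choice of bootstrap host are made up for illustration, and the sketch assumes group replication does not start automatically at boot. The bootstrap host should be the member holding the most complete set of transactions, which this sketch does not verify.

```python
# Standard statements for bootstrapping a group on exactly one member.
BOOTSTRAP_STATEMENTS = (
    "SET GLOBAL group_replication_bootstrap_group = ON",
    "START GROUP_REPLICATION",
    "SET GLOBAL group_replication_bootstrap_group = OFF",
)


def reform_group(cursors_by_host, bootstrap_host):
    """Bootstrap the group on one host, then start group replication on the others.

    cursors_by_host maps a host name to an open DB-API cursor (hypothetical setup).
    """
    if bootstrap_host not in cursors_by_host:
        raise KeyError(f"no cursor for bootstrap host {bootstrap_host!r}")
    # Only one member may bootstrap; the flag is switched off again right away.
    for statement in BOOTSTRAP_STATEMENTS:
        cursors_by_host[bootstrap_host].execute(statement)
    # The remaining members join the freshly bootstrapped group.
    for host, cursor in cursors_by_host.items():
        if host != bootstrap_host:
            cursor.execute("START GROUP_REPLICATION")
```

With the cluster in this post, the call might look like `reform_group(cursors, "mw-mysql-1")`, followed by the membership query above to confirm that all three members report ONLINE.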

© 2020 zeven0707's blog