[MySQL] Old incarnation found while trying to add node

0. Cluster environment

mysql-mgr cluster (MySQL Group Replication, single-primary mode):
mw-mysql-1: primary (node 1)
mw-mysql-2: secondary (node 2)
mw-mysql-3: secondary (node 3)

1. Zabbix raised an alert that one node of the mysql-mgr cluster was down. Querying the group membership returned the following:

select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 77039511-8e42-11e8-b6d4-000d3aa1a189 | mw-mysql-1  |        3306 | ONLINE       |
| group_replication_applier | 81143c99-8e3d-11e8-a501-000d3aa1b575 | mw-mysql-2  |        3306 | ONLINE       |
| group_replication_applier | 8bab920f-8e3d-11e8-a045-000d3aa09b70 | mw-mysql-3  |        3306 | UNREACHABLE  |
+---------------------------+--------------------------------------+-------------+-------------+--------------+

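Two of the three nodes were still ONLINE, which suggested that the cluster was still available. (I found out later that relying on this output alone was a mistake.)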
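The membership table seen from one node does not prove that the group can still commit writes, so it may help to cross-check from every member. The following is a minimal sketch, assuming MySQL 5.7 (the version shown in the logs) and that each node accepts client connections. These checks are a suggestion and were not part of the original troubleshooting:

```sql
-- Run on EVERY member, not only the primary: the views may disagree.
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
FROM performance_schema.replication_group_members;

-- Which member does this node consider the primary (5.7, single-primary mode)?
SELECT VARIABLE_VALUE AS primary_member_uuid
FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'group_replication_primary_member';

-- Local certification statistics: a queue that keeps growing, or counters
-- that stop moving under load, suggest the member is not making progress.
SELECT MEMBER_ID, COUNT_TRANSACTIONS_IN_QUEUE, COUNT_TRANSACTIONS_CHECKED
FROM performance_schema.replication_group_member_stats;

-- Statements that have been running for a long time (for example hung DML).
SELECT ID, USER, TIME, STATE, INFO
FROM information_schema.PROCESSLIST
WHERE COMMAND <> 'Sleep'
ORDER BY TIME DESC;
```

If the membership table looks healthy but DML statements pile up in the process list, the group is probably not delivering messages even though a majority appears ONLINE.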

2. Node 3 was reported as UNREACHABLE, so I checked the error log on node 3:

 waited (count) when Workers occupied = 590 waited when Workers occupied = 11412795300
2018-10-27T06:56:10.116159Z 0 [Warning] Plugin group_replication reported: 'The member with address mw-mysql-2:3306 has already sent the stable set. Therefore discarding the second message.'
xdr_bytes: out of memory
xdr_bytes: out of memory
xdr_bytes: out of memory
2018-10-27T06:58:03.284079Z 0 [Note] Plugin group_replication reported: 'dispatch_op /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/rapid/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:3810 die_op executed_msg={53f6a873 47559528 0} delivered_msg={53f6a873 47559528 0} p->synode={53f6a873 47559501 0} p->delivered_msg={53f6a873 47559525 0} p->max_synode={53f6a873 47559528 1} '
2018-10-27T06:58:03.285626Z 0 [Note] Plugin group_replication reported: 'dispatch_op /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/rapid/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:3810 die_op executed_msg={53f6a873 47559528 0} delivered_msg={53f6a873 47559528 0} p->synode={53f6a873 47559501 0} p->delivered_msg={53f6a873 47559525 0} p->max_synode={53f6a873 47559528 2} '

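3. Node 3 had run out of memory, which brought down its Group Replication process. I restarted MySQL and tried to add node 3 back to the cluster, but the join failed: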

2018-10-27T07:49:40.493156Z 33 [Note] Plugin group_replication reported: 'Group communication SSL configuration: group_replication_ssl_mode: "DISABLED"'
2018-10-27T07:49:40.493724Z 33 [Note] Plugin group_replication reported: '[GCS] Added automatically IP ranges 10.1.150.12/24,10.1.150.16/24,127.0.0.1/8 to the whitelist'
2018-10-27T07:49:40.494212Z 33 [Note] Plugin group_replication reported: '[GCS] Translated 'mw-mysql-3' to 10.1.150.16'
2018-10-27T07:49:40.494374Z 33 [Warning] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the whitelist. It is mandatory that it is added.'
2018-10-27T07:49:40.494426Z 33 [Note] Plugin group_replication reported: '[GCS] SSL was not enabled'
2018-10-27T07:49:40.494453Z 33 [Note] Plugin group_replication reported: 'Initialized group communication with configuration: group_replication_group_name: "9275d4e4-8e42-11e8-b217-000d3aa1a189"; group_replication_local_address: "mw-mysql-3:24901"; group_replication_group_seeds: "mw-mysql-1:24901,mw-mysql-2:24901,mw-mysql-3:24901"; group_replication_bootstrap_group: false; group_replication_poll_spin_loops: 0; group_replication_compression_threshold: 1000000; group_replication_ip_whitelist: "AUTOMATIC"'
2018-10-27T07:49:40.494471Z 33 [Note] Plugin group_replication reported: '[GCS] Configured number of attempts to join: 0'
2018-10-27T07:49:40.494476Z 33 [Note] Plugin group_replication reported: '[GCS] Configured time between attempts to join: 5 seconds'
2018-10-27T07:49:40.494526Z 33 [Note] Plugin group_replication reported: 'Member configuration: member_id: 3306101; member_uuid: "8bab920f-8e3d-11e8-a045-000d3aa09b70"; single-primary mode: "true"; group_replication_auto_increment_increment: 1; '
2018-10-27T07:49:40.494937Z 94 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2018-10-27T07:49:40.545004Z 97 [Note] Slave SQL thread for channel 'group_replication_applier' initialized, starting replication in log 'FIRST' at position 0, relay log './relay-log-group_replication_applier.000508' position: 4
2018-10-27T07:49:40.545034Z 33 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!'
2018-10-27T07:49:40.545059Z 33 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 1'
2018-10-27T07:49:40.545063Z 33 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 3306101'
2018-10-27T07:49:40.545486Z 0 [Note] Plugin group_replication reported: 'XCom protocol version: 3'
2018-10-27T07:49:40.545513Z 0 [Note] Plugin group_replication reported: 'XCom initialized and ready to accept incoming connections on port 24901'
2018-10-27T07:49:40.762663Z 0 [Warning] Plugin group_replication reported: 'read failed'
2018-10-27T07:49:40.780216Z 0 [ERROR] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 24901'
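The statements used for the rejoin attempt are not shown above. A typical attempt on node 3 would look roughly like the following sketch, which assumes the group_replication_* settings in my.cnf are unchanged:

```sql
-- On mw-mysql-3, after mysqld has been restarted.
-- A joining member must never bootstrap a new group.
SET GLOBAL group_replication_bootstrap_group = OFF;

-- Ask the seeds (mw-mysql-1 / mw-mysql-2) to admit this member.
START GROUP_REPLICATION;

-- On success the member shows up as RECOVERING and then ONLINE.
SELECT MEMBER_HOST, MEMBER_STATE
FROM performance_schema.replication_group_members;
```

In this incident START GROUP_REPLICATION failed with the "[GCS] The member was unable to join the group" error shown in the log.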

4. Since the log on node 3 contained nothing useful, I checked node 1 for other errors and found a rather odd message:

[Note] Plugin group_replication reported: 'Old incarnation found while trying to add node mw-mysql-3:24901 15406269616484810.'

The official documentation says nothing about this message. According to https://dba.stackexchange.com/questions/214779/how-to-delete-previous-incarnation-in-mysql-w-group-replication, the only way out of this situation is to restart the cluster.

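5. I then checked node 2. Its error.log had not been updated at all. On a healthy member, a statistics line like the following is written roughly every 120 seconds: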

[Note] Multi-threaded slave statistics for channel 'group_replication_applier': seconds elapsed = 131; events assigned = 3687425; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 65388998400 waited (count) when Workers occupied = 25728 waited when Workers occupied = 124142987600
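Besides watching the error log, the applier channel can be inspected directly. This is a hedged sketch of queries that could be run on node 2. They were not part of the original investigation, and the tables are the standard performance_schema replication tables in MySQL 5.7:

```sql
-- Is the Group Replication applier channel running on this member?
SELECT CHANNEL_NAME, SERVICE_STATE
FROM performance_schema.replication_applier_status
WHERE CHANNEL_NAME = 'group_replication_applier';

-- Per-worker state and the last error, if any worker has stopped.
SELECT WORKER_ID, SERVICE_STATE, LAST_SEEN_TRANSACTION,
       LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
FROM performance_schema.replication_applier_status_by_worker
WHERE CHANNEL_NAME = 'group_replication_applier';
```

Note that a SERVICE_STATE of ON only shows that the threads exist. It does not prove that transactions are being delivered, which is why the missing statistics line in the log was the more telling symptom here.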

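6. On node 1 the application connections looked normal, but DML statements hung indefinitely and never returned, and restarting the application did not help. Although the membership query still looked healthy, the cluster could no longer serve traffic. In the end I stopped all the MySQL instances and re-bootstrapped the cluster. After that the cluster recovered, served traffic normally, and the membership query showed every member ONLINE: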

select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 77039511-8e42-11e8-b6d4-000d3aa1a189 | mw-mysql-1  |        3306 | ONLINE       |
| group_replication_applier | 81143c99-8e3d-11e8-a501-000d3aa1b575 | mw-mysql-2  |        3306 | ONLINE       |
| group_replication_applier | 8bab920f-8e3d-11e8-a045-000d3aa09b70 | mw-mysql-3  |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
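The exact restart commands are not recorded above. A common way to restart a whole group is sketched below. It assumes that the member with the most advanced gtid_executed is chosen as the bootstrap member, and it is not necessarily the procedure followed here. If STOP GROUP_REPLICATION itself hangs in this state, shutting down mysqld on each node (as was done in this incident) is the fallback:

```sql
-- 1) On every member: leave the (broken) group.
STOP GROUP_REPLICATION;

-- 2) On every member: note the executed GTID set and pick the member
--    whose set contains all the others as the bootstrap member.
SELECT @@GLOBAL.gtid_executed;

-- 3) On the chosen member ONLY: bootstrap a fresh group, then switch the
--    flag off again so that a later restart cannot create a second group.
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;

-- 4) On each remaining member, one at a time: join the new group.
START GROUP_REPLICATION;

-- 5) On any member: confirm that all three members are ONLINE.
SELECT MEMBER_HOST, MEMBER_STATE
FROM performance_schema.replication_group_members;
```

Bootstrapping from a member that is behind the others risks losing or diverging transactions, so step 2 is worth doing even under time pressure.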