A Redis cluster with 3 master nodes is in the fail state. The cluster info output is as follows:
cluster info
cluster_state:fail
cluster_slots_assigned:14060
cluster_slots_ok:14060
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:3
cluster_size:3
cluster_current_epoch:9
cluster_my_epoch:9
cluster_stats_messages_ping_sent:1405
cluster_stats_messages_pong_sent:1480
cluster_stats_messages_fail_sent:2
cluster_stats_messages_sent:2887
cluster_stats_messages_ping_received:1480
cluster_stats_messages_pong_received:1403
cluster_stats_messages_received:2883
Every node is in the following state:
Ready to accept connections
How can the cluster be recovered?
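For reference, a Redis Cluster has 16384 hash slots, and with the default setting cluster-require-full-coverage yes a node reports cluster_state:fail while any slot is unassigned. The cluster info output above lists only 14060 assigned slots, so uncovered slots are one plausible cause of the fail state, although the thread does not confirm it. A minimal Python sketch that computes the gap from that output (the helper names parse_cluster_info and uncovered_slots are hypothetical, chosen only for illustration):

```python
TOTAL_SLOTS = 16384  # fixed number of hash slots in a Redis Cluster


def parse_cluster_info(text):
    """Parse CLUSTER INFO output into a dict; purely numeric values become int."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # skip blank lines and the echoed "cluster info" command
        info[key] = int(value) if value.isdigit() else value
    return info


def uncovered_slots(info):
    """Number of hash slots that no node is currently assigned to serve."""
    return TOTAL_SLOTS - info["cluster_slots_assigned"]


# A few lines copied from the output in the question.
sample = """cluster info
cluster_state:fail
cluster_slots_assigned:14060
cluster_slots_ok:14060
"""

info = parse_cluster_info(sample)
print(info["cluster_state"], uncovered_slots(info))  # prints: fail 2324
```

A non-zero result only says that some slots are unassigned; it does not identify which node should own them.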
Later, the following logs appeared on 2 of the nodes:
redis-cache-1
1:M 24 Jun 2022 00:49:49.695 * Ready to accept connections
1:M 24 Jun 2022 00:50:42.537 * FAIL message received from 0503e8066076b194a3fbe922ac1e94f454ac6b78 about 6caf3c5483eabf9ee6aed6c3cbb240183eab7cc4
1:M 24 Jun 2022 00:50:43.002 # Address updated for node 6caf3c5483eabf9ee6aed6c3cbb240183eab7cc4, now 192.168.11.227:6379
1:M 24 Jun 2022 00:51:12.903 * Clear FAIL state for node 6caf3c5483eabf9ee6aed6c3cbb240183eab7cc4: is reachable again and nobody is serving its slots after some time.
redis-cache-2
1:M 24 Jun 2022 00:45:03.228 * Ready to accept connections
1:M 24 Jun 2022 00:49:49.697 # Address updated for node 19d9caa5600555e8de83e33a5cfce0ef50c956e8, now 192.168.19.70:6379
1:M 24 Jun 2022 00:50:42.531 * Marking node 6caf3c5483eabf9ee6aed6c3cbb240183eab7cc4 as failing (quorum reached).
1:M 24 Jun 2022 00:50:43.002 # Address updated for node 6caf3c5483eabf9ee6aed6c3cbb240183eab7cc4, now 192.168.11.227:6379
1:M 24 Jun 2022 00:51:12.652 * Clear FAIL state for node 6caf3c5483eabf9ee6aed6c3cbb240183eab7cc4: is reachable again and nobody is serving its slots after some time.
After running fix against the cluster 3 times, the problem was finally solved!
1st fix, starting from redis-cache-0 (192.168.11.227)
redis-cli -a $REDIS_PASSWORD --cluster fix 192.168.11.227:6379 --cluster-fix-with-unreachable-masters
2nd fix, starting from redis-cache-1 (192.168.19.70)
redis-cli -a $REDIS_PASSWORD --cluster fix 192.168.19.70:6379 --cluster-fix-with-unreachable-masters
3rd fix, starting from redis-cache-2 (192.168.12.128)
redis-cli -a $REDIS_PASSWORD --cluster fix 192.168.12.128:6379 --cluster-fix-with-unreachable-masters
After recovery, the logs of all 3 nodes show
Cluster state changed: ok
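The three runs above can also be scripted. The following is only a sketch under stated assumptions, not the method used in this thread: it assumes redis-cli is on the PATH, that the password is in the REDIS_PASSWORD environment variable, and that the node addresses are the ones listed above. The function names build_fix_command, cluster_state, and fix_all are hypothetical. It runs the same fix command against each node in turn and stops once every node reports cluster_state:ok.

```python
import os
import subprocess

# Node addresses taken from the answer above (redis-cache-0, -1, -2).
NODES = ["192.168.11.227:6379", "192.168.19.70:6379", "192.168.12.128:6379"]


def build_fix_command(node, password):
    """Build the same redis-cli invocation that was run by hand for one node."""
    return ["redis-cli", "-a", password, "--cluster", "fix", node,
            "--cluster-fix-with-unreachable-masters"]


def cluster_state(node, password):
    """Return the cluster_state value reported by one node, or 'unknown'."""
    host, port = node.rsplit(":", 1)
    result = subprocess.run(
        ["redis-cli", "-a", password, "-h", host, "-p", port, "cluster", "info"],
        capture_output=True, text=True, check=False)
    for line in result.stdout.splitlines():
        if line.startswith("cluster_state:"):
            return line.split(":", 1)[1].strip()
    return "unknown"


def fix_all(nodes, password):
    """Run fix from each node in turn; stop early once all nodes report ok."""
    for node in nodes:
        # stdin is inherited, so any confirmation prompt can be answered by hand.
        subprocess.run(build_fix_command(node, password), check=False)
        if all(cluster_state(n, password) == "ok" for n in nodes):
            return True
    return False


# Dry run: print the commands without executing them.
for node in NODES:
    print(" ".join(build_fix_command(node, "$REDIS_PASSWORD")))
```

To actually run it, call fix_all(NODES, os.environ["REDIS_PASSWORD"]); the dry-run loop at the end only prints the three commands.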
The node marked as failing is redis-cache-0
– dudu 2 years ago
First run:
redis-cli -a $REDIS_PASSWORD --cluster fix 192.168.11.227:6379 --cluster-fix-with-unreachable-masters
(the IP of redis-cache-0); the problem persisted
– dudu 2 years ago
Second run:
redis-cli -a $REDIS_PASSWORD --cluster fix 192.168.19.70:6379 --cluster-fix-with-unreachable-masters
(the IP of redis-cache-1); 2 nodes (redis-cache-0 and redis-cache-2) returned to normal: Cluster state changed: ok
– dudu 2 years ago
Third run:
redis-cli -a $REDIS_PASSWORD --cluster fix 192.168.12.128:6379 --cluster-fix-with-unreachable-masters
(the IP of redis-cache-2); all 3 nodes returned to normal
– dudu 2 years ago