
How to recover a Redis cluster that is in the fail state

Bounty: 30 yuandou [Resolved] Resolved on 2022-06-24 10:56

A Redis cluster with 3 master nodes is in the fail state. The output of cluster info is:

cluster info
cluster_state:fail
cluster_slots_assigned:14060
cluster_slots_ok:14060
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:3
cluster_size:3
cluster_current_epoch:9
cluster_my_epoch:9
cluster_stats_messages_ping_sent:1405
cluster_stats_messages_pong_sent:1480
cluster_stats_messages_fail_sent:2
cluster_stats_messages_sent:2887
cluster_stats_messages_ping_received:1480
cluster_stats_messages_pong_received:1403
cluster_stats_messages_received:2883
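
A note on the output above: a Redis Cluster always has 16384 hash slots, and with the default setting `cluster-require-full-coverage yes` a node reports `cluster_state:fail` while any slot is unserved. Only 14060 slots are assigned here, which is a likely reason for the fail state. The sketch below, assuming a POSIX shell with awk, computes the gap from `cluster info` output. The function name `missing_slots` and the variable `NODE_IP` are introduced only for this illustration.

```shell
#!/bin/sh
# Total number of hash slots in any Redis Cluster (fixed by the cluster design).
TOTAL_SLOTS=16384

# Read `cluster info` output on stdin and print how many slots are unassigned.
# Exits non-zero if the cluster_slots_assigned field is not present.
missing_slots() {
    # tr strips the carriage returns that redis-cli may emit at line ends.
    tr -d '\r' | awk -F: -v total="$TOTAL_SLOTS" '
        $1 == "cluster_slots_assigned" { print total - $2; found = 1 }
        END { if (!found) exit 1 }
    '
}

# Usage against a live node (NODE_IP is a placeholder, not from the question):
#   redis-cli -a "$REDIS_PASSWORD" -h "$NODE_IP" -p 6379 cluster info | missing_slots
```

Fed the output shown in the question, this prints 2324, the number of slots no master is serving.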

Each node reports the following state:

Ready to accept connections

How can I recover the cluster?

Additional details:

Later, 2 of the nodes logged the following:

redis-cache-1

1:M 24 Jun 2022 00:49:49.695 * Ready to accept connections
1:M 24 Jun 2022 00:50:42.537 * FAIL message received from 0503e8066076b194a3fbe922ac1e94f454ac6b78 about 6caf3c5483eabf9ee6aed6c3cbb240183eab7cc4
1:M 24 Jun 2022 00:50:43.002 # Address updated for node 6caf3c5483eabf9ee6aed6c3cbb240183eab7cc4, now 192.168.11.227:6379
1:M 24 Jun 2022 00:51:12.903 * Clear FAIL state for node 6caf3c5483eabf9ee6aed6c3cbb240183eab7cc4: is reachable again and nobody is serving its slots after some time.

redis-cache-2

1:M 24 Jun 2022 00:45:03.228 * Ready to accept connections
1:M 24 Jun 2022 00:49:49.697 # Address updated for node 19d9caa5600555e8de83e33a5cfce0ef50c956e8, now 192.168.19.70:6379
1:M 24 Jun 2022 00:50:42.531 * Marking node 6caf3c5483eabf9ee6aed6c3cbb240183eab7cc4 as failing (quorum reached).
1:M 24 Jun 2022 00:50:43.002 # Address updated for node 6caf3c5483eabf9ee6aed6c3cbb240183eab7cc4, now 192.168.11.227:6379
1:M 24 Jun 2022 00:51:12.652 * Clear FAIL state for node 6caf3c5483eabf9ee6aed6c3cbb240183eab7cc4: is reachable again and nobody is serving its slots after some time.
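
When several nodes are involved, it can help to pull the affected node IDs out of the logs rather than reading them by eye. The following is a minimal sketch, assuming the log wording shown above and a POSIX sed. The function name `failing_nodes` and the log file name in the usage line are hypothetical.

```shell
#!/bin/sh
# Print the unique IDs of nodes that a Redis log either marks as failing
# or mentions as the subject of a received FAIL message.
failing_nodes() {
    sed -n \
        -e 's/.*Marking node \([0-9a-f]\{40\}\) as failing.*/\1/p' \
        -e 's/.*FAIL message received from [0-9a-f]\{40\} about \([0-9a-f]\{40\}\).*/\1/p' |
    sort -u
}

# Usage (the file name is a placeholder for wherever the node's log is stored):
#   failing_nodes < redis-cache-1.log
```

The printed ID can then be matched against the first column of `cluster nodes` output to find the node's address.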
dudu | Asked on 2022-06-24 09:18

The node marked as failing is redis-cache-0.

dudu, 2 years ago

First attempt: ran redis-cli -a $REDIS_PASSWORD --cluster fix 192.168.11.227:6379 --cluster-fix-with-unreachable-masters (the IP of redis-cache-0). The problem persisted.

dudu, 2 years ago

Second attempt: ran redis-cli -a $REDIS_PASSWORD --cluster fix 192.168.19.70:6379 --cluster-fix-with-unreachable-masters (the IP of redis-cache-1). 2 nodes (redis-cache-0 and redis-cache-2) recovered: Cluster state changed: ok

dudu, 2 years ago

Third attempt: ran redis-cli -a $REDIS_PASSWORD --cluster fix 192.168.12.128:6379 --cluster-fix-with-unreachable-masters (the IP of redis-cache-2). All 3 nodes recovered.

dudu, 2 years ago
Best answer

Running fix against the cluster 3 times finally solved it!

1st fix, starting from redis-cache-0 (192.168.11.227)

redis-cli -a $REDIS_PASSWORD --cluster fix 192.168.11.227:6379 --cluster-fix-with-unreachable-masters

2nd fix, starting from redis-cache-1 (192.168.19.70)

redis-cli -a $REDIS_PASSWORD --cluster fix 192.168.19.70:6379 --cluster-fix-with-unreachable-masters

3rd fix, starting from redis-cache-2 (192.168.12.128)

redis-cli -a $REDIS_PASSWORD --cluster fix 192.168.12.128:6379 --cluster-fix-with-unreachable-masters

After recovery, the logs of all 3 nodes show

Cluster state changed: ok
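
The three runs above can be sketched as one small script. This is only an illustration of the same sequence, assuming a POSIX shell and the node addresses given in this answer. The helper names `run`, `fix_all_nodes`, and `show_cluster_state` and the `DRY_RUN` variable are made up for this sketch. Note that `--cluster fix` may ask for confirmation before covering slots, so the loop is not guaranteed to be non-interactive.

```shell
#!/bin/sh
# Node addresses from the answer above (redis-cache-0, -1, -2, in that order).
NODES="192.168.11.227:6379 192.168.19.70:6379 192.168.12.128:6379"

# Run a command, or only print it when DRY_RUN=1.
# Caution: the printed command line includes the password.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run `--cluster fix` once per node, in the order used in the answer.
fix_all_nodes() {
    for node in $NODES; do
        run redis-cli -a "$REDIS_PASSWORD" --cluster fix "$node" \
            --cluster-fix-with-unreachable-masters
    done
}

# Print each node's cluster_state line so the result can be checked
# without reading the server logs.
show_cluster_state() {
    for node in $NODES; do
        host=${node%:*}
        port=${node#*:}
        printf '%s ' "$node"
        redis-cli -a "$REDIS_PASSWORD" -h "$host" -p "$port" cluster info |
            grep '^cluster_state'
    done
}

# Usage:
#   DRY_RUN=1; fix_all_nodes     # preview the three commands
#   DRY_RUN=0; fix_all_nodes && show_cluster_state
```

Previewing with `DRY_RUN=1` first makes it easy to confirm the addresses before anything touches the cluster.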
dudu | 2022-06-24 10:54