I am upgrading k8s from 1.17.0 to 1.18.0 and ran the following command:
kubeadm upgrade plan --ignore-preflight-errors=CoreDNSUnsupportedPlugins
It fails during "Running cluster health checks":
[upgrade] Running cluster health checks
error syncing endpoints with etc: context deadline exceeded
How can I fix this?
The following command showed that the etcd container had not started:
docker ps | grep etcd
After I fixed the misconfiguration in /etc/kubernetes/manifests/etcd.yaml, the etcd container started successfully, but the problem persisted:
[upgrade] Running cluster health checks
I0115 18:49:48.762279 9793 health.go:158] Creating Job "upgrade-health-check" in the namespace "kube-system"
I0115 18:49:48.792884 9793 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0115 18:49:49.794837 9793 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0115 18:49:50.794333 9793 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0115 18:49:51.829966 9793 health.go:195] Job "upgrade-health-check" in the namespace "kube-system" completed
I0115 18:49:51.830007 9793 health.go:201] Deleting Job "upgrade-health-check" in the namespace "kube-system"
I0115 18:49:51.839520 9793 etcd.go:178] retrieving etcd endpoints from "kubeadm.kubernetes.io/etcd.advertise-client-urls" annotation in etcd Pods
I0115 18:49:51.847027 9793 etcd.go:192] etcd Pod "etcd-k8s-master0" is missing the "kubeadm.kubernetes.io/etcd.advertise-client-urls" annotation; cannot infer etcd advertise client URL using the Pod annotation
I0115 18:49:51.847136 9793 etcd.go:202] retrieving etcd endpoints from the cluster status
I0115 18:49:51.849472 9793 etcd.go:102] etcd endpoints read from pods: https://10.0.9.171:2379
context deadline exceeded
error syncing endpoints with etc
Install the etcdctl command on the master:
wget -c https://github.com/etcd-io/etcd/releases/download/v3.4.14/etcd-v3.4.14-linux-amd64.tar.gz
tar -xzf etcd-v3.4.14-linux-amd64.tar.gz
mv etcd-v3.4.14-linux-amd64/etcdctl /usr/bin/etcdctl
Then connect to etcd with etcdctl:
etcdctl --endpoints 10.0.9.171:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
The error it returned revealed the cause of the problem:
{"level":"warn","ts":"2021-01-16T08:24:50.026+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-efd79c04-7e43-492b-bbd5-defe5b400e68/10.0.9.171:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate is valid for 10.0.1.81, 127.0.0.1, ::1, not 10.0.9.171\""}
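So it is a certificate problem: the etcd server certificate is valid for 10.0.1.81, 127.0.0.1 and ::1, but not for 10.0.9.171, the address kubeadm uses to reach etcd.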
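The same mismatch can be confirmed directly from the certificate file, without going through etcdctl. The following is only a sketch: show_cert_sans is a helper name made up for this example, and it assumes openssl is installed on the master. The certificate path is the one passed to etcdctl earlier.

```shell
# Print the Subject Alternative Name section of a certificate.
# show_cert_sans is a hypothetical helper, not part of kubeadm or etcd.
show_cert_sans() {
    openssl x509 -in "$1" -noout -text | grep -A1 'Subject Alternative Name'
}

# Path taken from the etcdctl command used earlier in this post.
CERT=/etc/kubernetes/pki/etcd/server.crt
if [ -f "$CERT" ]; then
    show_cert_sans "$CERT"
fi
```

If the listed IP addresses do not include the address in the etcd endpoint, the TLS handshake fails in the way shown in the error.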
Regenerating the etcd-server certificate resolved the problem:
cd /etc/kubernetes/pki/etcd
rm server.crt server.key
kubeadm init phase certs etcd-server
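Before re-running kubeadm upgrade plan, it may be worth checking that the regenerated certificate really covers the endpoint address. This is a rough sketch, assuming openssl and GNU grep are available. cert_has_ip is a hypothetical helper name, and 10.0.9.171 is the endpoint from the log above.

```shell
# Succeed if the certificate lists the given IP as a Subject Alternative Name.
# cert_has_ip is a hypothetical helper; -w stops 10.0.9.17 from matching 10.0.9.171.
cert_has_ip() {
    openssl x509 -in "$1" -noout -text | grep -qwF "IP Address:$2"
}

CERT=/etc/kubernetes/pki/etcd/server.crt
if [ -f "$CERT" ]; then
    if cert_has_ip "$CERT" 10.0.9.171; then
        echo "server.crt covers 10.0.9.171"
    else
        echo "server.crt does not cover 10.0.9.171"
    fi
fi
```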