After upgrading Kubernetes from 1.23.5 to 1.24 and switching the container runtime from Docker to containerd, kubelet started normally, but the cluster itself failed to come up:
$ kubectl get nodes
The connection to the server k8s-api:6443 was refused - did you specify the right host or port?
Part of the kubelet error log:
May 11 17:40:07 kube-master0 kubelet[2333]: E0511 17:40:07.077569 2333 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://k8s-api:6443/api/v1/nodes\": dial tcp 10>
May 11 17:40:07 kube-master0 kubelet[2333]: E0511 17:40:07.164180 2333 kubelet.go:2419] "Error getting node" err="node \"kube-master0\" not found"
May 11 17:40:07 kube-master0 kubelet[2333]: E0511 17:40:07.264859 2333 kubelet.go:2419] "Error getting node" err="node \"kube-master0\" not found"
May 11 17:40:07 kube-master0 kubelet[2333]: E0511 17:40:07.316406 2333 eviction_manager.go:254] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"kube-master0\" not >
May 11 17:40:07 kube-master0 kubelet[2333]: E0511 17:40:07.364905 2333 kubelet.go:2419] "Error getting node" err="node \"kube-master0\" not found"
May 11 17:40:07 kube-master0 kubelet[2333]: E0511 17:40:07.465768 2333 kubelet.go:2419] "Error getting node" err="node \"kube-master0\" not found"
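I found the fix in "How to smoothly switch the kubernetes container runtime from docker to container". The root cause was that k8s.gcr.io was unreachable, so the sandbox (pause) image configured for containerd could not be pulled and the control-plane pods never started.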
Open the containerd configuration file /etc/containerd/config.toml and, under [plugins."io.containerd.grpc.v1.cri"], change the value of sandbox_image to registry.aliyuncs.com/google_containers/pause:3.6:
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
...
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"
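containerd only reads config.toml at startup, so the new value takes effect after the service is restarted. The following is a minimal sketch of making the same edit from the command line. It assumes GNU sed, a systemd-managed containerd, and that a sandbox_image line already exists in the file. The helper name set_sandbox_image is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the sandbox_image value in a containerd
# config file, keeping the line's original indentation.
# Assumes a sandbox_image line is already present; it does not add one.
set_sandbox_image() {
    config_file="$1"
    image="$2"
    # "|" is used as the sed delimiter because image names contain "/".
    sed -i -E "s|^([[:space:]]*)sandbox_image[[:space:]]*=.*|\1sandbox_image = \"${image}\"|" "$config_file"
}

# Usage (as root on the affected node):
#   set_sandbox_image /etc/containerd/config.toml registry.aliyuncs.com/google_containers/pause:3.6
#   systemctl restart containerd
```

After the restart, kubelet should retry creating the control-plane pods on its own. If it does not, restarting kubelet as well is a reasonable next step.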