After upgrading Kubernetes from 1.23.5 to 1.24 and switching the container runtime from docker to containerd, kubelet fails to start:
$ systemctl status kubelet
kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Mon 2022-05-09 17:14:37 CST; 5s ago
Docs: https://kubernetes.io/docs/home/
Process: 17952 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 17952 (code=exited, status=1/FAILURE)
How can this be resolved?
Switched to starting kubelet manually from the command line:
/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf \
--config=/var/lib/kubelet/config.yaml \
--cgroup-driver=systemd \
--network-plugin=cni \
--pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.6 \
--resolv-conf=/run/systemd/resolve/resolv.conf \
--container-runtime=remote \
--container-runtime-endpoint=unix:///run/containerd/containerd.sock
This produced the error:
Error: failed to parse kubelet flag: unknown flag: --network-plugin
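This error is expected on 1.24: the --network-plugin flag was removed from kubelet together with dockershim, so it has to be dropped wherever it is still passed. On kubeadm-managed nodes it typically sits in KUBELET_KUBEADM_ARGS in /var/lib/kubelet/kubeadm-flags.env (that path is an assumption here; check what the 10-kubeadm.conf drop-in points to on your node). A minimal sketch of stripping it, where strip_network_plugin is a hypothetical helper name:

```shell
#!/bin/sh
# strip_network_plugin: hypothetical helper. Reads kubelet flags on stdin and
# prints them with any --network-plugin=<value> flag (and the whitespace
# right after it) removed. Everything else is passed through unchanged.
strip_network_plugin() {
    sed -E 's/--network-plugin=[^[:space:]"]*[[:space:]]*//g'
}

# Demonstration on a sample line; the pause image value is the one used in the
# command above, the rest of the line is illustrative.
sample='KUBELET_KUBEADM_ARGS="--network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.6"'
printf '%s\n' "$sample" | strip_network_plugin

# On a node (assumed path, run as root), one would back up the file first:
#   cp /var/lib/kubelet/kubeadm-flags.env /var/lib/kubelet/kubeadm-flags.env.bak
#   strip_network_plugin < /var/lib/kubelet/kubeadm-flags.env.bak > /var/lib/kubelet/kubeadm-flags.env
#   systemctl restart kubelet
```

For the manual command above, deleting the --network-plugin=cni line has the same effect.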
With --network-plugin dropped from the command, adding the following two flags
--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
eliminated the errors below:
server.go:618] "Failed to get the kubelet's cgroup. Kubelet system container metrics may be missing." err="cpu and memory cgroup hierarchy not unified. cpu: /user.slice, memory: /user.slice/user-0.slice/session-144.scope"
server.go:624] "Failed to get the container runtime's cgroup. Runtime system container metrics may be missing." err="cpu and memory cgroup hierarchy not unified. cpu: /user.slice, memory: /user.slice/user-0.slice/session-144.scope"
The errors that remain now:
W0511 16:39:35.062526 356299 reflector.go:324] vendor/k8s.io/client-go/informers/factory.go:134: failed to list *v1.Node: Get "https://k8s-api:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkube-master0&limit=500&resourceVersion=0": dial tcp 10.0.9.171:6443: connect: connection refused
E0511 16:39:35.062596 356299 reflector.go:138] vendor/k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://k8s-api:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkube-master0&limit=500&resourceVersion=0": dial tcp 10.0.9.171:6443: connect: connection refused
E0511 16:39:35.062669 356299 remote_runtime.go:168] "Version from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
E0511 16:39:35.062707 356299 kuberuntime_manager.go:225] "Get runtime version failed" err="get remote runtime typed version failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
Error: failed to run Kubelet: failed to create kubelet: get remote runtime typed version failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService
W0511 16:39:35.063449 356299 reflector.go:324] vendor/k8s.io/client-go/informers/factory.go:134: failed to list *v1.Service: Get "https://k8s-api:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.0.9.171:6443: connect: connection refused
E0511 16:39:35.063518 356299 reflector.go:138] vendor/k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://k8s-api:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.0.9.171:6443: connect: connection refused
The cause was the configuration in /etc/containerd/config.toml. The configuration at the time of the problem contained:
disabled_plugins = ["cri"]
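This line accounts for the "unknown service runtime.v1alpha2.RuntimeService" error: with the cri plugin disabled, containerd does not serve the CRI API that kubelet calls. A config.toml with this line is commonly what the containerd.io package from Docker's repository installs, though whether that applies to this setup is an assumption. A quick check, as a sketch; cri_disabled is a hypothetical helper name:

```shell
#!/bin/sh
# cri_disabled: hypothetical helper. Succeeds (exit status 0) if the given
# containerd config file has an uncommented disabled_plugins line listing "cri".
cri_disabled() {
    grep -Eq '^[[:space:]]*disabled_plugins[[:space:]]*=.*"cri"' "$1"
}

# Path taken from the post; adjust if your config lives elsewhere.
config=/etc/containerd/config.toml
if [ -r "$config" ] && cri_disabled "$config"; then
    echo "CRI plugin is disabled in $config"
fi
```

If crictl is installed, running crictl --runtime-endpoint unix:///run/containerd/containerd.sock version is another way to see whether the CRI endpoint responds.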
After regenerating the default configuration with the command below, kubelet started normally:
containerd config default > /etc/containerd/config.toml
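Two things to keep in mind: this command overwrites the existing file, and the generated default usually has SystemdCgroup = false under the runc options, while kubelet here runs with --cgroup-driver=systemd. The sketch below keeps a backup and flips that setting. Both helper names are hypothetical, and whether the SystemdCgroup change is needed depends on your setup:

```shell
#!/bin/sh
# enable_systemd_cgroup: hypothetical helper. Rewrites "SystemdCgroup = false"
# as "SystemdCgroup = true" in the given file, keeping the indentation.
enable_systemd_cgroup() {
    tmp="$1.tmp.$$"
    sed -E 's/^([[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*)false/\1true/' "$1" > "$tmp" \
        && mv "$tmp" "$1"
}

# regenerate_containerd_config: hypothetical wrapper around the command above.
# Backs up the current file (if any), writes the defaults, enables SystemdCgroup.
regenerate_containerd_config() {
    cfg="${1:-/etc/containerd/config.toml}"
    if [ -f "$cfg" ]; then
        cp "$cfg" "$cfg.bak"
    fi
    containerd config default > "$cfg" && enable_systemd_cgroup "$cfg"
}

# On the node (as root), assuming systemd manages both services:
#   regenerate_containerd_config
#   systemctl restart containerd
#   systemctl restart kubelet
```

Any custom settings from the old file (registry mirrors, sandbox image, and so on) would have to be carried over by hand from the .bak copy.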
@dudu: The command for manually starting kubelet on a node server:
/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf \
--cgroup-driver=systemd \
--pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.6 \
--resolv-conf=/run/systemd/resolve/resolv.conf \
--container-runtime=remote \
--container-runtime-endpoint=unix:///run/containerd/containerd.sock