Creating a cluster with kubeadm init fails with the error below. How can this be resolved?
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
Error log shown by journalctl -u kubelet:
Dec 17 07:23:06 k8s-master0 kubelet[8677]: E1217 07:23:06.438404 8677 kubelet.go:2267] node "k8s-master0" not found
Dec 17 07:23:08 k8s-master0 kubelet[8677]: W1217 07:23:08.920952 8677 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
Dec 17 07:23:09 k8s-master0 kubelet[8677]: E1217 07:23:09.668733 8677 kubelet.go:2187] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugi
Related issue on GitHub: Using kubeadm to init kubernetes 1.12.0 falied:node “xxx” not found
Creating a single-master cluster does not trigger this problem:
kubeadm init \
--kubernetes-version v1.16.3 \
--image-repository registry.aliyuncs.com/google_containers \
--pod-network-cidr=10.244.0.0/16
The problem appears only when creating a high-availability cluster:
kubeadm init \
--kubernetes-version v1.16.3 \
--control-plane-endpoint "k8s.cnblogs.com:6443" --upload-certs \
--image-repository registry.aliyuncs.com/google_containers \
--pod-network-cidr=10.244.0.0/16
Logs are a programmer's radar. Switching on a more sensitive radar with kubeadm's --v=6 flag turned up a clue:
[kubelet-check] Initial timeout of 40s passed.
I1217 08:39:21.852678 20972 round_trippers.go:443] GET https://k8s.cnblogs.com:6443/healthz?timeout=32s in 30000 milliseconds
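The health check timed out while connecting to the control-plane-endpoint address. k8s.cnblogs.com sits behind an Alibaba Cloud load balancer (TCP forwarding).
This exposed a limitation of Alibaba Cloud load balancer TCP forwarding: if the server sending the request is also the backend server behind the load balancer, the two cannot communicate.
So the Alibaba Cloud load balancer was the cause. After adding a hosts entry that resolves k8s.cnblogs.com to the master node's own IP address, so that traffic no longer passes through the load balancer, the problem was solved.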
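The same timeout can probably be reproduced by hand from the master node while kubeadm init is still waiting. The sketch below assumes curl is installed and that the API server answers /healthz for unauthenticated requests (the usual kubeadm default, but not confirmed in this post); check_healthz is a hypothetical helper name, not part of kubeadm.

```shell
#!/usr/bin/env bash
# check_healthz URL [TIMEOUT_SECONDS]
# Hypothetical helper: probe an API server health endpoint with a short timeout.
check_healthz() {
  local url="$1" timeout="${2:-5}"
  # -k          skip TLS verification (diagnostic use only)
  # -sS         no progress meter, but still print errors
  # --max-time  give up after $timeout seconds
  curl -k -sS --max-time "$timeout" "$url"
}

# Example, run on the master node:
# check_healthz "https://k8s.cnblogs.com:6443/healthz"   # goes through the load balancer
# check_healthz "https://127.0.0.1:6443/healthz"         # bypasses the load balancer
```

If the first call times out while the second one answers, the API server itself is up and the path through the load balancer is the likely culprit.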
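A minimal sketch of the hosts workaround follows. The IP address 10.0.1.81 is a placeholder (the post does not give the master's address), and add_hosts_entry is a hypothetical helper; it only appends the entry if the hostname is not already mapped on a non-comment line.

```shell
#!/usr/bin/env bash
# add_hosts_entry IP HOSTNAME [HOSTS_FILE]
# Hypothetical helper: map HOSTNAME to IP in a hosts file, idempotently.
add_hosts_entry() {
  local ip="$1" host="$2" file="${3:-/etc/hosts}"
  # Do nothing if any non-comment line already lists this hostname.
  if awk -v h="$host" '
       $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
       END { exit !found }' "$file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$host" >> "$file"
}

# Example, run as root on the master node (replace the placeholder IP):
# add_hosts_entry 10.0.1.81 k8s.cnblogs.com
```

Other nodes would presumably keep resolving k8s.cnblogs.com through DNS to the load balancer, and each additional control-plane node would need its own entry pointing at its own IP.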