I use the official Altinity Kubernetes Operator for ClickHouse. Some time ago I deployed the chi-clickhouse-db-analytics cluster through the operator; the corresponding StatefulSet is named chi-clickhouse-db-analytics-0-0. The manifest:
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: clickhouse-db
  namespace: clickhouse
spec:
  configuration:
    clusters:
      - name: "analytics"
        layout:
          shardsCount: 1
          replicasCount: 1
# ...
Recently the pod of this ClickHouse cluster stopped starting; see https://q.cnblogs.com/q/154234 for details.
Yesterday I decided to redeploy ClickHouse, so I deleted the existing chi-clickhouse-db-analytics:
kubectl delete -f clickhouse-analytics.yaml -n clickhouse
Before redeploying, I updated the operator: first I refreshed the manifest file clickhouse-operator-install-template.yaml, then ran the following script to apply the update:
#!/usr/bin/env bash
export OPERATOR_NAMESPACE="clickhouse"
export METRICS_EXPORTER_NAMESPACE="${OPERATOR_NAMESPACE}"
export OPERATOR_IMAGE="${OPERATOR_IMAGE:-altinity/clickhouse-operator:latest}"
export METRICS_EXPORTER_IMAGE="${METRICS_EXPORTER_IMAGE:-altinity/metrics-exporter:latest}"
declare filename="clickhouse-operator-install-template.yaml"
envsubst < "$filename" | kubectl apply --namespace="${OPERATOR_NAMESPACE}" -f -
Once the operator was redeployed, something odd happened: the operator automatically deployed the following StatefulSet on its own:
NAME                              READY   AGE   CONTAINERS
chi-db-clickhouse-analytics-0-0   1/1     24m   clickhouse,clickhouse-log
Note that this is chi-db-clickhouse-analytics, not the chi-clickhouse-db-analytics I had just deleted: one says db-clickhouse, the other clickhouse-db.
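As far as I remember, chi-db-clickhouse-analytics was the name I used when I first deployed ClickHouse more than two years ago, and that deployment was deleted back then. Could the Kubernetes cluster still be holding manifest data from it? I decided to delete the operator as well and redeploy it.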
So I deleted the operator and used the following command to confirm that the clickhouse namespace no longer contained any resources:
kubectl get all -n clickhouse
Note: at the time I forgot that get all does not list ConfigMap resources.
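Because `kubectl get all` only covers a fixed subset of resource types, a more thorough emptiness check can walk every listable namespaced type instead. The following is a minimal sketch, assuming kubectl is on the PATH and can reach the cluster; the function name `list_everything` is my own and not part of any tool.

```shell
# Hypothetical helper: print every object of every listable namespaced
# resource type in the given namespace, as kind/name lines.
list_everything() {
  # $1: namespace to inspect
  kubectl api-resources --verbs=list --namespaced -o name |
    while read -r kind; do
      # --ignore-not-found keeps types without objects silent
      kubectl get "$kind" -n "$1" --ignore-not-found -o name < /dev/null
    done
}

# Usage (needs cluster access):
#   list_everything clickhouse
```

Unlike `get all`, this also surfaces ConfigMaps, Secrets, PVCs and custom resources such as ClickHouseInstallation objects, at the cost of one API call per resource type.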
After redeploying the operator the problem was still there: chi-db-clickhouse-analytics showed up again.
~# kubectl get pods -n clickhouse
NAME                                   READY   STATUS    RESTARTS   AGE
chi-db-clickhouse-analytics-0-0-0      2/2     Running   0          9m2s
clickhouse-operator-7cc4bc47fb-dpwtf   2/2     Running   0          9m36s
This was very strange: where was the operator getting these two pieces of manifest data, both deleted long ago?
1) metadata.name
metadata:
  name: db-clickhouse
2) spec.configuration.clusters[].name
clusters:
  - name: "analytics"
The name chi-db-clickhouse-analytics is derived from exactly these two settings.
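The naming pattern seen in this post can be written down as a small helper. This is only a sketch of the pattern observed here (chi-, installation name, cluster name, shard index, replica index); the function name `chi_statefulset_name` is hypothetical, and the operator's naming templates may be configured differently in other setups.

```shell
# Hypothetical helper reproducing the StatefulSet name pattern observed above:
#   chi-<metadata.name>-<cluster name>-<shard index>-<replica index>
# Shard and replica indexes default to 0 (single shard, single replica).
chi_statefulset_name() {
  printf 'chi-%s-%s-%s-%s\n' "$1" "$2" "${3:-0}" "${4:-0}"
}

chi_statefulset_name db-clickhouse analytics   # the resurrected old name
chi_statefulset_name clickhouse-db analytics   # the name that was just deleted
```

The pod created by each StatefulSet carries one more ordinal suffix, which is why the pod listed earlier ends in -0-0-0.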
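The detail I had overlooked (get all does not list ConfigMap resources) turned out to be the cause of the problem: the namespace did still contain leftover ConfigMaps related to db-clickhouse.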
# kubectl get cm -n clickhouse
NAME                                           DATA   AGE
chi-db-clickhouse-common-configd               7      2y159d
chi-db-clickhouse-deploy-confd-analytics-0-0   2      2y110d
The answer is in chi-db-clickhouse-deploy-confd-analytics-0-0:
kind: ConfigMap
metadata:
  creationTimestamp: "2023-05-07T02:33:48Z"
  labels:
    clickhouse.altinity.com/ConfigMap: Host
    clickhouse.altinity.com/app: chop
    clickhouse.altinity.com/chi: db-clickhouse
    clickhouse.altinity.com/cluster: analytics
    clickhouse.altinity.com/namespace: clickhouse
    clickhouse.altinity.com/object-version: 0dabdf468952c2f851380c2aad546d19e6bf9f63
    clickhouse.altinity.com/replica: "0"
    clickhouse.altinity.com/shard: "0"
Deleting this ConfigMap solved the problem:
kubectl delete cm chi-db-clickhouse-deploy-confd-analytics-0-0 -n clickhouse
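To check whether anything else from the old installation is still around, the clickhouse.altinity.com/chi label shown on the ConfigMap can serve as a selector. A minimal sketch, assuming other leftovers carry the same label; the function name `chi_leftover_configmaps` is hypothetical.

```shell
# Hypothetical helper: list ConfigMaps still labeled as belonging to a
# given ClickHouseInstallation name.
chi_leftover_configmaps() {
  # $1: namespace, $2: installation name (value of the chi label)
  kubectl get configmaps -n "$1" \
    -l "clickhouse.altinity.com/chi=$2" -o name
}

# Usage (needs cluster access):
#   chi_leftover_configmaps clickhouse db-clickhouse
```

An empty result suggests that no ConfigMap with that label remains in the namespace.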