Back in 2018 I wrote a set of Ansible playbooks for one-click installation of a Kubernetes v1.11 cluster, and later used them to stand up a cluster in our company's test environment. That version is now quite old, so I planned an upgrade and recorded the process here.

The Ansible playbooks for the v1.11 install are on GitHub.

I. Preparation

1. Download the new release

mkdir /root/v1.20
wget -P /root/v1.20/ https://storage.googleapis.com/kubernetes-release/release/v1.20.4/kubernetes-server-linux-amd64.tar.gz
wget -P /root/v1.20/ https://storage.googleapis.com/kubernetes-release/release/v1.20.4/kubernetes-client-linux-amd64.tar.gz
cd /root/v1.20
tar zxvf kubernetes-server-linux-amd64.tar.gz
tar zxvf kubernetes-client-linux-amd64.tar.gz
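
Before touching anything, it's worth confirming the extracted binaries report the expected version (an optional check, not part of the original procedure):

/root/v1.20/kubernetes/server/bin/kube-apiserver --version
/root/v1.20/kubernetes/client/bin/kubectl version --client

Both should report v1.20.4.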

2. Back up the old binaries and unit files

mkdir /root/v1.11
cp -a /usr/local/bin /root/v1.11/
cp -a /etc/systemd/system/kube* /root/v1.11/
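
These backups make rollback straightforward if the upgrade goes badly. A minimal restore sketch: stop the services, put the old binaries and unit files back, then reload systemd:

systemctl stop kube-apiserver kube-controller-manager kube-scheduler kubelet kube-proxy
cp -a /root/v1.11/bin/* /usr/local/bin/
cp -a /root/v1.11/kube* /etc/systemd/system/
systemctl daemon-reload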

II. Upgrade

1. Upgrade kubectl

cp -a /root/v1.20/kubernetes/client/bin/kubectl /usr/local/bin/
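
Confirm the new client is in place:

kubectl version --client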

2. Upgrade the master components

kube-apiserver
systemctl stop kube-apiserver
systemctl stop kube-controller-manager
systemctl stop kube-scheduler

cp -a /root/v1.20/kubernetes/server/bin/{kube-apiserver,kube-controller-manager,kube-scheduler,kubeadm} /usr/local/bin/
systemctl start kube-apiserver
journalctl -fu kube-apiserver

If the apiserver can serve the data stored in etcd, kube-apiserver itself is fine. scheduler and controller-manager below still show Unhealthy simply because they haven't been started yet:

[root@prod-public-runner-k8s-node02 ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS      MESSAGE                                                                                       ERROR
scheduler            Unhealthy   Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager   Unhealthy   Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused
etcd-0               Healthy     {"health": "true"}
etcd-1               Healthy     {"health": "true"}
etcd-2               Healthy     {"health": "true"}

Errors encountered

Error 1

Log: kube-apiserver: Error: invalid port value 8080: only zero is allowed

Fix: remove --insecure-port=8080 from /etc/systemd/system/kube-apiserver.service.
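
A quick way to apply the fix, assuming the flag sits on its own line of the unit file as in the configs at the end of this post:

sed -i '/--insecure-port=8080/d' /etc/systemd/system/kube-apiserver.service
systemctl daemon-reload
systemctl restart kube-apiserver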

Error 2

Log: Error: [service-account-issuer is a required flag, --service-account-signing-key-file and --service-account-issuer are required flags]

Fix: add the following three lines to /etc/systemd/system/kube-apiserver.service:

--service-account-issuer=https://kubernetes.default.svc.cluster.local \
--service-account-key-file=/etc/kubernetes/ssl/ca-key.pem \
--service-account-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \
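
After editing the unit file, reload systemd, restart, and confirm the running process actually picked up the new flags:

systemctl daemon-reload
systemctl restart kube-apiserver
ps -ef | grep '[k]ube-apiserver' | tr ' ' '\n' | grep -- '--service-account'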

kube-controller-manager
systemctl start kube-controller-manager

Errors encountered

Error 1

kube-controller-manager: E0214 16:39:32.487623 31964 leaderelection.go:325] error retrieving resource lock kube-system/kube-controller-manager: Get "http://127.0.0.1:8080/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=10s": dial tcp 127.0.0.1:8080: connect: connection refused

Fix: in /etc/systemd/system/kube-controller-manager.service, change --master=http://127.0.0.1:8080 to --master=https://127.0.0.1:6443 and add --kubeconfig=/root/.kube/config.
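
If /root/.kube/config does not exist yet, it can be generated from the cluster CA and an admin client certificate. This is a sketch: the admin.pem / admin-key.pem paths are my assumption, modeled on the ssl directory used elsewhere in this post:

kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=https://127.0.0.1:6443 \
  --kubeconfig=/root/.kube/config
kubectl config set-credentials admin \
  --client-certificate=/etc/kubernetes/ssl/admin.pem \
  --client-key=/etc/kubernetes/ssl/admin-key.pem \
  --embed-certs=true \
  --kubeconfig=/root/.kube/config
kubectl config set-context default --cluster=kubernetes --user=admin --kubeconfig=/root/.kube/config
kubectl config use-context default --kubeconfig=/root/.kube/config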

kube-scheduler
systemctl start kube-scheduler

Errors encountered

Error 1

kube-scheduler: E0214 16:45:43.294168 32693 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.ReplicaSet: failed to list *v1.ReplicaSet: Get "http://127.0.0.1:8080/apis/apps/v1/replicasets?limit=500&resourceVersion=0": dial tcp 127.0.0.1:8080: connect: connection refused

Fix: the same problem, this time in /etc/systemd/system/kube-scheduler.service: remove --master=http://127.0.0.1:8080 and add --kubeconfig=/root/.kube/config (the final unit file below keeps only --kubeconfig).

Check the component status again; the cluster is now back in shape:

[root@prod-public-runner-k8s-node02 ~]# kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health": "true"}
etcd-1               Healthy   {"health": "true"}
etcd-2               Healthy   {"health": "true"}
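
The leader-election errors above were about coordination.k8s.io leases, so another way to confirm both components are working is to check that each one holds its lease:

kubectl -n kube-system get lease kube-controller-manager kube-scheduler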

3. Upgrade the node components

kubelet/kube-proxy
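
On a busy node it is safer to drain the node before stopping kubelet, and uncordon it once the node is Ready again; this wasn't part of my original procedure, and extra flags (e.g. for pods with emptyDir volumes) may be needed:

kubectl drain prod-public-runner-k8s-node01 --ignore-daemonsets
# ... upgrade the node (steps below), then:
kubectl uncordon prod-public-runner-k8s-node01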
systemctl stop kubelet
systemctl stop kube-proxy

cp -a /root/v1.20/kubernetes/server/bin/{kubelet,kube-proxy} /usr/local/bin/
systemctl start kubelet
journalctl -fu kubelet

Give kubelet a moment after it starts and the node will go Ready; if it doesn't, troubleshoot from the logs.

[root@prod-public-runner-k8s-node01 system]# kubectl get nodes
NAME                            STATUS   ROLES    AGE      VERSION
prod-public-runner-k8s-node01   Ready    <none>   4y107d   v1.20.4
prod-public-runner-k8s-node02   Ready    <none>   4y107d   v1.20.4
prod-public-runner-k8s-node03   Ready    <none>   4y107d   v1.20.4

Finally, start kube-proxy:

systemctl start kube-proxy
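
To confirm kube-proxy came up cleanly, tail its log and probe its health endpoint; 10256 is the default healthz port, adjust if kube-proxy.config.yaml overrides it:

journalctl -fu kube-proxy
curl -s http://127.0.0.1:10256/healthz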

III. Verification

[root@prod-public-runner-k8s-node01 ~]#  kubectl cluster-info 
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root@prod-public-runner-k8s-node01 ~]# kubectl get node
NAME                            STATUS   ROLES    AGE      VERSION
prod-public-runner-k8s-node01   Ready    <none>   4y107d   v1.20.4
prod-public-runner-k8s-node02   Ready    <none>   4y107d   v1.20.4
prod-public-runner-k8s-node03   Ready    <none>   4y107d   v1.20.4
[root@prod-public-runner-k8s-node01 ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health": "true"}
etcd-1               Healthy   {"health": "true"}
etcd-2               Healthy   {"health": "true"}
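
As a last end-to-end check, a throwaway deployment exercises scheduling, kubelet, and the container runtime in one shot; the names here are illustrative:

kubectl create deployment upgrade-test --image=nginx
kubectl get pods -l app=upgrade-test -o wide
kubectl delete deployment upgrade-test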

To close, here are all of the unit files.

kube-apiserver.service

[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
User=root
ExecStart=/usr/local/bin/kube-apiserver \
--enable-admission-plugins=NamespaceLifecycle,NodeRestriction,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota \
--anonymous-auth=false \
--advertise-address=172.17.254.185 \
--allow-privileged=true \
--apiserver-count=3 \
--audit-policy-file=/etc/kubernetes/audit-policy.yaml \
--audit-log-maxage=30 \
--audit-log-maxbackup=3 \
--audit-log-maxsize=100 \
--audit-log-path=/var/log/kubernetes/audit.log \
--authorization-mode=Node,RBAC \
--bind-address=0.0.0.0 \
--secure-port=6443 \
--client-ca-file=/etc/kubernetes/ssl/ca.pem \
--kubelet-client-certificate=/etc/kubernetes/ssl/kubernetes.pem \
--kubelet-client-key=/etc/kubernetes/ssl/kubernetes-key.pem \
--etcd-cafile=/etc/kubernetes/ssl/ca.pem \
--etcd-certfile=/etc/kubernetes/ssl/etcd.pem \
--etcd-keyfile=/etc/kubernetes/ssl/etcd-key.pem \
--etcd-servers=https://172.17.254.186:2379,https://172.17.254.187:2379,https://172.17.254.185:2379 \
--event-ttl=1h \
--service-cluster-ip-range=10.254.0.0/18 \
--service-node-port-range=30000-32000 \
--tls-cert-file=/etc/kubernetes/ssl/kubernetes.pem \
--tls-private-key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
--enable-bootstrap-token-auth=true \
--log-dir=/var/log/kubernetes \
--v=1 \
--service-account-issuer=https://kubernetes.default.svc.cluster.local \
--service-account-key-file=/etc/kubernetes/ssl/ca-key.pem \
--service-account-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \
--experimental-encryption-provider-config=/etc/kubernetes/encryption.yaml \
--feature-gates=RemoveSelfLink=false
Restart=on-failure
RestartSec=5
Type=notify
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

kube-controller-manager.service

[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-controller-manager \
--allocate-node-cidrs=true \
--master=https://127.0.0.1:6443 \
--service-cluster-ip-range=10.254.0.0/18 \
--cluster-cidr=10.254.64.0/18 \
--cluster-signing-cert-file=/etc/kubernetes/ssl/ca.pem \
--cluster-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \
--controllers=*,tokencleaner,bootstrapsigner \
--cluster-name=kubernetes \
--service-account-private-key-file=/etc/kubernetes/ssl/ca-key.pem \
--root-ca-file=/etc/kubernetes/ssl/ca.pem \
--leader-elect=true \
--v=2 \
--use-service-account-credentials=true \
--kubeconfig=/root/.kube/config
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

kube-scheduler.service

[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-scheduler \
--bind-address=0.0.0.0 \
--leader-elect \
--v=2 \
--kubeconfig=/root/.kube/config
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

kubelet.service

[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
WorkingDirectory=/var/lib/kubelet
ExecStart=/usr/local/bin/kubelet \
--hostname-override=prod-public-runner-k8s-node01 \
--pod-infra-container-image=registry.cn-beijing.aliyuncs.com/roobo/pause-amd64:3.1 \
--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf \
--kubeconfig=/etc/kubernetes/kubelet.kubeconfig \
--config=/etc/kubernetes/kubelet.config.json \
--cert-dir=/etc/kubernetes/ssl \
--logtostderr=true \
--v=2 \
--allowed-unsafe-sysctls=net.*

[Install]
WantedBy=multi-user.target

kube-proxy.service

[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
WorkingDirectory=/var/lib/kube-proxy
ExecStart=/usr/local/bin/kube-proxy \
--config=/etc/kubernetes/kube-proxy.config.yaml \
--logtostderr=true \
--v=1
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target