In 2018 I wrote a set of Ansible playbooks for one-click installation of a Kubernetes v1.11 cluster, and later used them to build a cluster in our company's test environment. That cluster is now several versions behind, so I planned an upgrade and recorded the process here.
The Ansible-based v1.11 installation is on GitHub.
I. Preparation

1. Download the new release files

```shell
mkdir /root/v1.20
wget -P /root/v1.20/ https://storage.googleapis.com/kubernetes-release/release/v1.20.4/kubernetes-server-linux-amd64.tar.gz
wget -P /root/v1.20/ https://storage.googleapis.com/kubernetes-release/release/v1.20.4/kubernetes-client-linux-amd64.tar.gz
cd /root/v1.20
tar zxvf kubernetes-server-linux-amd64.tar.gz
tar zxvf kubernetes-client-linux-amd64.tar.gz
```
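Before replacing anything, it is worth confirming that every extracted binary actually reports the expected release. A small sketch under the paths used above (the helper names are mine); `version_of` just pulls the `vX.Y.Z` token out of a `--version` line:

```shell
#!/usr/bin/env bash

# version_of: extract the first vX.Y.Z token from a "--version" output line.
version_of() {
  printf '%s\n' "$1" | grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' | head -n1
}

# check_binaries DIR EXPECTED: refuse to continue if any server binary in
# DIR reports a release other than EXPECTED.
check_binaries() {
  local dir=$1 expected=$2 bin got
  for bin in kube-apiserver kube-controller-manager kube-scheduler kubelet kube-proxy; do
    got=$(version_of "$("$dir/$bin" --version 2>/dev/null)")
    if [ "$got" != "$expected" ]; then
      echo "FAIL: $bin reports ${got:-nothing}, expected $expected" >&2
      return 1
    fi
  done
}

# Real call for this upgrade:
# check_binaries /root/v1.20/kubernetes/server/bin v1.20.4
```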
2. Back up the current version

```shell
mkdir /root/v1.11
cp -a /usr/local/bin /root/v1.11/
cp -a /etc/systemd/system/kube* /root/v1.11/
```
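Since the backup puts the old binaries under /root/v1.11/bin and the unit files next to them, rolling back is just copying things into place again. A hypothetical helper, parameterized so it can be dry-run against scratch directories first; the real call would be `restore_backup /root/v1.11 /usr/local/bin /etc/systemd/system` followed by `systemctl daemon-reload`:

```shell
#!/usr/bin/env bash

# restore_backup BACKUP_DIR BIN_DIR UNIT_DIR
# Copies the saved binaries (BACKUP_DIR/bin) and the saved kube*.service
# unit files back into BIN_DIR and UNIT_DIR respectively.
restore_backup() {
  local backup=$1 bindir=$2 unitdir=$3
  cp -a "$backup/bin/." "$bindir/"   # binaries saved from /usr/local/bin
  cp -a "$backup"/kube* "$unitdir/"  # unit files saved from /etc/systemd/system
}
```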
II. Upgrade

1. Upgrade kubectl

```shell
cp -a /root/v1.20/kubernetes/client/bin/kubectl /usr/local/bin/
```
2. Upgrade the master components

kube-apiserver

```shell
systemctl stop kube-apiserver
systemctl stop kube-controller-manager
systemctl stop kube-scheduler
cp -a /root/v1.20/kubernetes/server/bin/{kube-apiserver,kube-controller-manager,kube-scheduler,kubeadm} /usr/local/bin/
systemctl start kube-apiserver
journalctl -fu kube-apiserver
```
If the apiserver can read the data back out of etcd, it is working. The scheduler and controller manager are still stopped at this point, so they report Unhealthy:
```shell
[root@prod-public-runner-k8s-node02 ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS      MESSAGE                                                                                       ERROR
scheduler            Unhealthy   Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager   Unhealthy   Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused
etcd-0               Healthy     {"health":"true"}
etcd-1               Healthy     {"health":"true"}
etcd-2               Healthy     {"health":"true"}
```
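Rather than watching journalctl by hand, a restart like this can be gated on a health probe. A generic retry helper (my addition, not part of the original procedure); after starting the apiserver the real call would be something like `wait_healthy 30 2 curl -fsk https://127.0.0.1:6443/healthz`:

```shell
#!/usr/bin/env bash

# wait_healthy TRIES DELAY CMD...
# Runs CMD until it exits 0, sleeping DELAY seconds between attempts;
# gives up and returns 1 after TRIES failed attempts.
wait_healthy() {
  local tries=$1 delay=$2 i=0
  shift 2
  while [ "$i" -lt "$tries" ]; do
    "$@" >/dev/null 2>&1 && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```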
Errors encountered

Error 1

Log: `kube-apiserver: Error: invalid port value 8080: only zero is allowed`

Fix: delete `--insecure-port=8080` from /etc/systemd/system/kube-apiserver.service.
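Error 1 generalizes: a few 1.11-era flags no longer exist by 1.20. A small grep sketch (the flag list is mine and not exhaustive) that prints any leftover removed flags in a unit file before you try to start the service:

```shell
#!/usr/bin/env bash

# removed_flags FILE
# Prints any flags in FILE that were dropped between v1.11 and v1.20
# (partial, hand-assembled list -- extend as needed). Non-empty output
# means the unit file still needs editing; exit status follows grep.
removed_flags() {
  grep -oE -- '--(insecure-port|insecure-bind-address)=[^ \\]*' "$1"
}

# Usage: removed_flags /etc/systemd/system/kube-apiserver.service
```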
Error 2

Log: `Error: [service-account-issuer is a required flag, --service-account-signing-key-file and --service-account-issuer are required flags]`

Fix: add the following three lines to /etc/systemd/system/kube-apiserver.service:

```shell
--service-account-issuer=https://kubernetes.default.svc.cluster.local \
--service-account-key-file=/etc/kubernetes/ssl/ca-key.pem \
--service-account-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \
```
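A quick way to confirm the edit landed before restarting (hypothetical helper; it only checks that the three flags are present, not that the key files are valid):

```shell
#!/usr/bin/env bash

# has_sa_flags FILE: succeed only if all three service-account flags
# required by v1.20 appear in FILE.
has_sa_flags() {
  grep -q -- '--service-account-issuer=' "$1" &&
  grep -q -- '--service-account-key-file=' "$1" &&
  grep -q -- '--service-account-signing-key-file=' "$1"
}

# Usage: has_sa_flags /etc/systemd/system/kube-apiserver.service && echo ok
```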
kube-controller-manager

```shell
systemctl start kube-controller-manager
```
Errors encountered

Error 1

```
kube-controller-manager: E0214 16:39:32.487623 31964 leaderelection.go:325] error retrieving resource lock kube-system/kube-controller-manager: Get "http://127.0.0.1:8080/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=10s": dial tcp 127.0.0.1:8080: connect: connection refused
```

Fix: in /etc/systemd/system/kube-controller-manager.service, change `--master=http://127.0.0.1:8080` to `--master=https://127.0.0.1:6443`, and add `--kubeconfig=/root/.kube/config`.
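The `--master` part of that edit can be expressed as a sed one-liner (path parameterized so it can be tried on a scratch file first; a .bak copy is kept). Adding `--kubeconfig` still has to be done by hand:

```shell
#!/usr/bin/env bash

# fix_master FILE
# Point --master at the secure port instead of the removed insecure one.
fix_master() {
  sed -i.bak -e 's#--master=http://127\.0\.0\.1:8080#--master=https://127.0.0.1:6443#' "$1"
}

# Real call:
# fix_master /etc/systemd/system/kube-controller-manager.service
```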
kube-scheduler

```shell
systemctl start kube-scheduler
```
Errors encountered

Error 1

```
kube-scheduler: E0214 16:45:43.294168 32693 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.ReplicaSet: failed to list *v1.ReplicaSet: Get "http://127.0.0.1:8080/apis/apps/v1/replicasets?limit=500&resourceVersion=0": dial tcp 127.0.0.1:8080: connect: connection refused
```

Fix: same as for the controller manager, but in /etc/systemd/system/kube-scheduler.service: change `--master=http://127.0.0.1:8080` to `--master=https://127.0.0.1:6443` and add `--kubeconfig=/root/.kube/config`.
Check the component status; the cluster is now healthy again:

```shell
[root@prod-public-runner-k8s-node02 ~]# kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
etcd-1               Healthy   {"health":"true"}
etcd-2               Healthy   {"health":"true"}
```
3. Upgrade the node components

kubelet / kube-proxy

```shell
systemctl stop kubelet
systemctl stop kube-proxy
cp -a /root/v1.20/kubernetes/server/bin/{kubelet,kube-proxy} /usr/local/bin/
systemctl start kubelet
journalctl -fu kubelet
```
Give kubelet a moment after it starts and the node should go Ready; if it doesn't, troubleshoot from the logs.
```shell
[root@prod-public-runner-k8s-node01 system]# kubectl get node
NAME                            STATUS   ROLES    AGE      VERSION
prod-public-runner-k8s-node01   Ready    <none>   4y107d   v1.20.4
prod-public-runner-k8s-node02   Ready    <none>   4y107d   v1.20.4
prod-public-runner-k8s-node03   Ready    <none>   4y107d   v1.20.4
```
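To check the whole fleet at once instead of eyeballing the table, the VERSION column can be asserted in shell. A sketch that reads `kubectl get node --no-headers` output on stdin (column positions assumed from the default output format shown above):

```shell
#!/usr/bin/env bash

# all_nodes_at EXPECTED
# Reads "kubectl get node --no-headers" lines on stdin and fails if any
# node's VERSION column (5th field) differs from EXPECTED.
all_nodes_at() {
  expected=$1
  rc=0
  while read -r name status roles age version _; do
    [ -z "$name" ] && continue
    if [ "$version" != "$expected" ]; then
      echo "node $name still at $version" >&2
      rc=1
    fi
  done
  return $rc
}

# Usage: kubectl get node --no-headers | all_nodes_at v1.20.4
```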
Finally, start kube-proxy:

```shell
systemctl start kube-proxy
```
III. Verifying the result

```shell
[root@prod-public-runner-k8s-node01 ~]# kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root@prod-public-runner-k8s-node01 ~]# kubectl get node
NAME                            STATUS   ROLES    AGE      VERSION
prod-public-runner-k8s-node01   Ready    <none>   4y107d   v1.20.4
prod-public-runner-k8s-node02   Ready    <none>   4y107d   v1.20.4
prod-public-runner-k8s-node03   Ready    <none>   4y107d   v1.20.4
[root@prod-public-runner-k8s-node01 ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health": "true"}
etcd-1               Healthy   {"health": "true"}
etcd-2               Healthy   {"health": "true"}
```
To wrap up, here are all the unit files.

kube-apiserver.service

```ini
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
User=root
ExecStart=/usr/local/bin/kube-apiserver \
  --enable-admission-plugins=NamespaceLifecycle,NodeRestriction,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota \
  --anonymous-auth=false \
  --advertise-address=172.17.254.185 \
  --allow-privileged=true \
  --apiserver-count=3 \
  --audit-policy-file=/etc/kubernetes/audit-policy.yaml \
  --audit-log-maxage=30 \
  --audit-log-maxbackup=3 \
  --audit-log-maxsize=100 \
  --audit-log-path=/var/log/kubernetes/audit.log \
  --authorization-mode=Node,RBAC \
  --bind-address=0.0.0.0 \
  --secure-port=6443 \
  --client-ca-file=/etc/kubernetes/ssl/ca.pem \
  --kubelet-client-certificate=/etc/kubernetes/ssl/kubernetes.pem \
  --kubelet-client-key=/etc/kubernetes/ssl/kubernetes-key.pem \
  --etcd-cafile=/etc/kubernetes/ssl/ca.pem \
  --etcd-certfile=/etc/kubernetes/ssl/etcd.pem \
  --etcd-keyfile=/etc/kubernetes/ssl/etcd-key.pem \
  --etcd-servers=https://172.17.254.186:2379,https://172.17.254.187:2379,https://172.17.254.185:2379 \
  --event-ttl=1h \
  --service-cluster-ip-range=10.254.0.0/18 \
  --service-node-port-range=30000-32000 \
  --tls-cert-file=/etc/kubernetes/ssl/kubernetes.pem \
  --tls-private-key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
  --enable-bootstrap-token-auth=true \
  --log-dir=/var/log/kubernetes \
  --v=1 \
  --service-account-issuer=https://kubernetes.default.svc.cluster.local \
  --service-account-key-file=/etc/kubernetes/ssl/ca-key.pem \
  --service-account-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \
  --experimental-encryption-provider-config=/etc/kubernetes/encryption.yaml \
  --feature-gates=RemoveSelfLink=false
Restart=on-failure
RestartSec=5
Type=notify
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
```
kube-controller-manager.service

```ini
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-controller-manager \
  --allocate-node-cidrs=true \
  --master=https://127.0.0.1:6443 \
  --service-cluster-ip-range=10.254.0.0/18 \
  --cluster-cidr=10.254.64.0/18 \
  --cluster-signing-cert-file=/etc/kubernetes/ssl/ca.pem \
  --cluster-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \
  --controllers=*,tokencleaner,bootstrapsigner \
  --cluster-name=kubernetes \
  --service-account-private-key-file=/etc/kubernetes/ssl/ca-key.pem \
  --root-ca-file=/etc/kubernetes/ssl/ca.pem \
  --leader-elect=true \
  --v=2 \
  --use-service-account-credentials=true \
  --kubeconfig=/root/.kube/config
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```
kube-scheduler.service

```ini
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-scheduler \
  --bind-address=0.0.0.0 \
  --leader-elect \
  --v=2 \
  --kubeconfig=/root/.kube/config
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```
kubelet.service

```ini
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
WorkingDirectory=/var/lib/kubelet
ExecStart=/usr/local/bin/kubelet \
  --hostname-override=prod-public-runner-k8s-node01 \
  --pod-infra-container-image=registry.cn-beijing.aliyuncs.com/roobo/pause-amd64:3.1 \
  --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf \
  --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \
  --config=/etc/kubernetes/kubelet.config.json \
  --cert-dir=/etc/kubernetes/ssl \
  --logtostderr=true \
  --v=2 \
  --allowed-unsafe-sysctls=net.*

[Install]
WantedBy=multi-user.target
```
kube-proxy.service

```ini
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
WorkingDirectory=/var/lib/kube-proxy
ExecStart=/usr/local/bin/kube-proxy \
  --config=/etc/kubernetes/kube-proxy.config.yaml \
  --logtostderr=true \
  --v=1
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
```