Kubernetes Monitoring
- Kubernetes monitoring is offered by the integrated Metrics Server
- After installation, the server exposes a standard API and can be used to expose custom metrics
- Use kubectl top for a top-like view of resource usage (see the example below)
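A quick look at what that gives you once the Metrics Server is running (a sketch; the namespace and sort option below are just examples):
kubectl top nodes                                  # per-node CPU/memory usage
kubectl top pods --all-namespaces                  # per-Pod usage across all namespaces
kubectl top pods -n kube-system --sort-by=memory   # sort by memory consumption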
Setting up Metrics Server
- See https://github.com/kubernetes-sigs/metrics-server.git
- Read the GitHub documentation!
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl -n kube-system get pods   # look for metrics-server
kubectl -n kube-system edit deployment metrics-server
- In spec.template.spec.containers.args, add the following:
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- kubectl -n kube-system logs metrics-server<TAB> should show "Generated self-signed cert" and "Serving securely on [::]:10250"
- kubectl top pods --all-namespaces will show the most active Pods (a direct check of the metrics API follows below)
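Once it is running, the Metrics Server registers an APIService that can also be queried directly; a minimal check, assuming the components.yaml above was applied:
kubectl get apiservices v1beta1.metrics.k8s.io                               # AVAILABLE should be True
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes                         # raw node metrics (JSON)
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods   # raw pod metrics (JSON)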
Let's investigate the Metrics Server.
[root@k8s manifests]# kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
[root@k8s manifests]# kubectl -n kube-system get pods
NAME                                     READY   STATUS             RESTARTS        AGE
coredns-5dd5756b68-sgfkj                 0/1     CrashLoopBackOff   799 (67s ago)   4d1h
etcd-k8s.example.pl                      1/1     Running            1 (2d20h ago)   4d1h
kube-apiserver-k8s.example.pl            1/1     Running            5 (18h ago)     4d1h
kube-controller-manager-k8s.example.pl   1/1     Running            3 (2d20h ago)   4d1h
kube-proxy-5nmms                         1/1     Running            1 (2d20h ago)   4d1h
kube-scheduler-k8s.example.pl            1/1     Running            1 (2d20h ago)   4d1h
metrics-server-6db4d75b97-z54v6          0/1     Running            0               60s
storage-provisioner                      1/1     Running            0               4d1h
[root@k8s manifests]# kubectl logs -n kube-system metrics-server-6db4d75b97-z54v6
I0204 16:24:12.739473       1 serving.go:374] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0204 16:24:13.292873       1 handler.go:275] Adding GroupVersion metrics.k8s.io v1beta1 to ResourceManager
I0204 16:24:13.403268       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0204 16:24:13.403309       1 shared_informer.go:311] Waiting for caches to sync for RequestHeaderAuthRequestController
I0204 16:24:13.403390       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0204 16:24:13.403423       1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0204 16:24:13.403458       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0204 16:24:13.403476       1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0204 16:24:13.404129       1 secure_serving.go:213] Serving securely on [::]:10250
I0204 16:24:13.404193       1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I0204 16:24:13.404331       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0204 16:24:13.503728       1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0204 16:24:13.503792       1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController
I0204 16:24:13.503799       1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0204 16:26:56.309835       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0204 16:26:59.392609       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0204 16:26:59.424509       1 scraper.go:149] "Failed to scrape node" err="Get \"https://172.30.9.24:10250/metrics/resource\": dial tcp 172.30.9.24:10250: connect: no route to host" node="k8s.example.pl"
I0204 16:27:06.309637       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0204 16:27:14.400536       1 scraper.go:149] "Failed to scrape node" err="Get \"https://172.30.9.24:10250/metrics/resource\": dial tcp 172.30.9.24:10250: connect: no route to host" node="k8s.example.pl"
I0204 16:27:16.311738       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0204 16:27:26.309031       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0204 16:27:29.440485       1 scraper.go:149] "Failed to scrape node" err="Get \"https://172.30.9.24:10250/metrics/resource\": dial tcp 172.30.9.24:10250: connect: no route to host" node="k8s.example.pl"
I0204 16:27:36.311114       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0204 16:27:44.417699       1 scraper.go:149] "Failed to scrape node" err="Get \"https://172.30.9.24:10250/metrics/resource\": dial tcp 172.30.9.24:10250: connect: no route to host" node="k8s.example.pl"
I0204 16:27:46.309503       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0204 16:27:56.309958       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0204 16:27:59.456455       1 scraper.go:149] "Failed to scrape node" err="Get \"https://172.30.9.24:10250/metrics/resource\": dial tcp 172.30.9.24:10250: connect: no route to host" node="k8s.example.pl"
[root@k8s manifests]#
[root@k8s manifests]# kubectl -n kube-system edit deployments.apps metrics-server
deployment.apps/metrics-server edited
[root@k8s manifests]# kubectl -n kube-system edit deployments.apps metrics-server -o yaml
There is an issue with the Metrics Server, so we have to edit the metrics-server Deployment. One line has been added:
spec:
  containers:
  - args:
    - --cert-dir=/tmp
    - --secure-port=10250
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --kubelet-insecure-tls
    - --kubelet-use-node-status-port
    - --metric-resolution=15s
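Instead of kubectl edit, the same flag can be appended non-interactively; a sketch using a JSON patch, assuming metrics-server is the first (and only) container in the Pod template:
kubectl -n kube-system patch deployment metrics-server --type=json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'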
And now:
[root@k8s manifests]# kubectl -n kube-system get pods
NAME                                     READY   STATUS             RESTARTS          AGE
coredns-5dd5756b68-sgfkj                 0/1     CrashLoopBackOff   803 (42s ago)     4d1h
etcd-k8s.example.pl                      1/1     Running            1 (2d20h ago)     4d1h
kube-apiserver-k8s.example.pl            1/1     Running            5 (18h ago)       4d1h
kube-controller-manager-k8s.example.pl   1/1     Running            3 (2d20h ago)     4d1h
kube-proxy-5nmms                         1/1     Running            1 (2d20h ago)     4d1h
kube-scheduler-k8s.example.pl            1/1     Running            1 (2d20h ago)     4d1h
metrics-server-5f8988d664-7r8j7          0/1     Running            0                 12m
metrics-server-6db4d75b97-z54v6          0/1     Running            0                 21m
storage-provisioner                      1/1     Running            0                 4d1h
[root@k8s manifests]# kubectl logs -n kube-system metrics-server-5f8988d664-7r8j7
I0204 16:33:15.992406       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0204 16:33:19.328452       1 scraper.go:149] "Failed to scrape node" err="Get \"https://172.30.9.24:10250/metrics/resource\": dial tcp 172.30.9.24:10250: connect: no route to host" node="k8s.example.pl"
I0204 16:33:25.991109       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0204 16:33:34.368456       1 scraper.go:149] "Failed to scrape node" err="Get \"https://172.30.9.24:10250/metrics/resource\": dial tcp 172.30.9.24:10250: connect: no route to host" node="k8s.example.pl"
I0204 16:33:35.991518       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0204 16:33:45.989133       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0204 16:33:49.344463       1 scraper.go:149] "Failed to scrape node" err="Get \"https://172.30.9.24:10250/metrics/resource\": dial tcp 172.30.9.24:10250: connect: no route to host" node="k8s.example.pl"
...
[root@k8s manifests]# firewall-cmd --permanent --add-port=10250/tcp
success
[root@k8s manifests]# firewall-cmd --reload
success
[root@k8s manifests]# kubectl -n kube-system get pods
NAME                                     READY   STATUS             RESTARTS          AGE
coredns-5dd5756b68-sgfkj                 0/1     CrashLoopBackOff   803 (2m28s ago)   4d1h
etcd-k8s.example.pl                      1/1     Running            1 (2d20h ago)     4d1h
kube-apiserver-k8s.example.pl            1/1     Running            5 (18h ago)       4d1h
kube-controller-manager-k8s.example.pl   1/1     Running            3 (2d20h ago)     4d1h
kube-proxy-5nmms                         1/1     Running            1 (2d20h ago)     4d1h
kube-scheduler-k8s.example.pl            1/1     Running            1 (2d20h ago)     4d1h
metrics-server-5f8988d664-7r8j7          1/1     Running            0                 14m
storage-provisioner                      1/1     Running            0                 4d1h
[root@k8s manifests]# kubectl top pods
NAME                         CPU(cores)   MEMORY(bytes)
apples-78656fd5db-4rpj7      0m           7Mi
apples-78656fd5db-qsm4x      0m           7Mi
apples-78656fd5db-t82tg      0m           7Mi
deploydaemon-zzllp           0m           7Mi
firstnginx-d8679d567-249g9   0m           7Mi
firstnginx-d8679d567-66c4s   0m           7Mi
firstnginx-d8679d567-72qbd   0m           7Mi
firstnginx-d8679d567-rhhlz   0m           7Mi
init-demo                    0m           7Mi
lab4-pod                     0m           7Mi
morevol                      0m           0Mi
mydaemon-d4dcd               0m           7Mi
mystaticpod-k8s.example.pl   0m           7Mi
newdep-749c9b5675-2x9mb      0m           2Mi
nginxsvc-5f8b7d4f4d-dtrs7    0m           7Mi
pv-pod                       0m           7Mi
sleepy                       0m           0Mi
testpod                      0m           7Mi
two-containers               0m           7Mi
web-0                        1m           2Mi
web-1                        1m           2Mi
web-2                        1m           2Mi
webserver-76d44586d-8gqhf    0m           7Mi
webshop-7f9fd49d4c-92nj2     0m           7Mi
webshop-7f9fd49d4c-kqllw     0m           7Mi
webshop-7f9fd49d4c-x2czc     0m           7Mi
[root@k8s manifests]# kubectl top nodes
NAME             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s.example.pl   288m         3%     3330Mi          21%
Etcd
- The etcd is a core Kubernetes service that contains all resources that have been created
- It is started by the kubelet as a static Pod on the control node (see the check below)
- Losing the etcd means losing all your configuration
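Because it is a static Pod, the etcd definition lives with the other static Pod manifests on the control plane node; a quick check (paths as used by kubeadm):
ls /etc/kubernetes/manifests/etcd.yaml              # static Pod manifest picked up by the kubelet
kubectl -n kube-system get pods -l component=etcd   # the resulting mirror Pod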
Etcd Backup
- To back up the etcd, root access is required to run the etcdctl tool
- Use sudo apt install etcd-client to install this tool
- etcdctl uses the wrong API version by default; fix this by using sudo ETCDCTL_API=3 etcdctl ... snapshot save
- To use etcdctl, you need to specify the etcd service API endpoint, as well as the cacert, cert and key to be used
- Values for all of these can be obtained by using ps aux | grep etcd (see the sketch below)
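For example, the certificate and key paths can be filtered out of the process list like this (a sketch; the flag values are whatever the running etcd was started with):
# the [e]tcd pattern keeps the grep process itself out of the output
ps aux | grep '[e]tcd ' | tr ' ' '\n' | grep -E 'crt|key'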
Backing up the Etcd
sudo apt install etcd-client
sudo etcdctl --help; sudo ETCDCTL_API=3 etcdctl --help
ps aux | grep etcd
sudo ETCDCTL_API=3 etcdctl --endpoints=localhost:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key get / --prefix --keys-only
sudo ETCDCTL_API=3 etcdctl --endpoints=localhost:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key snapshot save /tmp/etcdbackup.db
[root@k8s ~]# ETCD_RELEASE=$(curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest|grep tag_name | cut -d '"' -f 4)
[root@k8s ~]# echo $ETCD_RELEASE
v3.5.12
[root@k8s ~]# wget https://github.com/etcd-io/etcd/releases/download/${ETCD_RELEASE}/etcd-${ETCD_RELEASE}-linux-amd64.tar.gz
--2024-02-04 12:51:29--  https://github.com/etcd-io/etcd/releases/download/v3.5.12/etcd-v3.5.12-linux-amd64.tar.gz
Resolving github.com (github.com)... 140.82.121.3
Connecting to github.com (github.com)|140.82.121.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/11225014/f198beb0-cda9-4776-bc21-3ee9ce967646?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20240204%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240204T175129Z&X-Amz-Expires=300&X-Amz-Signature=58301bf185577765f3b913e6cb7647a1ec517a7cb6076d6c390bb28659a4a0e0&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=11225014&response-content-disposition=attachment%3B%20filename%3Detcd-v3.5.12-linux-amd64.tar.gz&response-content-type=application%2Foctet-stream [following]
--2024-02-04 12:51:29--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/11225014/f198beb0-cda9-4776-bc21-3ee9ce967646?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20240204%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240204T175129Z&X-Amz-Expires=300&X-Amz-Signature=58301bf185577765f3b913e6cb7647a1ec517a7cb6076d6c390bb28659a4a0e0&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=11225014&response-content-disposition=attachment%3B%20filename%3Detcd-v3.5.12-linux-amd64.tar.gz&response-content-type=application%2Foctet-stream
Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...
Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20337842 (19M) [application/octet-stream]
Saving to: 'etcd-v3.5.12-linux-amd64.tar.gz'

etcd-v3.5.12-linux-amd64.tar.gz 100%[==========================================================================>]  19,40M  3,28MB/s    in 13s

2024-02-04 12:51:43 (1,51 MB/s) - 'etcd-v3.5.12-linux-amd64.tar.gz' saved [20337842/20337842]

[root@k8s ~]# tar xvf etcd-${ETCD_RELEASE}-linux-amd64.tar.gz
etcd-v3.5.12-linux-amd64/
etcd-v3.5.12-linux-amd64/README.md
etcd-v3.5.12-linux-amd64/READMEv2-etcdctl.md
etcd-v3.5.12-linux-amd64/etcdutl
etcd-v3.5.12-linux-amd64/etcdctl
etcd-v3.5.12-linux-amd64/Documentation/
etcd-v3.5.12-linux-amd64/Documentation/README.md
etcd-v3.5.12-linux-amd64/Documentation/dev-guide/
etcd-v3.5.12-linux-amd64/Documentation/dev-guide/apispec/
etcd-v3.5.12-linux-amd64/Documentation/dev-guide/apispec/swagger/
etcd-v3.5.12-linux-amd64/Documentation/dev-guide/apispec/swagger/v3election.swagger.json
etcd-v3.5.12-linux-amd64/Documentation/dev-guide/apispec/swagger/rpc.swagger.json
etcd-v3.5.12-linux-amd64/Documentation/dev-guide/apispec/swagger/v3lock.swagger.json
etcd-v3.5.12-linux-amd64/README-etcdutl.md
etcd-v3.5.12-linux-amd64/README-etcdctl.md
etcd-v3.5.12-linux-amd64/etcd
[root@k8s ~]# cd etcd-${ETCD_RELEASE}-linux-amd64
[root@k8s etcd-v3.5.12-linux-amd64]# mv etcd* /usr/local/bin
[root@k8s etcd-v3.5.12-linux-amd64]# ps ax | grep etcd
  15154 ?        Ssl   75:24 etcd --advertise-client-urls=https://172.30.9.24:2379 --cert-file=/var/lib/minikube/certs/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/minikube/etcd --experimental-initial-corrupt-check=true --experimental-watch-progress-notify-interval=5s --initial-advertise-peer-urls=https://172.30.9.24:2380 --initial-cluster=k8s.netico.pl=https://172.30.9.24:2380 --key-file=/var/lib/minikube/certs/etcd/server.key --listen-client-urls=https://127.0.0.1:2379,https://172.30.9.24:2379 --listen-metrics-urls=http://127.0.0.1:2381 --listen-peer-urls=https://172.30.9.24:2380 --name=k8s.netico.pl --peer-cert-file=/var/lib/minikube/certs/etcd/peer.crt --peer-client-cert-auth=true --peer-key-file=/var/lib/minikube/certs/etcd/peer.key --peer-trusted-ca-file=/var/lib/minikube/certs/etcd/ca.crt --proxy-refresh-interval=70000 --snapshot-count=10000 --trusted-ca-file=/var/lib/minikube/certs/etcd/ca.crt
 592463 ?        Ssl   52:12 kube-apiserver --advertise-address=172.30.9.24 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/var/lib/minikube/certs/ca.crt --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota --enable-bootstrap-token-auth=true --etcd-cafile=/var/lib/minikube/certs/etcd/ca.crt --etcd-certfile=/var/lib/minikube/certs/apiserver-etcd-client.crt --etcd-keyfile=/var/lib/minikube/certs/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --kubelet-client-certificate=/var/lib/minikube/certs/apiserver-kubelet-client.crt --kubelet-client-key=/var/lib/minikube/certs/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/var/lib/minikube/certs/front-proxy-client.crt --proxy-client-key-file=/var/lib/minikube/certs/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/var/lib/minikube/certs/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=8443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/var/lib/minikube/certs/sa.pub --service-account-signing-key-file=/var/lib/minikube/certs/sa.key --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/var/lib/minikube/certs/apiserver.crt --tls-private-key-file=/var/lib/minikube/certs/apiserver.key
 822039 pts/0    S+     0:00 grep --color=auto etcd
[root@k8s etcd-v3.5.12-linux-amd64]#
[root@k8s etcd-v3.5.12-linux-amd64]#
[root@k8s etcd-v3.5.12-linux-amd64]#
[root@k8s etcd-v3.5.12-linux-amd64]#
[root@k8s etcd-v3.5.12-linux-amd64]# ETCDCTL_API=3 etcdctl --endpoints=localhost:2379 --cacert /var/lib/minikube/certs/etcd/ca.crt --cert /var/lib/minikube/certs/etcd/server.crt --key /var/lib/minikube/certs/etcd/server.key get --prefix --keys-only
Error: get command needs one argument as key and an optional argument as range_end
[root@k8s etcd-v3.5.12-linux-amd64]# ETCDCTL_API=3 etcdctl --endpoints=localhost:2379 --cacert /var/lib/minikube/certs/etcd/ca.crt --cert /var/lib/minikube/certs/etcd/server.crt --key /var/lib/minikube/certs/etcd/server.key get / --prefix --keys-only
/registry/apiregistration.k8s.io/apiservices/v1.
/registry/apiregistration.k8s.io/apiservices/v1.admissionregistration.k8s.io
/registry/apiregistration.k8s.io/apiservices/v1.apiextensions.k8s.io
...
/registry/storageclasses/standard
/registry/validatingwebhookconfigurations/ingress-nginx-admission
[root@k8s etcd-v3.5.12-linux-amd64]# ETCDCTL_API=3 etcdctl --endpoints=localhost:2379 --cacert /var/lib/minikube/certs/etcd/ca.crt --cert /var/lib/minikube/certs/etcd/server.crt --key /var/lib/minikube/certs/etcd/server.key snapshot save /tmp/etcdbackup.db
{"level":"info","ts":"2024-02-04T13:07:48.656201-0500","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/tmp/etcdbackup.db.part"}
{"level":"info","ts":"2024-02-04T13:07:48.668951-0500","logger":"client","caller":"v3@v3.5.12/maintenance.go:212","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2024-02-04T13:07:48.669004-0500","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"localhost:2379"}
{"level":"info","ts":"2024-02-04T13:07:48.73457-0500","logger":"client","caller":"v3@v3.5.12/maintenance.go:220","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2024-02-04T13:07:48.75807-0500","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"localhost:2379","size":"4.5 MB","took":"now"}
{"level":"info","ts":"2024-02-04T13:07:48.758224-0500","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/tmp/etcdbackup.db"}
Snapshot saved at /tmp/etcdbackup.db
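For recurring backups, the commands above can be wrapped in a small script (a sketch; the certificate paths assume a kubeadm setup under /etc/kubernetes/pki, while this minikube-based host keeps them under /var/lib/minikube/certs instead):
#!/bin/bash
# Save a timestamped etcd snapshot and print its status table
BACKUP=/tmp/etcdbackup-$(date +%Y%m%d-%H%M%S).db
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  snapshot save "$BACKUP"
etcdutl --write-out=table snapshot status "$BACKUP"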
Verifying the Etcd Backup
sudo etcdutl --write-out=table snapshot status /tmp/etcdbackup.db
- Just to be sure: cp /tmp/etcdbackup.db /tmp/etcdbackup.db.2
[root@k8s ~]# ETCDCTL_API=3 etcdctl --write-out=table snapshot status /tmp/etcdbackup.db
Deprecated: Use `etcdutl snapshot status` instead.
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 98a917f2 |   258850 |        778 |     4.5 MB |
+----------+----------+------------+------------+
[root@k8s ~]# etcdutl --write-out=table snapshot status /tmp/etcdbackup.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 98a917f2 |   258850 |        778 |     4.5 MB |
+----------+----------+------------+------------+
[root@k8s ~]# cp /tmp/etcdbackup.db /tmp/etcdbackup.db.2
In case anything happens to this backup, we always have a spare copy.
Restoring the Etcd
- sudo etcdutl snapshot restore /tmp/etcdbackup.db --data-dir /var/lib/etcd-backup restores the etcd backup into a non-default folder
- To start using it, the Kubernetes core services must be stopped, after which the etcd can be reconfigured to use the new directory
- To stop the core services, temporarily move /etc/kubernetes/manifests/*.yaml to /etc/kubernetes/
- As the kubelet process periodically polls for static Pod files, the etcd process will disappear within a minute
- Use sudo crictl ps to verify that it has been stopped
- Once the etcd Pod has stopped, reconfigure the etcd to use the non-default etcd path
- In etcd.yaml you'll find a HostPath volume with the name etcd-data, pointing to the location where the Etcd files are found; change this to the location of the restored files
- Move back the static Pod files to /etc/kubernetes/manifests/
- Use sudo crictl ps to verify the Pods have restarted successfully
- Next, kubectl get all should show the original Etcd resources
Restoring the Etcd Commands
kubectl delete --all deploy
cd /etc/kubernetes/manifests/
sudo mv * ..   # this will stop all running Pods
sudo crictl ps
sudo etcdutl snapshot restore /tmp/etcdbackup.db --data-dir /var/lib/etcd-backup
sudo ls -l /var/lib/etcd-backup/
sudo vi /etc/kubernetes/etcd.yaml   # change the etcd-data HostPath volume to /var/lib/etcd-backup (see the one-liner below)
sudo mv ../*.yaml .
sudo crictl ps   # should show all resources
kubectl get deploy -A
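The manual edit of etcd.yaml can also be done with a one-liner; a sketch, assuming the original hostPath is /var/lib/minikube/etcd as in this setup (on a plain kubeadm cluster it is /var/lib/etcd):
sudo sed -i 's#path: /var/lib/minikube/etcd#path: /var/lib/etcd-backup#' /etc/kubernetes/etcd.yaml
grep -n 'etcd-backup' /etc/kubernetes/etcd.yaml   # confirm the etcd-data volume now points at the restored data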
[root@k8s ~]# kubectl get deploy
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
apples       3/3     3            3           47h
firstnginx   4/4     4            4           4d1h
newdep       1/1     1            1           47h
nginxsvc     1/1     1            1           2d
webserver    1/1     1            1           2d7h
webshop      3/3     3            3           2d3h
[root@k8s ~]# kubectl delete deploy apples
deployment.apps "apples" deleted
[root@k8s ~]# kubectl delete deploy newdep
deployment.apps "newdep" deleted
[root@k8s ~]# cd /etc/kubernetes/manifests
[root@k8s manifests]# ll
total 20
-rw-------. 1 root root 2497 02-01 15:18 etcd.yaml
-rw-------. 1 root root 3800 02-01 15:18 kube-apiserver.yaml
-rw-------. 1 root root 3124 02-01 15:18 kube-controller-manager.yaml
-rw-------. 1 root root 1464 02-01 15:18 kube-scheduler.yaml
-rw-r--r--  1 root root  246 02-04 05:32 mystaticpod.yaml
-rw-r--r--  1 root root    0 02-03 16:17 staticpod.yaml
[root@k8s manifests]# mv * ..
[root@k8s manifests]# ll
total 0
[root@k8s manifests]# crictl ps
CONTAINER       IMAGE                                                                                                              CREATED          STATE     NAME                        ATTEMPT   POD ID          POD
7aec53d95a6a2   busybox@sha256:6d9ac9237a84afe1516540f40a0fafdc86859b2141954b4d643af7066d598b74                                    2 minutes ago    Running   busybox-container           426       776a9d9f213bc   two-containers
e500dd1ee227f   busybox@sha256:6d9ac9237a84afe1516540f40a0fafdc86859b2141954b4d643af7066d598b74                                    11 minutes ago   Running   sleepy                      72        b598cba0e6d7f   sleepy
b3ceda46f1ac7   eeb6ee3f44bd0                                                                                                      45 minutes ago   Running   centos2                     67        4eb967073bfbd   morevol
352c3daf52c65   eeb6ee3f44bd0                                                                                                      45 minutes ago   Running   centos1                     67        4eb967073bfbd   morevol
34fd825715348   b9a5a1927366a                                                                                                      5 hours ago      Running   metrics-server              0         09837f97cd991   metrics-server-5f8988d664-7r8j7
3d63e59315a29   5374347291230                                                                                                      23 hours ago     Running   kube-apiserver              5         2ba0d9b8722c4   kube-apiserver-k8s.netico.pl
3a65e1db97169   nginx@sha256:31754bca89a3afb25c04d6ecfa2d9671bc3972d8f4809ff855f7e35caa580de9                                      2 days ago       Running   nginx                       0         0fd201ae2d934   nginxsvc-5f8b7d4f4d-dtrs7
1a4722cbaaf94   registry.k8s.io/ingress-nginx/controller@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c   2 days ago       Running   controller                  0         ea8cb6f3530f0   ingress-nginx-controller-6858749594-27tm9
332cd7a3b2aa9   nginx@sha256:31754bca89a3afb25c04d6ecfa2d9671bc3972d8f4809ff855f7e35caa580de9                                      2 days ago       Running   nginx                       0         8956ee62249ab   webshop-7f9fd49d4c-x2czc
e136bd99527b9   nginx@sha256:31754bca89a3afb25c04d6ecfa2d9671bc3972d8f4809ff855f7e35caa580de9                                      2 days ago       Running   nginx                       0         ae8bad2f8c457   webshop-7f9fd49d4c-92nj2
2c1067f28073c   nginx@sha256:31754bca89a3afb25c04d6ecfa2d9671bc3972d8f4809ff855f7e35caa580de9                                      2 days ago       Running   nginx                       0         746bebf244884   webshop-7f9fd49d4c-kqllw
c6c00eece623f   nginx@sha256:31754bca89a3afb25c04d6ecfa2d9671bc3972d8f4809ff855f7e35caa580de9                                      2 days ago       Running   task-pv-container           0         d39bb441ef944   lab4-pod
036a4a1599a1a   nginx@sha256:31754bca89a3afb25c04d6ecfa2d9671bc3972d8f4809ff855f7e35caa580de9                                      2 days ago       Running   nginx                       0         6b4abfa363771   webserver-76d44586d-8gqhf
f7426897bdb2e   nginx@sha256:31754bca89a3afb25c04d6ecfa2d9671bc3972d8f4809ff855f7e35caa580de9                                      2 days ago       Running   pv-container                0         a09dd92ff2186   pv-pod
8dc188f5131a2   nginx@sha256:985224176778a8939b3869d3b9b9624ea9b3fe4eb1e9002c5f444d99ef034a9b                                      3 days ago       Running   nginx                       0         ced6eefe01d16   deploydaemon-zzllp
29bf2d747ac9a   nginx@sha256:985224176778a8939b3869d3b9b9624ea9b3fe4eb1e9002c5f444d99ef034a9b                                      3 days ago       Running   nginx                       0         2a08474cc3d2a   init-demo
86bd1107d80e0   18ea23a675dae                                                                                                      3 days ago       Running   nginx                       0         11c7ed7ad36e9   web-2
dc28e2ca0b0f6   18ea23a675dae                                                                                                      3 days ago       Running   nginx                       0         b1be4ab59e2ca   web-1
9232cbdd25263   k8s.gcr.io/nginx-slim@sha256:8b4501fe0fe221df663c22e16539f399e89594552f400408303c42f3dd8d0e52                      3 days ago       Running   nginx                       0         15ef4cc356862   web-0
440f7bffbcf2e   kubernetesui/metrics-scraper@sha256:76049887f07a0476dc93efc2d3569b9529bf982b22d29f356092ce206e98765c               3 days ago       Running   dashboard-metrics-scraper   0         b894aa1e8f3df   dashboard-metrics-scraper-7fd5cb4ddc-9ld5n
0cde8ae538723   nginx@sha256:985224176778a8939b3869d3b9b9624ea9b3fe4eb1e9002c5f444d99ef034a9b                                      3 days ago       Running   testpod                     0         a698811bcb3d2   testpod
11bee0d7f2f94   nginx@sha256:985224176778a8939b3869d3b9b9624ea9b3fe4eb1e9002c5f444d99ef034a9b                                      3 days ago       Running   nginx                       0         ac3431dfc0d2d   firstnginx-d8679d567-rhhlz
963cd068358d1   nginx@sha256:985224176778a8939b3869d3b9b9624ea9b3fe4eb1e9002c5f444d99ef034a9b                                      3 days ago       Running   nginx                       0         61727f7186357   mydaemon-d4dcd
56cb9d7954e7b   kubernetesui/dashboard@sha256:2e500d29e9d5f4a086b908eb8dfe7ecac57d2ab09d65b24f588b1d449841ef93                     3 days ago       Running   kubernetes-dashboard        0         f7e102fa08dbe   kubernetes-dashboard-8694d4445c-xjlsr
5f83a2ff3e40c   nginx@sha256:985224176778a8939b3869d3b9b9624ea9b3fe4eb1e9002c5f444d99ef034a9b                                      3 days ago       Running   nginx                       0         e33888e473986   firstnginx-d8679d567-249g9
077ed650c7764   nginx@sha256:985224176778a8939b3869d3b9b9624ea9b3fe4eb1e9002c5f444d99ef034a9b                                      3 days ago       Running   nginx                       0         bc0b06833b812   firstnginx-d8679d567-66c4s
6e11c05ad56f0   nginx@sha256:985224176778a8939b3869d3b9b9624ea9b3fe4eb1e9002c5f444d99ef034a9b                                      3 days ago       Running   nginx-container             0         776a9d9f213bc   two-containers
e4766099186a1   nginx@sha256:985224176778a8939b3869d3b9b9624ea9b3fe4eb1e9002c5f444d99ef034a9b                                      3 days ago       Running   nginx                       0         10ffa7290bd38   firstnginx-d8679d567-72qbd
d132702c0281b   bfc896cf80fba                                                                                                      3 days ago       Running   kube-proxy                  1         e968a0c3cc86b   kube-proxy-5nmms
[root@k8s manifests]# etcdutl snapshot restore /tmp/etcdbackup.db --data-dir /var/lib/etcd-backup
2024-02-04T16:14:51-05:00	info	snapshot/v3_snapshot.go:260	restoring snapshot	{"path": "/tmp/etcdbackup.db", "wal-dir": "/var/lib/etcd-backup/member/wal", "data-dir": "/var/lib/etcd-backup", "snap-dir": "/var/lib/etcd-backup/member/snap"}
2024-02-04T16:14:51-05:00	info	membership/store.go:141	Trimming membership information from the backend...
2024-02-04T16:14:51-05:00	info	membership/cluster.go:421	added member	{"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
2024-02-04T16:14:51-05:00	info	snapshot/v3_snapshot.go:287	restored snapshot	{"path": "/tmp/etcdbackup.db", "wal-dir": "/var/lib/etcd-backup/member/wal", "data-dir": "/var/lib/etcd-backup", "snap-dir": "/var/lib/etcd-backup/member/snap"}
[root@k8s manifests]# ls -l /var/lib/etcd-backup
total 0
drwx------ 4 root root 29 02-04 16:14 member
[root@k8s manifests]# ls -l /var/lib/etcd-backup/member
total 0
drwx------ 2 root root 62 02-04 16:14 snap
drwx------ 2 root root 51 02-04 16:14 wal
[root@k8s manifests]# vim /etc/kubernetes/etcd.yaml
[root@k8s manifests]# cat /etc/kubernetes/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.30.9.24:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.30.9.24:2379
    - --cert-file=/var/lib/minikube/certs/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/minikube/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
...
  - hostPath:
      #path: /var/lib/minikube/etcd
      path: /var/lib/etcd-backup
      type: DirectoryOrCreate
    name: etcd-data
status: {}
[root@k8s manifests]# mv ../*.yaml .
[root@k8s manifests]# ll
total 20
drwx------  3 root root   20 02-04 16:13 default.etcd
-rw-------  1 root root 2530 02-04 16:17 etcd.yaml
-rw-------. 1 root root 3800 02-01 15:18 kube-apiserver.yaml
-rw-------. 1 root root 3124 02-01 15:18 kube-controller-manager.yaml
-rw-------. 1 root root 1464 02-01 15:18 kube-scheduler.yaml
-rw-r--r--  1 root root  246 02-04 05:32 mystaticpod.yaml
-rw-r--r--  1 root root    0 02-03 16:17 staticpod.yaml
[root@k8s manifests]#
[root@k8s manifests]# crictl ps
CONTAINER       IMAGE                                                                              CREATED          STATE     NAME                ATTEMPT   POD ID          POD
9d6694c088673   ead0a4a53df89                                                                      3 seconds ago    Running   coredns             857       7d04b5a159192   coredns-5dd5756b68-sgfkj
a285ffd1ce0a0   73deb9a3f7025                                                                      21 seconds ago   Running   etcd                0         df26fd67d3b54   etcd-k8s.netico.pl
9c56e218c8d67   busybox@sha256:6d9ac9237a84afe1516540f40a0fafdc86859b2141954b4d643af7066d598b74    2 minutes ago    Running   busybox-container   427       776a9d9f213bc   two-containers
c00b990d0741d   nginx@sha256:31754bca89a3afb25c04d6ecfa2d9671bc3972d8f4809ff855f7e35caa580de9      3 minutes ago    Running   mystaticpod         0         8e1741461cf68   mystaticpod-k8s.netico.pl
1cd28d372b0b5   6d1b4fd1b182d                                                                      3 minutes ago    Running   kube-scheduler      0         dd20898212225   kube-scheduler-k8s.netico.pl
...                                                                                                3 days ago       Running   kube-proxy          1         e968a0c3cc86b   kube-proxy-5nmms
[root@k8s manifests]# kubectl get deploy
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
apples       3/3     3            3           47h
firstnginx   4/4     4            4           4d1h
newdep       1/1     1            1           2d
nginxsvc     1/1     1            1           2d1h
webserver    1/1     1            1           2d8h
webshop      3/3     3            3           2d3h
Cluster Nodes Upgrade
- Kubernetes clusters can be upgraded from one minor version to the next
- Skipping minor versions (e.g. 1.23 to 1.25) is not supported
- First, you'll have to upgrade kubeadm
- Next, you'll need to upgrade the control plane node
- After that, the worker nodes are upgraded
- Use "Upgrading kubeadm clusters" from the documentation
Control Plane Node Upgrade Overview
- Upgrade kubeadm
- Use kubeadm upgrade plan to check available versions
- Use kubeadm upgrade apply v1.xx.y to run the upgrade
- Use kubectl drain controlnode --ignore-daemonsets
- Upgrade and restart kubelet and kubectl
- Use kubectl uncordon controlnode to bring back the control node
- Proceed with the other nodes (a consolidated command sketch follows below)
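Put together as commands, the control plane upgrade looks roughly like this (a sketch for an apt-based node; 1.xx.y is a placeholder for the target release, and any apt-mark hold/unhold handling is omitted):
sudo apt-get update && sudo apt-get install -y kubeadm=1.xx.y-*
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.xx.y
kubectl drain controlnode --ignore-daemonsets
sudo apt-get install -y kubelet=1.xx.y-* kubectl=1.xx.y-*
sudo systemctl daemon-reload && sudo systemctl restart kubelet
kubectl uncordon controlnode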
High Availability Options
- Stacked control plane nodes require less infrastructure, as the etcd members and control plane nodes are co-located
  - Control planes and etcd members are running together on the same node
  - For optimal protection, a minimum of 3 stacked control plane nodes is required
- An external etcd cluster requires more infrastructure, as the control plane nodes and etcd members are separated
  - The etcd service is running on external nodes, so this requires twice the number of nodes
High Availability Requirements
- In a Kubernetes HA cluster, a load balancer is needed to distribute the workload between the cluster nodes
- The load balancer can be externally provided using open source software, or a load balancer appliance
Exploring Load Balancer Configuration
- In the load balancer setup, HAProxy is running on each server to provide access to port 8443 on all IP addresses on that server (see the sketch after this list)
- Incoming traffic on port 8443 is forwarded to the kube-apiserver on port 6443
- The keepalived service is running on all HA nodes to provide a virtual IP address (VIP) on one of the nodes
- kubectl clients connect to this VIP on port 8443
- Use the setup-lb-ubuntu.sh script provided in the GitHub repository for easy setup
- Additional instructions are in the script
- After running the load balancer setup, use nc 192.168.29.100 8443 to verify the availability of the load balancer IP and port
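For reference, the relevant part of an HAProxy configuration matching the ports above could look like this (a sketch only; node names and IP addresses are placeholders, and the global/defaults sections from the distribution's default config are assumed to be present):
cat <<'EOF' | sudo tee -a /etc/haproxy/haproxy.cfg
frontend kube-api
    bind *:8443
    mode tcp
    default_backend kube-api-servers

backend kube-api-servers
    mode tcp
    balance roundrobin
    server control1 192.168.29.101:6443 check
    server control2 192.168.29.102:6443 check
    server control3 192.168.29.103:6443 check
EOF
sudo systemctl restart haproxy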
Setting up a Highly Available Kubernetes Cluster
- 3 VMs to be used as controllers in the cluster; install the K8s software but don't set up the cluster yet
- 2 VMs to be used as worker nodes; install the K8s software
- Ensure /etc/hosts is set up for name resolution of all nodes and copy it to all nodes (see the example below)
- Disable SELinux on all nodes if applicable
- Disable the firewall if applicable
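A sketch of those preparation steps for an RHEL-family node (host names and IP addresses are placeholders; adjust to your own environment):
cat <<'EOF' | sudo tee -a /etc/hosts
192.168.29.100 k8s-vip
192.168.29.101 control1
192.168.29.102 control2
192.168.29.103 control3
192.168.29.111 worker1
192.168.29.112 worker2
EOF
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
sudo systemctl disable --now firewalld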
Initializing the HA Setup
sudo kubeadm init --control-plane-endpoint "192.168.29.100:8443" --upload-certs
- Save the output of the command, which shows the next steps
- Configure networking:
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
- Copy the kubeadm join command that was printed after successfully initializing the first control node
  - Make sure to use the command that has --control-plane in it! (see the sketch below)
- Complete the setup on the other control nodes as instructed
- Use kubectl get nodes to verify the setup
- Continue and join the worker nodes as instructed
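The join commands printed by kubeadm init have roughly this shape (tokens, hashes and the certificate key are placeholders, not real values):
# on the remaining control plane nodes
sudo kubeadm join 192.168.29.100:8443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <key>

# on the worker nodes
sudo kubeadm join 192.168.29.100:8443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>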
Configuring the HA Client
- On the machine you want to use as the operator workstation, create a .kube directory and copy /etc/kubernetes/admin.conf from any control node to the client machine (see the commands below)
- Install the kubectl utility
- Ensure that host name resolution goes to the new control plane VIP
- Verify using kubectl get nodes
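A sketch of the client setup (control1 stands for any of the control plane nodes):
mkdir -p ~/.kube
scp control1:/etc/kubernetes/admin.conf ~/.kube/config
kubectl get nodes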
Testing it
- On all nodes: find the VIP using ip a (see the checks below)
- On all nodes with kubectl, use kubectl get all to verify the client is working
- Shut down the node that has the VIP
- Verify that kubectl get all still works
- Troubleshooting: consider using sudo systemctl restart haproxy
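A few quick checks while testing (the VIP is the one used earlier in the load balancer setup):
ip a | grep 192.168.29.100       # which node currently holds the VIP?
nc 192.168.29.100 8443           # is the load balancer port still reachable?
kubectl get all                  # should keep working after the VIP holder is shut down
sudo systemctl restart haproxy   # troubleshooting, if kubectl stops responding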
Lab: Etcd Backup and Restore
- Create a backup of the etcd
- Remove a few resources (Pods and/or Deployments)
- Restore the backup of the etcd and verify that it gets your resources back