Resource Monitoring
- kubectl get can be used on any resource and shows generic resource health
- If metrics are collected (this requires the Metrics Server, visible in kube-system below), use kubectl top pods and kubectl top nodes to get performance-related information about Pods and Nodes
[root@k8s cka]# kubectl top pods
NAME                           CPU(cores)   MEMORY(bytes)
busybox-6fc6c44c5b-xmmxd       0m           0Mi
deploydaemon-zzllp             0m           7Mi
firstnginx-d8679d567-249g9     0m           7Mi
firstnginx-d8679d567-66c4s     0m           7Mi
firstnginx-d8679d567-72qbd     0m           7Mi
firstnginx-d8679d567-rhhlz     0m           7Mi
lab4-pod                       0m           7Mi
morevol                        0m           0Mi
mydaemon-z7g9c                 0m           7Mi
mypod                          0m           0Mi
mysapod                        0m           0Mi
mystaticpod-k8s.netico.pl      0m           7Mi
nginx-taint-68bd5db674-7skqs   0m           7Mi
nginx-taint-68bd5db674-vjq89   0m           7Mi
nginx-taint-68bd5db674-vqz2z   0m           7Mi
nginxsvc-5f8b7d4f4d-dtrs7      0m           7Mi
pv-pod                         0m           7Mi
security-context-demo          0m           0Mi
sleepybox1                     0m           0Mi
sleepybox2                     0m           0Mi
webserver-76d44586d-8gqhf      0m           7Mi
webshop-7f9fd49d4c-92nj2       0m           7Mi
webshop-7f9fd49d4c-kqllw       0m           7Mi
webshop-7f9fd49d4c-x2czc       0m           7Mi
[root@k8s cka]# kubectl top nodes
NAME             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s.example.pl   276m         3%     4282Mi          27%
[root@k8s cka]# kubectl get all -n kube-system
NAME                                         READY   STATUS             RESTARTS           AGE
pod/coredns-5dd5756b68-sgfkj                 0/1     CrashLoopBackOff   3331 (4m12s ago)   13d
pod/etcd-k8s.netico.pl                       1/1     Running            0                  13d
pod/kube-apiserver-k8s.netico.pl             1/1     Running            0                  9d
pod/kube-controller-manager-k8s.netico.pl    1/1     Running            0                  9d
pod/kube-proxy-hgh55                         1/1     Running            0                  9d
pod/kube-scheduler-k8s.netico.pl             1/1     Running            0                  9d
pod/metrics-server-5f8988d664-7r8j7          1/1     Running            0                  9d
pod/storage-provisioner                      1/1     Running            8 (9d ago)         13d

NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
service/kube-dns         ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   13d
service/metrics-server   ClusterIP   10.102.216.61   <none>        443/TCP                  9d

NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/kube-proxy   1         1         1       1            1           kubernetes.io/os=linux   13d

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/coredns          0/1     1            0           13d
deployment.apps/metrics-server   1/1     1            1           9d

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/coredns-5dd5756b68          1         1         0       13d
replicaset.apps/metrics-server-5f8988d664   1         1         1       9d
replicaset.apps/metrics-server-6db4d75b97   0         0         0       9d
Troubleshooting Flow
- Resources are first created in the Kubernetes etcd database
- Use kubectl describe and kubectl events to see how that has been going
- After adding the resources to the database, the Pod application is started on the nodes where it is scheduled
- Before it can be started, the Pod image needs to be fetched; use sudo crictl images to get a list
- Once the application is started, use kubectl logs to read the output of the application (a combined sketch of this flow follows below)
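Put together, a minimal troubleshooting pass over a single Pod could look like the sketch below (mypod stands for the Pod under investigation; crictl must be run on the node where the Pod is scheduled):

# Inspect the resource as stored in etcd, and the related events
kubectl describe pod mypod
kubectl events --for=pod/mypod
# On the node: verify the image has been fetched
sudo crictl images
# Read the application output
kubectl logs mypod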
Troubleshooting Pods
- The first step is to use kubectl get, which will give a generic overview of Pod states (see the filtering sketch after this list)
- A Pod can be in any of the following states:
  - Pending: the Pod has been created in etcd, but is waiting for an eligible node
  - Running: the Pod is in a healthy state
  - Succeeded: the Pod has done its work and there is no need to restart it
  - Failed: one or more containers in the Pod have ended with an error code and will not be restarted
  - Unknown: the state could not be obtained, often related to network issues
  - Completed: the Pod has run to completion
  - CrashLoopBackOff: one or more containers in the Pod have generated an error, and the kubelet keeps trying to restart them
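If kubectl get pods returns many Pods, a field selector can narrow the output down to a single state; a small sketch (the phases shown are just examples):

# Show only Pods that are still waiting for a node
kubectl get pods --field-selector=status.phase=Pending
# Show failed Pods across all namespaces
kubectl get pods --all-namespaces --field-selector=status.phase=Failed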
Investigating Resource Problems
- If kubectl get indicates that there is an issue, the next step is to use kubectl describe to get more information
- kubectl describe shows API information about the resource and often has good indicators of what is going wrong
- If kubectl describe shows that a Pod has an issue starting its primary container, use kubectl logs to investigate the application logs
- If the Pod is running, but not behaving as expected, open an interactive shell on the Pod for further troubleshooting: kubectl exec -it mypod -- sh
[root@k8s ~]# kubectl get pods
NAME                           READY   STATUS             RESTARTS         AGE
busybox-6fc6c44c5b-xmmxd       1/1     Running            165 (10m ago)    7d3h
deploydaemon-zzllp             1/1     Running            0                12d
firstnginx-d8679d567-249g9     1/1     Running            0                13d
firstnginx-d8679d567-66c4s     1/1     Running            0                13d
firstnginx-d8679d567-72qbd     1/1     Running            0                13d
firstnginx-d8679d567-rhhlz     1/1     Running            0                13d
lab4-pod                       1/1     Running            0                12d
morevol                        2/2     Running            590 (45m ago)    12d
mydaemon-z7g9c                 1/1     Running            0                7d1h
mypod                          1/1     Running            40 (10m ago)     46h
mysapod                        1/1     Running            39 (45m ago)     45h
mystaticpod-k8s.netico.pl      1/1     Running            0                10d
nginx-taint-68bd5db674-7skqs   1/1     Running            0                8d
nginx-taint-68bd5db674-vjq89   1/1     Running            0                8d
nginx-taint-68bd5db674-vqz2z   1/1     Running            0                8d
nginxsvc-5f8b7d4f4d-dtrs7      1/1     Running            0                11d
pv-pod                         1/1     Running            0                12d
security-context-demo          1/1     Running            135 (47m ago)    5d21h
sleepybox1                     1/1     Running            161 (46m ago)    6d23h
sleepybox2                     1/1     Running            161 (46m ago)    6d23h
testdb                         0/1     CrashLoopBackOff   12 (3m23s ago)   40m
webserver-76d44586d-8gqhf      1/1     Running            0                12d
webshop-7f9fd49d4c-92nj2       1/1     Running            0                11d
webshop-7f9fd49d4c-kqllw       1/1     Running            0                11d
webshop-7f9fd49d4c-x2czc       1/1     Running            0                11d
[root@k8s ~]# kubectl describe pod testdb
Name:             testdb
Namespace:        default
Priority:         0
Service Account:  default
Node:             k8s.netico.pl/172.30.9.24
Start Time:       Wed, 14 Feb 2024 09:34:57 -0500
Labels:           run=testdb
Annotations:      <none>
Status:           Running
IP:               10.244.0.90
IPs:
  IP:  10.244.0.90
Containers:
  testdb:
    Container ID:   docker://0b656c391a511154a4ef8638e175dc1a4bb675feef1ff717d2834464a9e4ebbc
    Image:          mysql
    Image ID:       docker-pullable://mysql@sha256:343b82684a6b05812c58ca20ccd3af8bcf8a5f48b1842f251c54379bfce848f9
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 14 Feb 2024 10:12:02 -0500
      Finished:     Wed, 14 Feb 2024 10:12:03 -0500
    Ready:          False
    Restart Count:  12
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bxf5c (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-bxf5c:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  41m                  default-scheduler  Successfully assigned default/testdb to k8s.netico.pl
  Normal   Pulled     40m                  kubelet            Successfully pulled image "mysql" in 19.169s (19.17s including waiting)
  Normal   Pulled     40m                  kubelet            Successfully pulled image "mysql" in 1.03s (1.03s including waiting)
  Normal   Pulled     40m                  kubelet            Successfully pulled image "mysql" in 1.027s (1.027s including waiting)
  Normal   Created    39m (x4 over 40m)    kubelet            Created container testdb
  Normal   Started    39m (x4 over 40m)    kubelet            Started container testdb
  Normal   Pulled     39m                  kubelet            Successfully pulled image "mysql" in 1.113s (1.113s including waiting)
  Normal   Pulling    39m (x5 over 41m)    kubelet            Pulling image "mysql"
  Warning  BackOff    62s (x183 over 40m)  kubelet            Back-off restarting failed container testdb in pod testdb_default(c6d47171-78db-4038-8c8e-0599cdebde2d)
[root@k8s ~]# kubectl logs testdb
2024-02-14 15:17:10+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.3.0-1.el8 started.
2024-02-14 15:17:11+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2024-02-14 15:17:11+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.3.0-1.el8 started.
2024-02-14 15:17:11+00:00 [ERROR] [Entrypoint]: Database is uninitialized and password option is not specified
    You need to specify one of the following as an environment variable:
    - MYSQL_ROOT_PASSWORD
    - MYSQL_ALLOW_EMPTY_PASSWORD
    - MYSQL_RANDOM_ROOT_PASSWORD
testdb is an individual, unmanaged Pod, so we must delete it and run it again with the appropriate environment variable (kubectl set env -h shows related help).
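One way to fix it, sketched here with a placeholder password (any of the three variables from the error message would satisfy the entrypoint):

kubectl delete pod testdb
# Recreate the Pod with the required environment variable set
kubectl run testdb --image=mysql --env="MYSQL_ROOT_PASSWORD=password"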
Troubleshooting Cluster Nodes
- Use kubectl cluster-info for a generic impression of cluster health
- Use kubectl cluster-info dump for (too much) information coming from all the cluster log files
- kubectl get nodes will give a generic overview of node health
- kubectl get pods -n kube-system shows Kubernetes core services running on the control node
- kubectl describe node nodename shows detailed information about nodes; check the "Conditions" section for operational information
- sudo systemctl status kubelet will show current status information about the kubelet
- sudo systemctl restart kubelet allows you to restart it
- sudo openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -text allows you to inspect the kubelet certificate and verify it is still valid (a shorter expiry-only check follows after this list)
- The kube-proxy Pods are running to ensure connectivity with worker nodes; use kubectl get pods -n kube-system for an overview
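To check only the validity dates instead of dumping the entire certificate, a shorter variant of the same openssl command can be used:

# Print just the notBefore/notAfter dates of the kubelet certificate
sudo openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -noout -dates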
[root@k8s cka]# kubectl cluster-info
Kubernetes control plane is running at https://172.30.9.24:8443
CoreDNS is running at https://172.30.9.24:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root@k8s cka]# kubectl cluster-info dump | more
{
    "kind": "NodeList",
    "apiVersion": "v1",
    "metadata": {
        "resourceVersion": "1061012"
    },
    "items": [
        {
            "metadata": {
                "name": "k8s.example.pl",
                "uid": "e15d407a-6d1a-443f-93dd-e81ca406c89f",
                "resourceVersion": "1060966",
                "creationTimestamp": "2024-01-31T15:03:23Z",
                "labels": {
                    "beta.kubernetes.io/arch": "amd64",
                    "beta.kubernetes.io/os": "linux",
                    "kubernetes.io/arch": "amd64",
                    "kubernetes.io/hostname": "k8s.example.pl",
                    "kubernetes.io/os": "linux",
                    "minikube.k8s.io/commit": "8220a6eb95f0a4d75f7f2d7b14cef975f050512d",
                    "minikube.k8s.io/name": "minikube",
                    "minikube.k8s.io/primary": "true",
                    "minikube.k8s.io/updated_at": "2024_01_31T10_03_27_0700",
                    "minikube.k8s.io/version": "v1.32.0",
                    "node-role.kubernetes.io/control-plane": "",
                    "node.kubernetes.io/exclude-from-external-load-balancers": ""
                },
                "annotations": {
                    "kubeadm.alpha.kubernetes.io/cri-socket": "unix:///var/run/cri-dockerd.sock",
                    "node.alpha.kubernetes.io/ttl": "0",
                    "volumes.kubernetes.io/controller-managed-attach-detach": "true"
                }
            },
...
[root@k8s cka]# kubectl get pods -n kube-system
NAME                                     READY   STATUS             RESTARTS           AGE
coredns-5dd5756b68-sgfkj                 0/1     CrashLoopBackOff   3537 (4m52s ago)   14d
etcd-k8s.example.pl                      1/1     Running            0                  14d
kube-apiserver-k8s.example.pl            1/1     Running            0                  9d
kube-controller-manager-k8s.example.pl   1/1     Running            0                  9d
kube-proxy-hgh55                         1/1     Running            0                  9d
kube-scheduler-k8s.example.pl            1/1     Running            0                  9d
metrics-server-5f8988d664-7r8j7          1/1     Running            0                  9d
storage-provisioner                      1/1     Running            8 (9d ago)         14d
[root@k8s cka]# kubectl describe node | more
Name:               k8s.example.pl
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=k8s.example.pl
                    kubernetes.io/os=linux
                    minikube.k8s.io/commit=8220a6eb95f0a4d75f7f2d7b14cef975f050512d
                    minikube.k8s.io/name=minikube
                    minikube.k8s.io/primary=true
                    minikube.k8s.io/updated_at=2024_01_31T10_03_27_0700
                    minikube.k8s.io/version=v1.32.0
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 31 Jan 2024 10:03:23 -0500
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  k8s.example.pl
  AcquireTime:     <unset>
  RenewTime:       Wed, 14 Feb 2024 11:03:31 -0500
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 14 Feb 2024 11:02:26 -0500   Sat, 03 Feb 2024 13:59:11 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 14 Feb 2024 11:02:26 -0500   Sat, 03 Feb 2024 13:59:11 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 14 Feb 2024 11:02:26 -0500   Sat, 03 Feb 2024 13:59:11 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 14 Feb 2024 11:02:26 -0500   Sat, 03 Feb 2024 13:59:11 -0500   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  172.30.9.24
  Hostname:    k8s.example.pl
Capacity:
  cpu:                8
  ephemeral-storage:  64177544Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16099960Ki
  pods:               110
Allocatable:
  cpu:                8
  ephemeral-storage:  64177544Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16099960Ki
  pods:               110
System Info:
  Machine ID:  0cc7c63085694b83adcd204eff748ff8
[root@k8s cka]# openssl x589 -in /var/lib/kubelet/pki/kubelet.crt -text
Invalid command 'x589'; type "help" for a list.
[root@k8s cka]# openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 6515317418553622450 (0x5a6b0eac2bcdbfb2)
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN = k8s.example.pl-ca@1706713394
        Validity
            Not Before: Jan 31 14:03:14 2024 GMT
            Not After : Jan 30 14:03:14 2025 GMT
Application Access
- To access applications running in the Pods, Services and Ingress are used
- The Service resource uses a selector label to connect to Pods with a matching label
- The Ingress resource connects to a Service and picks up its selector label to connect to the backend Pods directly
- To troubleshoot application access, check the labels in all of these resources
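For instance, to compare the selector of the webserver Service (from the session below) with the labels on the Pods it should match (assuming those Pods are expected to carry the run=webserver label):

# Show the selector the Service is using
kubectl get svc webserver -o jsonpath='{.spec.selector}{"\n"}'
# Show which Pods actually carry the matching label
kubectl get pods --selector run=webserver --show-labels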
[root@k8s cka]# kubectl get endpoints
NAME         ENDPOINTS                                      AGE
apples       <none>                                         11d
interginx    <none>                                         7d17h
kubernetes   172.30.9.24:8443                               14d
newdep       <none>                                         11d
nginxsvc     <none>                                         11d
nwp-nginx    <none>                                         7d5h
webserver    <none>                                         7d17h
webshop      10.244.0.23:80,10.244.0.24:80,10.244.0.25:80   11d
[root@k8s cka]# kubectl get svc
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
apples       ClusterIP   10.101.6.55      <none>        80/TCP         11d
interginx    ClusterIP   10.102.130.239   <none>        80/TCP         7d17h
kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP        14d
newdep       ClusterIP   10.100.68.120    <none>        8080/TCP       11d
nginxsvc     ClusterIP   10.104.155.180   <none>        80/TCP         11d
nwp-nginx    ClusterIP   10.109.118.169   <none>        80/TCP         7d5h
webserver    ClusterIP   10.109.5.62      <none>        80/TCP         7d17h
webshop      NodePort    10.109.119.90    <none>        80:32064/TCP   11d
[root@k8s cka]# curl 10.109.5.62
curl: (7) Failed to connect to 10.109.5.62 port 80: Connection refused
[root@k8s cka]# kubectl edit svc webserver
The selector is wrong:
selector:
  run: webServer
Change the selector to webserver
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2024-02-06T22:33:54Z"
  labels:
    run: webserver
  name: webserver
  namespace: default
  resourceVersion: "439277"
  uid: 58660b0f-ddc9-4a2b-9bc7-ec52973bc38f
spec:
  clusterIP: 10.109.5.62
  clusterIPs:
  - 10.109.5.62
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: webserver
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
Probes
- Probes can be used to test access to Pods
- They are a part of the Pod specification
- A readinessProbe is used to make sure a Pod is not published as available until the readinessProbe has been able to access it
- The livenessProbe is used to continue checking the availability of a Pod
- The startupProbe was introduced for legacy applications that require additional startup time on first initialization
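For the startupProbe, a minimal sketch (path, port, and thresholds are illustrative and not taken from the examples below): the application gets up to 30 x 10 seconds to initialize before regular liveness checking takes over:

startupProbe:
  httpGet:
    path: /healthz      # endpoint the legacy app exposes once it is up
    port: 8080
  failureThreshold: 30  # tolerate up to 30 failed attempts during startup
  periodSeconds: 10     # one attempt every 10 seconds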
Probe Types
- The probe itself is a simple test, which is often a command
- The following probe types are defined in pod.spec.containers:
  - exec: a command is executed and must return a zero exit value
  - httpGet: an HTTP request must return a response code between 200 and 399
  - tcpSocket: connectivity to a TCP socket (available port) must be successful
- The Kubernetes API provides three endpoints that indicate the current status of the API server:
  - /healthz
  - /livez
  - /readyz
- These endpoints can be used by different probes
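The endpoints can also be queried directly, which helps when the API server itself is the suspect; for example:

# Verbose readiness information from the API server
kubectl get --raw='/readyz?verbose'
kubectl get --raw='/livez'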
Using Probes
- kubectl create -f busybox-ready.yaml
- kubectl get pods; note the READY state, which is set to 0/1: the Pod has successfully started, but is not considered ready
- kubectl edit pods busybox-ready and change /tmp/nothing to /etc/hosts; notice this is not allowed
- kubectl exec -it busybox-ready -- /bin/sh
- touch /tmp/nothing; exit
- kubectl get pods; at this point we have a Pod that is started and ready
[root@controller ckad]# cat busybox-ready.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox-ready
  namespace: default
spec:
  containers:
  - name: busy
    image: busybox
    command:
    - sleep
    - "3600"
    readinessProbe:
      periodSeconds: 10
      exec:
        command:
        - cat
        - /tmp/nothing
    resources: {}
[root@controller ckad]# kubectl create -f busybox-ready.yaml
pod/busybox-ready created
[root@controller ckad]# kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
busybox-ready               0/1     Running   0          8s
newnginx-798b4cfdf6-fz54b   1/1     Running   0          167m
newnginx-798b4cfdf6-q9rqd   1/1     Running   0          167m
newnginx-798b4cfdf6-tbwvm   1/1     Running   0          170m
[root@controller ckad]# kubectl describe pods busybox-ready
Name:             busybox-ready
Namespace:        default
Priority:         0
Service Account:  default
Node:             worker2.example.com/172.30.9.27
Start Time:       Tue, 05 Mar 2024 12:52:36 -0500
Labels:           <none>
Annotations:      cni.projectcalico.org/containerID: 4ad005fe0bb34b3e7c7acbc12f8ce2aef0a7341e3baf4d220b6cdd925e612c3c
                  cni.projectcalico.org/podIP: 172.16.71.203/32
                  cni.projectcalico.org/podIPs: 172.16.71.203/32
Status:           Running
IP:               172.16.71.203
IPs:
  IP:  172.16.71.203
Containers:
  busy:
    Container ID:  containerd://c8e5fa9d2ddb913249699e6dd042284cb6ee90603be31e1c43964699bfca8973
    Image:         busybox
    Image ID:      docker.io/library/busybox@sha256:6d9ac9237a84afe1516540f40a0fafdc86859b2141954b4d643af7066d598b74
    Port:          <none>
    Host Port:     <none>
    Command:
      sleep
      3600
    State:          Running
      Started:      Tue, 05 Mar 2024 12:52:38 -0500
    Ready:          False
    Restart Count:  0
    Readiness:      exec [cat /tmp/nothing] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-p6pdg (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-p6pdg:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  32s               default-scheduler  Successfully assigned default/busybox-ready to worker2.example.com
  Normal   Pulling    31s               kubelet            Pulling image "busybox"
  Normal   Pulled     30s               kubelet            Successfully pulled image "busybox" in 1.241s (1.241s including waiting)
  Normal   Created    30s               kubelet            Created container busy
  Normal   Started    30s               kubelet            Started container busy
  Warning  Unhealthy  2s (x5 over 29s)  kubelet            Readiness probe failed: cat: can't open '/tmp/nothing': No such file or directory
[root@controller ckad]# kubectl edit pods busybox-ready
Change /tmp/nothing
readinessProbe:
  exec:
    command:
    - cat
    - /tmp/nothing
to /etc/hosts
readinessProbe:
  exec:
    command:
    - cat
    - /etc/hosts
This is unfortunately forbidden, so we will do it a different way:
[root@controller ckad]# kubectl exec -it busybox-ready -- touch /tmp/nothing
[root@controller ckad]# kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
busybox-ready               1/1     Running   0          11m
newnginx-798b4cfdf6-fz54b   1/1     Running   0          179m
newnginx-798b4cfdf6-q9rqd   1/1     Running   0          179m
newnginx-798b4cfdf6-tbwvm   1/1     Running   0          3h1m
Now the busybox-ready Pod shows 1/1 and is considered ready.
Using Probes example 2
- kubectl create -f nginx-probes.yaml
- kubectl get pods
[root@controller ckad]# cat nginx-probes.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-probes
  labels:
    role: web
spec:
  containers:
  - name: nginx-probes
    image: nginx
    readinessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 20
      periodSeconds: 20
[root@controller ckad]# kubectl create -f nginx-probes.yaml
pod/nginx-probes created
[root@controller ckad]# kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
busybox-ready               1/1     Running   0          19m
newnginx-798b4cfdf6-fz54b   1/1     Running   0          3h7m
newnginx-798b4cfdf6-q9rqd   1/1     Running   0          3h7m
newnginx-798b4cfdf6-tbwvm   1/1     Running   0          3h10m
nginx-probes                0/1     Running   0          7s
[root@controller ckad]# kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
busybox-ready               1/1     Running   0          20m
newnginx-798b4cfdf6-fz54b   1/1     Running   0          3h7m
newnginx-798b4cfdf6-q9rqd   1/1     Running   0          3h7m
newnginx-798b4cfdf6-tbwvm   1/1     Running   0          3h10m
nginx-probes                1/1     Running   0          24s
Lab: Troubleshooting Kubernetes
- Create a Pod that is running the Busybox container with the sleep 3600 command. Configure a probe that checks for the existence of the file /etc/hosts on that Pod.
Go to the Documentation: search “probes” -> Configure Liveness, Readiness and Startup Probes
[root@controller ckad]# vi troublelab.yaml
[root@controller ckad]# cat troublelab.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
[root@controller ckad]# vim troublelab.yaml
[root@controller ckad]# cat troublelab.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/busybox
    args:
    - /bin/sh
    - -c
    - sleep 3600
    livenessProbe:
      exec:
        command:
        - cat
        - /etc/hosts
      initialDelaySeconds: 5
      periodSeconds: 5
[root@controller ckad]# kubectl create -f troublelab.yaml
pod/liveness-exec created
[root@controller ckad]# kubectl get all
NAME                READY   STATUS    RESTARTS        AGE
pod/busybox-ready   0/1     Running   3 (7m29s ago)   3h7m
pod/liveness-exec   1/1     Running   0               90s
pod/nginx-probes    1/1     Running   0               167m

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/canary       ClusterIP   10.96.148.14   <none>        80/TCP    6h31m
service/kubernetes   ClusterIP   10.96.0.1      <none>        443/TCP   5d4h
Lab: Troubleshooting Nodes
- Use the appropriate tools to find out if the cluster nodes are in good health; you are going to use all of the tools discussed earlier
[root@k8s cka]# kubectl get nodes
NAME             STATUS   ROLES           AGE   VERSION
k8s.example.pl   Ready    control-plane   14d   v1.28.3
[root@k8s cka]# kubectl describe node k8s.example.pl | more
Name:               k8s.example.pl
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=k8s.example.pl
                    kubernetes.io/os=linux
                    minikube.k8s.io/commit=8220a6eb95f0a4d75f7f2d7b14cef975f050512d
                    minikube.k8s.io/name=minikube
                    minikube.k8s.io/primary=true
                    minikube.k8s.io/updated_at=2024_01_31T10_03_27_0700
                    minikube.k8s.io/version=v1.32.0
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 31 Jan 2024 10:03:23 -0500
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  k8s.example.pl
  AcquireTime:     <unset>
  RenewTime:       Wed, 14 Feb 2024 11:27:10 -0500
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 14 Feb 2024 11:22:50 -0500   Sat, 03 Feb 2024 13:59:11 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 14 Feb 2024 11:22:50 -0500   Sat, 03 Feb 2024 13:59:11 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 14 Feb 2024 11:22:50 -0500   Sat, 03 Feb 2024 13:59:11 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 14 Feb 2024 11:22:50 -0500   Sat, 03 Feb 2024 13:59:11 -0500   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  172.30.9.24
  Hostname:    k8s.example.pl
Capacity:
  cpu:                8
  ephemeral-storage:  64177544Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16099960Ki
  pods:               110
Allocatable:
  cpu:                8
  ephemeral-storage:  64177544Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16099960Ki
  pods:               110
System Info:
  Machine ID:  0cc7c63085694b83adcd204eff748ff8
[root@k8s cka]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Sat 2024-02-03 13:59:11 EST; 1 weeks 3 days ago
     Docs: https://kubernetes.io/docs/
 Main PID: 555436 (kubelet)
    Tasks: 17 (limit: 100376)
   Memory: 103.1M
   CGroup: /system.slice/kubelet.service
           └─555436 /var/lib/minikube/binaries/v1.28.3/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --co>

Feb 14 11:27:50 k8s.example.pl kubelet[555436]: E0214 11:27:50.488077  555436 file.go:182] "Provided manifest path is a directory,>
[root@k8s cka]# systemctl status containerd
● containerd.service - containerd container runtime
   Loaded: loaded (/usr/lib/systemd/system/containerd.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2024-02-01 15:23:34 EST; 1 weeks 5 days ago
     Docs: https://containerd.io
 Main PID: 15732 (containerd)
    Tasks: 1102
   Memory: 1.0G
   CGroup: /system.slice/containerd.service
           ├─ 15732 /usr/bin/containerd
           ├─ 17376 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 10ffa7290bd3850824976e9be9652e17b05bffbd329cf05186e0019>