You can usually ignore the differences between Kubernetes deployments and OpenShift deployment configurations when troubleshooting applications. The common failure scenarios and the ways to troubleshoot them are essentially the same.
Troubleshooting Pods That Fail to Start
A common scenario is that OpenShift creates a pod and that pod never reaches the Running state. At some point, the pod ends up in an error state, such as ErrImagePull or ImagePullBackOff. Useful commands for troubleshooting (a short sketch follows the list):
- oc get pod: check the pod status and restart count
- oc status: show an overview of the current project and its resources
- oc get events: list recent project events, including scheduling and image pull failures
- oc describe pod: show detailed pod state, including its Events section
- oc describe: works the same way for other resource types
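As a quick sketch of this flow, assuming a pod named mypod that references a nonexistent image (the pod name is illustrative):

$ oc get pods                                           # STATUS column shows ErrImagePull or ImagePullBackOff
$ oc describe pod mypod                                 # the Events section at the bottom shows the failed image pull
$ oc get events --sort-by=.metadata.creationTimestamp   # the same events, project-wide and ordered by time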
Troubleshooting Running and Terminated Pods
OpenShift creates a pod, and for a short time no problem is encountered. The pod enters the Running state, which means at least one of its containers started running. Later, an application running inside one of the pod containers stops working. OpenShift tries to restart the container several times. If the application keeps terminating, due to failing health probes or other reasons, the pod is left in the CrashLoopBackOff state.
- Use oc logs <my-pod-name> to inspect the logs of the failing container.
- If the pod contains multiple containers, the oc logs command requires the -c option: oc logs <my-pod-name> -c <my-container-name> (see the sketch below).
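When a container keeps restarting, the current instance may die before it logs anything useful. In that case the --previous option of oc logs shows the logs of the previous, terminated container instance (pod and container names are illustrative):

$ oc logs mypod --previous
$ oc logs mypod -c mycontainer --previous   # same, for a specific container in a multi-container pod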
Using oc debug
- When troubleshooting, it is useful to get an exact copy of a running Pod and troubleshoot from there
- Since a failing Pod may not be started, and for that reason is not accessible to oc rsh and oc exec, the oc debug command provides an alternative
- The debug Pod starts a shell inside the first container of the referenced Pod
- The started Pod is a copy of the source Pod, with labels stripped, no probes, and the command changed to /bin/sh
- Useful command arguments are --as-root or --as-user=10000 to run as root or as a specific user (see the sketch after this list)
- Use exit to close and destroy the debug Pod
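A few common invocations, as a sketch (the resource names are placeholders):

oc debug pod/mypod                          # copy a single pod and open a shell in it
oc debug deployment/myapp                   # copy a pod from the deployment's template
oc debug deployment/myapp --as-user=10000   # run the copy as UID 10000
oc debug node/mynode                        # open a host shell on a node (requires cluster privileges)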
Demo: Using oc debug
oc login -u developer -p developer
oc create deployment dnginx --image=nginx
oc get pods                                # shows failure
oc debug deployment/dnginx --as-user=10000 # will fail, select a user ID as suggested
nginx                                      # will fail
exit
oc debug deployment/dnginx --as-root       # will fail, log in as admin and try again
nginx                                      # will run
exit
- This test has shown that the nginx image needs to run as root
Let’s create a new project and new deployment:
$ oc login -u developer -p developer
$ oc new-project debug
$ oc create deployment dnginx --image=nginx
deployment.apps/dnginx created
$ oc get all
NAME                         READY   STATUS   RESTARTS   AGE
pod/dnginx-88c7766dd-hlbtd   0/1     Error    2          30s

NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/dnginx   1         1         1            0           30s

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/dnginx-88c7766dd   1         1         0       30s
$ oc get pods
NAME                     READY   STATUS             RESTARTS   AGE
dnginx-88c7766dd-hlbtd   0/1     CrashLoopBackOff   6          8m
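To see why the container keeps restarting, one option (a sketch, reusing the pod name from this session) is to read the reason of the last terminated state directly from the pod status:

$ oc get pod dnginx-88c7766dd-hlbtd -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
$ oc logs dnginx-88c7766dd-hlbtd     # shows the nginx errors from the failed start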
Let’s debug the pod:
$ oc debug deploy/dnginx --as-user=10000
Defaulting container name to nginx.
Use 'oc describe pod/dnginx-debug -n debug' to see all of the containers in this pod.
Debugging with pod/dnginx-debug, original command: <image entrypoint>
Error from server (Forbidden): pods "dnginx-debug" is forbidden: unable to validate against any security context constraint: [spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 10000: must be in the ranges: [1000200000, 1000209999]]
$ oc debug deploy/dnginx --as-user=1000200000
Defaulting container name to nginx.
Use 'oc describe pod/dnginx-debug -n debug' to see all of the containers in this pod.
Debugging with pod/dnginx-debug, original command: <image entrypoint>
Waiting for pod to start ...
If you don't see a command prompt, try pressing enter.
$ nginx
2023/07/27 09:13:33 [warn] 7#7: the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /etc/nginx/nginx.conf:2
nginx: [warn] the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /etc/nginx/nginx.conf:2
2023/07/27 09:13:33 [emerg] 7#7: mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
nginx: [emerg] mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
$ exit
Removing debug pod ...
As the log shows, nginx fails with "Permission denied" when it tries to create /var/cache/nginx/client_temp; that is why the pod keeps failing.
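The allowed UID range in the Forbidden error above is not arbitrary: it is assigned per project and stored as an annotation on the namespace. A quick way to look it up (assuming the debug project from this demo):

$ oc get namespace debug -o yaml | grep sa.scc.uid-range
# expected: openshift.io/sa.scc.uid-range: 1000200000/10000  (start/size, i.e. 1000200000-1000209999)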
Now let’s debug the pod as admin:
$ oc debug deploy/dnginx --as-root
Defaulting container name to nginx.
Use 'oc describe pod/dnginx-debug -n debug' to see all of the containers in this pod.
Debugging with pod/dnginx-debug, original command: <image entrypoint>
Error from server (Forbidden): pods "dnginx-debug" is forbidden: unable to validate against any security context constraint: [spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 0: must be in the ranges: [1000200000, 1000209999]]
$ oc login -u kubeadmin -p redhat
$ oc project debug
$ oc get all
NAME                         READY   STATUS             RESTARTS   AGE
pod/dnginx-88c7766dd-hlbtd   0/1     CrashLoopBackOff   11         33m

NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/dnginx   1         1         1            0           33m

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/dnginx-88c7766dd   1         1         0       33m
$ oc debug deploy/dnginx --as-root
Defaulting container name to nginx.
Use 'oc describe pod/dnginx-debug -n debug' to see all of the containers in this pod.
Debugging with pod/dnginx-debug, original command: <image entrypoint>
Waiting for pod to start ...
If you don't see a command prompt, try pressing enter.
# nginx
2023/07/27 09:35:32 [notice] 7#7: using the "epoll" event method
2023/07/27 09:35:32 [notice] 7#7: nginx/1.25.1
2023/07/27 09:35:32 [notice] 7#7: built by gcc 12.2.0 (Debian 12.2.0-14)
2023/07/27 09:35:32 [notice] 7#7: OS: Linux 3.10.0-1160.92.1.el7.x86_64
2023/07/27 09:35:32 [notice] 7#7: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2023/07/27 09:35:32 [notice] 8#8: start worker processes
2023/07/27 09:35:32 [notice] 8#8: start worker process 9
# 2023/07/27 09:35:32 [notice] 8#8: start worker process 10
2023/07/27 09:35:32 [notice] 8#8: start worker process 11
2023/07/27 09:35:32 [notice] 8#8: start worker process 12
2023/07/27 09:35:32 [notice] 8#8: start worker process 13
2023/07/27 09:35:32 [notice] 8#8: start worker process 14
2023/07/27 09:35:32 [notice] 8#8: start worker process 15
2023/07/27 09:35:32 [notice] 8#8: start worker process 16
# ps aux
/bin/sh: 4: ps: not found
# ls /proc
1   14  9          cmdline   diskstats    filesystems  irq        kmsg        mdstat   mtrr          schedstat  stat           timer_list   vmallocinfo
10  15  acpi       consoles  dma          fs           kallsyms   kpagecount  meminfo  net           scsi       swaps          timer_stats  vmstat
11  16  buddyinfo  cpuinfo   driver       interrupts   kcore      kpageflags  misc     pagetypeinfo  self       sys            tty          xen
12  17  bus        crypto    execdomains  iomem        key-users  loadavg     modules  partitions    slabinfo   sysrq-trigger  uptime       zoneinfo
13  8   cgroups    devices   fb           ioports      keys       locks       mounts   sched_debug   softirqs   sysvipc        version
# cat /proc/8/cmdline
nginx: master process nginx#
# cat /proc/9/cmdline
nginx: worker process#
# exit
Removing debug pod ...
As we can see, when the debug pod runs as root, nginx starts properly.
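Because the default restricted SCC is what blocks running as root, a durable fix is either to use an nginx image built to run as non-root, or to grant the anyuid SCC to the service account that runs the pod. A sketch of the second approach, run as admin (the service account name is illustrative):

$ oc create sa nginx-sa
$ oc adm policy add-scc-to-user anyuid -z nginx-sa
$ oc set serviceaccount deployment dnginx nginx-sa   # triggers a rollout with the new service account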
Lab: Fixing Application Permissions
- Use oc run mynginx --image=nginx to run an Nginx webserver Pod
- It fails. Fix it.
$ oc new-project fixapp
$ oc run mynginx --image=nginx
$ oc get pods
NAME      READY   STATUS    RESTARTS   AGE
mynginx   0/1     Pending   0          11s
$ oc logs mynginx
$ oc describe mynginx
$ oc get pod mynginx -o yaml | grep message
    message: 0/1 nodes are available: 1 node(s) had untolerated taint {Node: Worker}.
$ oc describe pod/mynginx
Name:         mynginx
Namespace:    fixapp
Priority:     0
Node:         <none>
Labels:       run=mynginx
Annotations:  openshift.io/scc: anyuid
Status:       Pending
IP:
IPs:          <none>
Containers:
  mynginx:
    Image:        nginx
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wk9p9 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kube-api-access-wk9p9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  96s (x7 over 33m)  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {Node: Worker}. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
$ oc get all
NAME          READY   STATUS    RESTARTS   AGE
pod/mynginx   0/1     Pending   0          2m38s
$ oc get pod mynginx -o yaml | oc adm policy scc-subject-review -f -
RESOURCE      ALLOWED BY
Pod/mynginx   anyuid
$ oc get pods
NAME      READY   STATUS    RESTARTS   AGE
mynginx   0/1     Pending   0          11m
$ oc get all
NAME          READY   STATUS    RESTARTS   AGE
pod/mynginx   0/1     Pending   0          11m
$ oc get pods -o yaml
...
    message: '0/1 nodes are available: 1 node(s) had untolerated taint {Node: Worker}.
...
$ oc get pods -o yaml > fixapp.yaml
$ oc delete pod mynginx
pod "mynginx" deleted
$ oc get pods
No resources found in fixapp namespace.
$ vi fixapp.yaml
---
    serviceAccount: fixapp-sa
    serviceAccountName: fixapp-sa
---
    tolerations:
    - effect: NoSchedule
      key: Node
      value: Worker
      operator: Equal
---
$ oc create sa fixapp-sa
$ oc adm policy add-scc-to-user anyuid -z fixapp-sa
$ oc create -f fixapp.yaml
$ oc get pods
NAME      READY   STATUS    RESTARTS   AGE
mynginx   1/1     Running   0          23s
Change the yaml file:
  securityContext:
    fsGroup: 1000510000
    seLinuxOptions:
      level: s0:c23,c2
  serviceAccount: fixapp-sa
  serviceAccountName: fixapp-sa
  terminationGracePeriodSeconds: 10
  volumes:
  - name: deployer-token-c84pp
    secret:
      defaultMode: 420
      secretName: deployer-token-c84pp
status:
And then:
$ oc create -f fixapp.yaml
pod/mynginx-1-deploy created
$ oc get pods
NAME               READY   STATUS              RESTARTS   AGE
mynginx-1-deploy   0/1     ContainerCreating   0          6s
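Putting the fix together: the two edits made to fixapp.yaml correspond to a pod spec like the following minimal sketch (only the fields relevant to the fix are shown; the toleration matches the Node=Worker:NoSchedule taint that the scheduler reported):

apiVersion: v1
kind: Pod
metadata:
  name: mynginx
spec:
  serviceAccountName: fixapp-sa   # the service account that was granted the anyuid SCC
  tolerations:                    # allow scheduling onto the tainted node
  - key: Node
    operator: Equal
    value: Worker
    effect: NoSchedule
  containers:
  - name: mynginx
    image: nginx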
Lab: Configuring MySQL
- As the developer user, use a deployment to create an application named mysql in the microservice project
- Create a generic secret named mysql, using password as the key and mypassword as its value. Use this secret to set the MYSQL_ROOT_PASSWORD environment variable to the value of the password in the secret.
- Configure the MySQL application to mount a PVC to /mnt. The PVC must have a 1GiB size and the ReadWriteOnce access mode.
- Use a node selector to ensure that MySQL will only run on your CRC node.
$ oc login -u kubeadmin -p $(cat ~/.crc/machines/crc/kubeadmin-password) https://api.crc.testing:6443
$ oc new-project microservice
$ oc new-app --name mysql --docker-image mysql
$ oc get pods
NAME                     READY   STATUS              RESTARTS   AGE
mysql-59cd867785-qd6gb   0/1     ContainerCreating   0          4s
$ oc logs mysql-59cd867785-qd6gb
2023-09-18 14:32:59+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.1.0-1.el8 started.
2023-09-18 14:33:01+00:00 [ERROR] [Entrypoint]: Database is uninitialized and password option is not specified
    You need to specify one of the following as an environment variable:
    - MYSQL_ROOT_PASSWORD
    - MYSQL_ALLOW_EMPTY_PASSWORD
    - MYSQL_RANDOM_ROOT_PASSWORD
$ oc create secret generic mysql --from-literal=password=mypassword
$ oc set env deployment mysql --prefix MYSQL_ROOT_ --from secret/mysql
$ oc get pods        # the pod now shows Running
$ oc set volumes deployment/mysql --name mysql-pvc --add --type pvc --claim-size 1Gi --claim-mode rwo --mount-path /mnt
$ oc get nodes
NAME                 STATUS   ROLES           AGE    VERSION
crc-lgph7-master-0   Ready    master,worker   321d   v1.24.0+3882f8f
$ oc label nodes crc-lgph7-master-0 role=master
node/crc-lgph7-master-0 labeled
$ oc edit deployment mysql
      dnsPolicy: ClusterFirst
      nodeSelector:
        role: master
      restartPolicy: Always
$ oc get pods
NAME                    READY   STATUS    RESTARTS   AGE
mysql-767bb84f9-8944q   1/1     Running   0          11m
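As an alternative to oc edit, the same node selector can be added non-interactively; a sketch using oc patch (equivalent to the edit above):

$ oc patch deployment mysql -p '{"spec":{"template":{"spec":{"nodeSelector":{"role":"master"}}}}}'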
Lab: Configuring WordPress
- As the developer user, use a deployment to create an application named wordpress in the microservice project
- Run this application with the anyuid security context assigned to the wordpress-sa service account
- Create a route to the WordPress application, using the hostname wordpress-microservice.apps-crc.testing
- Use secrets and/or ConfigMaps to set environment variables:
  - WORDPRESS_DB_HOST: set to mysql
  - WORDPRESS_DB_NAME: set to wordpress
  - WORDPRESS_DB_USER: set to root
  - WORDPRESS_DB_PASSWORD: set to the value of the password key in the mysql secret
$ oc whoami
developer
$ oc new-project wordpress
$ oc new-app --name wordpress --docker-image wordpress
Flag --docker-image has been deprecated, Deprecated flag use --image
$ oc get pods
NAME                         READY   STATUS             RESTARTS      AGE
wordpress-5db7955867-65p8n   0/1     CrashLoopBackOff   1 (15s ago)   2m18s
$ oc login -u kubeadmin -p $(cat ~/.crc/machines/crc/kubeadmin-password) https://api.crc.testing:6443
$ oc create sa wordpress-sa
$ oc set sa deployment wordpress wordpress-sa
$ oc adm policy add-scc-to-user anyuid -z wordpress-sa
$ oc expose svc wordpress
$ oc create cm wordpress-cm --from-literal=host=mysql --from-literal=name=wordpress --from-literal=user=root --from-literal=password=password
$ oc set env deployment wordpress --prefix WORDPRESS_DB_ --from configmap/wordpress-cm
$ oc get pods
NAME                        READY   STATUS    RESTARTS   AGE
wordpress-8c97b779d-svrsh   1/1     Running   0          11s
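Note that this run keeps the database password in a ConfigMap; to match the lab requirements more closely, WORDPRESS_DB_PASSWORD could instead come from the mysql secret (assuming it exists in this project), and the route can be created with the requested hostname. A sketch:

$ oc set env deployment wordpress --prefix WORDPRESS_DB_ --from secret/mysql   # maps the password key to WORDPRESS_DB_PASSWORD
$ oc delete route wordpress                                                    # remove the default route created earlier
$ oc expose svc wordpress --hostname=wordpress-microservice.apps-crc.testing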