Scheduling
- The kube-scheduler is responsible for finding a node on which to run new Pods
- Nodes are filtered according to requirements that may be set, such as:
  - Resource requirements (see the sketch after this list)
  - Affinity and anti-affinity
  - Taints and tolerations, and more
- The scheduler first filters for feasible nodes, then scores them, and picks the node with the highest score
- Once this node is found, the scheduler notifies the API server of the decision in a process called binding
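As a minimal illustration of the first requirement (the Pod name and values below are made up), a Pod that declares resource requests is only considered for nodes with enough unreserved capacity:

apiVersion: v1
kind: Pod
metadata:
  name: requests-demo            # illustrative name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:                  # the scheduler filters out nodes that cannot reserve this much
        cpu: 250m
        memory: 128Mi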
From Scheduler to Kubelet
- Once the scheduler decision has been made, it is picked up by the kubelet
- The kubelet will instruct the CRI to fetch the image of the required container
- After fetching the image, the container is created and started
Setting Node Preferences
- The nodeSelector field in the pod.spec specifies a key-value pair that must match a label set on nodes that are eligible to run the Pod
- Use kubectl label nodes worker1.example.com disktype=ssd to set the label on a node
- Use a nodeSelector with disktype: ssd in the pod.spec to match the Pod to such a node
- nodeName is also part of the pod.spec and can be used to always run a Pod on a node with a specific name (see the sketch below)
  - Not recommended: if that node is not currently available, the Pod will never run
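A minimal sketch of the nodeName approach (the Pod and node names are illustrative); note that this bypasses the scheduler entirely:

apiVersion: v1
kind: Pod
metadata:
  name: pinned-nginx               # illustrative name
spec:
  nodeName: worker1.example.com    # Pod only runs here; it stays Pending forever if this node is unavailable
  containers:
  - name: nginx
    image: nginx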
Using Node Preferences
kubectl label nodes worker2 disktype=ssd
kubectl apply -f selector-pod.yaml
[root@k8s cka]# kubectl get nodes
NAME             STATUS   ROLES           AGE     VERSION
k8s.example.pl   Ready    control-plane   4d21h   v1.28.3
[root@k8s cka]# kubectl label nodes k8s.example.pl disktype=ssd
node/k8s.example.pl labeled
[root@k8s cka]# cat selector-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
[root@k8s cka]# kubectl cordon k8s.example.pl
node/k8s.example.pl cordoned
[root@k8s cka]# kubectl apply -f selector-pod.yaml
pod/nginx created
[root@k8s cka]# kubectl get pods
NAME                         READY   STATUS    RESTARTS          AGE
deploydaemon-zzllp           1/1     Running   0                 3d15h
firstnginx-d8679d567-249g9   1/1     Running   0                 4d16h
firstnginx-d8679d567-66c4s   1/1     Running   0                 4d16h
firstnginx-d8679d567-72qbd   1/1     Running   0                 4d16h
firstnginx-d8679d567-rhhlz   1/1     Running   0                 3d23h
init-demo                    1/1     Running   0                 4d1h
lab4-pod                     1/1     Running   0                 2d22h
morevol                      2/2     Running   166 (8m12s ago)   3d11h
mydaemon-d4dcd               1/1     Running   0                 3d15h
mystaticpod-k8s.example.pl   1/1     Running   0                 26h
nginx                        0/1     Pending   0                 8s
nginxsvc-5f8b7d4f4d-dtrs7    1/1     Running   0                 2d16h
pv-pod                       1/1     Running   0                 3d10h
sleepy                       1/1     Running   87 (33m ago)      4d2h
testpod                      1/1     Running   0                 4d16h
two-containers               2/2     Running   518 (2m12s ago)   3d23h
web-0                        1/1     Running   0                 4d4h
web-1                        1/1     Running   0                 3d15h
web-2                        1/1     Running   0                 3d15h
webserver-76d44586d-8gqhf    1/1     Running   0                 2d23h
webshop-7f9fd49d4c-92nj2     1/1     Running   0                 2d18h
webshop-7f9fd49d4c-kqllw     1/1     Running   0                 2d18h
webshop-7f9fd49d4c-x2czc     1/1     Running   0                 2d18h
[root@k8s cka]# kubectl get all
NAME                             READY   STATUS    RESTARTS          AGE
pod/deploydaemon-zzllp           1/1     Running   0                 3d15h
pod/firstnginx-d8679d567-249g9   1/1     Running   0                 4d16h
pod/firstnginx-d8679d567-66c4s   1/1     Running   0                 4d16h
pod/firstnginx-d8679d567-72qbd   1/1     Running   0                 4d16h
pod/firstnginx-d8679d567-rhhlz   1/1     Running   0                 3d23h
pod/init-demo                    1/1     Running   0                 4d1h
pod/lab4-pod                     1/1     Running   0                 2d22h
pod/morevol                      2/2     Running   166 (8m36s ago)   3d11h
pod/mydaemon-d4dcd               1/1     Running   0                 3d15h
pod/mystaticpod-k8s.example.pl   1/1     Running   0                 26h
pod/nginx                        0/1     Pending   0                 32s
pod/nginxsvc-5f8b7d4f4d-dtrs7    1/1     Running   0                 2d16h
pod/pv-pod                       1/1     Running   0                 3d10h
pod/sleepy                       1/1     Running   87 (34m ago)      4d2h
pod/testpod                      1/1     Running   0                 4d16h
pod/two-containers               2/2     Running   518 (2m36s ago)   3d23h
pod/web-0                        1/1     Running   0                 4d4h
pod/web-1                        1/1     Running   0                 3d15h
pod/web-2                        1/1     Running   0                 3d15h
pod/webserver-76d44586d-8gqhf    1/1     Running   0                 2d23h
pod/webshop-7f9fd49d4c-92nj2     1/1     Running   0                 2d18h
pod/webshop-7f9fd49d4c-kqllw     1/1     Running   0                 2d18h
pod/webshop-7f9fd49d4c-x2czc     1/1     Running   0                 2d18h

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
service/apples       ClusterIP   10.101.6.55      <none>        80/TCP         2d14h
service/kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP        4d21h
service/newdep       ClusterIP   10.100.68.120    <none>        8080/TCP       2d15h
service/nginx        ClusterIP   None             <none>        80/TCP         4d4h
service/nginxsvc     ClusterIP   10.104.155.180   <none>        80/TCP         2d16h
service/webshop      NodePort    10.109.119.90    <none>        80:32064/TCP   2d17h

NAME                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/deploydaemon   1         1         1       1            1           <none>          3d15h
daemonset.apps/mydaemon       1         1         1       1            1           <none>          4d15h

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/firstnginx   4/4     4            4           4d16h
deployment.apps/nginxsvc     1/1     1            1           2d16h
deployment.apps/webserver    1/1     1            1           2d23h
deployment.apps/webshop      3/3     3            3           2d18h

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/firstnginx-d8679d567   4         4         4       4d16h
replicaset.apps/nginxsvc-5f8b7d4f4d    1         1         1       2d16h
replicaset.apps/webserver-667ddc69b6   0         0         0       2d23h
replicaset.apps/webserver-76d44586d    1         1         1       2d23h
replicaset.apps/webshop-7f9fd49d4c     3         3         3       2d18h

NAME                   READY   AGE
statefulset.apps/web   3/3     4d4h
[root@k8s cka]# kubectl describe pod/nginx
Name:             nginx
Namespace:        default
Priority:         0
Service Account:  default
Node:             <none>
Labels:           <none>
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Containers:
  nginx:
    Image:        nginx
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ttksw (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kube-api-access-ttksw:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              disktype=ssd
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  76s   default-scheduler  0/1 nodes are available: 1 node(s) were unschedulable. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
[root@k8s cka]# kubectl get pods
NAME                         READY   STATUS    RESTARTS          AGE
deploydaemon-zzllp           1/1     Running   0                 3d15h
firstnginx-d8679d567-249g9   1/1     Running   0                 4d16h
firstnginx-d8679d567-66c4s   1/1     Running   0                 4d16h
firstnginx-d8679d567-72qbd   1/1     Running   0                 4d16h
firstnginx-d8679d567-rhhlz   1/1     Running   0                 3d23h
init-demo                    1/1     Running   0                 4d1h
lab4-pod                     1/1     Running   0                 2d22h
morevol                      2/2     Running   166 (10m ago)     3d11h
mydaemon-d4dcd               1/1     Running   0                 3d15h
mystaticpod-k8s.example.pl   1/1     Running   0                 26h
nginx                        1/1     Running   0                 2m31s
nginxsvc-5f8b7d4f4d-dtrs7    1/1     Running   0                 2d16h
pv-pod                       1/1     Running   0                 3d10h
sleepy                       1/1     Running   87 (36m ago)      4d2h
testpod                      1/1     Running   0                 4d16h
two-containers               2/2     Running   518 (4m35s ago)   3d23h
web-0                        1/1     Running   0                 4d4h
web-1                        1/1     Running   0                 3d15h
web-2                        1/1     Running   0                 3d15h
webserver-76d44586d-8gqhf    1/1     Running   0                 2d23h
webshop-7f9fd49d4c-92nj2     1/1     Running   0                 2d18h
webshop-7f9fd49d4c-kqllw     1/1     Running   0                 2d18h
webshop-7f9fd49d4c-x2czc     1/1     Running   0                 2d18h
Affinity and Anti-Affinity
- (Anti-)Affinity is used to define advanced scheduler rules
- Node affinity is used to constrain the nodes that can receive a Pod by matching labels on those nodes
- Inter-pod affinity constrains the nodes that can receive a Pod by matching labels of Pods already running on those nodes
- Anti-affinity can only be applied between Pods
How it Works
- A Pod that has a node affinity rule for the label key=value will only be scheduled to nodes with a matching label
- A Pod that has a Pod affinity rule for the label key=value will only be scheduled to nodes running Pods with the matching label
Setting Node Affinity
- To define node affinity, two different statements can be used:
  - requiredDuringSchedulingIgnoredDuringExecution requires the node to meet the constraint that is defined
  - preferredDuringSchedulingIgnoredDuringExecution defines a soft affinity that is ignored if it cannot be fulfilled
- At the moment, affinity is only applied while scheduling Pods; it cannot be used to change where Pods are already running
Defining Affinity Labels
- Affinity rules go beyond labels that use a simple key=value match
- A matchExpressions statement is used to define a key (the label), an operator, and optionally one or more values
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: type
          operator: In
          values:
          - blue
          - green
- Matches any node that has type set to either blue or green
nodeSelectorTerms:
- matchExpressions:
  - key: storage
    operator: Exists
- Matches any node where the key storage is defined
Examples:
[root@k8s cka]# cat pod-with-node-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
[root@k8s cka]# kubectl apply -f pod-with-node-affinity.yaml
pod/with-node-affinity created
[root@k8s cka]# kubectl get pods
NAME                 READY   STATUS    RESTARTS   AGE
...
with-node-affinity   0/1     Pending   0          7s
[root@k8s cka]# kubectl delete -f pod-with-node-affinity.yaml
pod "with-node-affinity" deleted
[root@k8s cka]# cat pod-with-node-antiaffinity.yaml
#kubectl label nodes node01 disktype=ssd
apiVersion: v1
kind: Pod
metadata:
  name: antinginx
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: NotIn
            values:
            - ssd
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
[root@k8s cka]# kubectl apply -f pod-with-node-antiaffinity.yaml
pod/antinginx created
[root@k8s cka]# kubectl get pods
NAME                 READY   STATUS    RESTARTS   AGE
antinginx            0/1     Pending   0          5s
deploydaemon-zzllp   1/1     Running   0          3d18h
...
[root@k8s cka]# kubectl describe node k8s.example.pl
Name:               k8s.example.pl
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    disktype=ssd
...
[root@k8s cka]# kubectl describe node k8s.example.pl | grep ssd
                    disktype=ssd
[root@k8s cka]# kubectl label nodes k8s.example.pl disktype-
node/k8s.example.pl unlabeled
[root@k8s cka]# kubectl get pods
NAME                 READY   STATUS    RESTARTS   AGE
antinginx            1/1     Running   0          34m
deploydaemon-zzllp   1/1     Running   0          3d18h
...
[root@k8s cka]# cat pod-with-pod-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: failure-domain.beta.kubernetes.io/zone
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0
TopologyKey
- When defining Pod affinity and anti-affinity, a topologyKey property is required
- The topologyKey refers to a label that exists on nodes and typically has a format containing a slash, such as kubernetes.io/hostname
- Using a topologyKey restricts the affinity rule to nodes that share the same value for that label
- This allows administrators to use zones where the workloads are implemented
- If no matching topologyKey is found on the host, the specified topologyKey is ignored in the affinity (see the sketch below)
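A short sketch of where the topologyKey sits in an affinity term; the label selector is illustrative and it assumes nodes carry the well-known topology.kubernetes.io/zone label:

affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: frontend                          # co-schedule with Pods labeled app=frontend
      topologyKey: topology.kubernetes.io/zone   # ...on nodes in the same zone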
Using Pod Anti-Affinity
kubectl create -f redis-with-pod-affinity.yaml
- On a two-node cluster, one Pod stays in a state of pending
kubectl create -f web-with-pod-affinity.yaml
- This will run web instances only on nodes where redis is running as well
[root@k8s cka]# cat redis-with-pod-affinity.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:3.2-alpine
[root@k8s cka]# kubectl create -f redis-with-pod-affinity.yaml
deployment.apps/redis-cache created
[root@k8s cka]# kubectl get pods
NAME                           READY   STATUS    RESTARTS        AGE
antinginx                      1/1     Running   0               113m
...
redis-cache-8478cbdc86-cfsmz   0/1     Pending   0               6s
redis-cache-8478cbdc86-kr8qr   0/1     Pending   0               6s
redis-cache-8478cbdc86-w2swz   1/1     Running   0               6s
sleepy                         1/1     Running   92 (14m ago)    4d6h
testpod                        1/1     Running   0               4d21h
two-containers                 2/2     Running   546 (94s ago)   4d3h
web-0                          1/1     Running   0               4d9h
web-1                          1/1     Running   0               3d20h
web-2                          1/1     Running   0               3d20h
webserver-76d44586d-8gqhf      1/1     Running   0               3d3h
webshop-7f9fd49d4c-92nj2       1/1     Running   0               2d23h
webshop-7f9fd49d4c-kqllw       1/1     Running   0               2d23h
webshop-7f9fd49d4c-x2czc       1/1     Running   0               2d23h
The anti-affinity rule ensures that two Pods of the same application never run on the same node.
[root@k8s cka]# cat web-with-pod-affinity.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  selector:
    matchLabels:
      app: web-store
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-store
            topologyKey: "kubernetes.io/hostname"
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx:1.16-alpine
[root@k8s cka]# kubectl create -f web-with-pod-affinity.yaml
deployment.apps/web-server created
[root@k8s cka]# kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
antinginx                      1/1     Running   0          162m
...
redis-cache-8478cbdc86-cfsmz   0/1     Pending   0          48m
redis-cache-8478cbdc86-kr8qr   0/1     Pending   0          48m
redis-cache-8478cbdc86-w2swz   1/1     Running   0          48m
...
web-server-55f57c89d4-25qhr    0/1     Pending   0          96s
web-server-55f57c89d4-crtfn    1/1     Running   0          96s
web-server-55f57c89d4-vl4p5    0/1     Pending   0          96s
webserver-76d44586d-8gqhf      1/1     Running   0          3d4h
webshop-7f9fd49d4c-92nj2       1/1     Running   0          3d
webshop-7f9fd49d4c-kqllw       1/1     Running   0          3d
webshop-7f9fd49d4c-x2czc       1/1     Running   0          3d
This means that web instances only run on nodes where redis is running as well.
Taints
- Taints are applied to a node to mark that the node should not accept any Pod that doesn't tolerate the taint
- Tolerations are applied to Pods and allow (but do not require) Pods to be scheduled on nodes with matching taints, so they act as an exception to the taints that are applied
- Where affinities are used on Pods to attract them to specific nodes, taints allow a node to repel a set of Pods
- Taints and tolerations are used to ensure Pods are not scheduled on inappropriate nodes, and thus make sure that dedicated nodes can be configured for dedicated tasks
Taint Types
- Three types of taint can be applied:
  - NoSchedule: does not schedule new Pods
  - PreferNoSchedule: does not schedule new Pods, unless there is no other option
  - NoExecute: evicts (migrates) all Pods away from this node
- If the Pod has a matching toleration, however, it will ignore the taint
Setting Taints
- Taints are set in different ways:
- Control plane nodes automatically get taints that won't schedule user Pods
- When kubectl drain and kubectl cordon are used, a taint is applied on the target node
- Taints can be set automatically by the cluster when critical conditions arise, such as a node running out of disk space
- Administrators can use kubectl taint to set taints:
  kubectl taint nodes worker1 key1=value1:NoSchedule
  kubectl taint nodes worker1 key1=value1:NoSchedule-   # the trailing "-" removes the taint
Tolerations
- To allow a Pod to run on a node with a specific taint, a toleration can be used
- This is essential for running core Kubernetes Pods on the control plane nodes
- While creating taints and tolerations, a key and value are defined to allow for more specific access:
  kubectl taint nodes worker1 storage=ssd:NoSchedule
- After this, a Pod is only allowed to run on worker1 if it has a toleration containing the key storage and the value ssd
Taint Key and Value
- While defining a toleration, the Pod needs a key, operator, and value:
tolerations:
- key: "storage"
operator: "Equal"
value: "ssd"
- The default value for the operator is "Equal"; as an alternative, "Exists" is commonly used (see the sketch after this list)
- If the operator “Exists” is used, the key should match the taint key and the value is ignored
- If the operator “Equal” is used, the key and value must match
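For comparison, a sketch of the Exists variant, reusing the storage key from the example above; it tolerates the taint regardless of its value:

tolerations:
- key: "storage"
  operator: "Exists"        # matches storage=<any value>:NoSchedule
  effect: "NoSchedule"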
Node Conditions and Taints
- Node conditions can automatically create taints on nodes if one of the following applies:
  - memory-pressure
  - disk-pressure
  - pid-pressure
  - unschedulable
  - network-unavailable
- If any of these conditions apply, a taint is automatically set
- Node conditions can be ignored by adding corresponding Pod tolerations, as sketched below
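For instance (a sketch only), a Pod that should ignore the disk-pressure condition could tolerate the taint that this condition creates:

tolerations:
- key: "node.kubernetes.io/disk-pressure"
  operator: "Exists"
  effect: "NoSchedule"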
Using Taints – commands
kubectl taint nodes worker1 storage=ssd:NoSchedule
kubectl describe nodes worker1
kubectl create deployment nginx-taint --image=nginx
kubectl scale deployment nginx-taint --replicas=3
kubectl get pods -o wide
# will show that the pods are all on worker2
kubectl create -f taint-toleration.yaml
# will run
kubectl create -f taint-toleration2.yaml
# will not run
[root@k8s cka]# kubectl get nodes
NAME             STATUS   ROLES           AGE     VERSION
k8s.example.pl   Ready    control-plane   5d17h   v1.28.3
[root@k8s cka]# kubectl taint nodes k8s.example.pl storage=ssd:NoSchedule
node/k8s.example.pl tainted
[root@k8s cka]# kubectl describe node k8s.example.pl | grep Taints
Taints:             storage=ssd:NoSchedule
[root@k8s cka]# kubectl create deploy nginx-taint --image=nginx
deployment.apps/nginx-taint created
[root@k8s cka]# kubectl scale deploy nginx-taint --replicas=3
deployment.apps/nginx-taint scaled
[root@k8s cka]# kubectl get pods --selector app=nginx-taint
NAME                           READY   STATUS    RESTARTS   AGE
nginx-taint-68bd5db674-7skqs   0/1     Pending   0          2m2s
nginx-taint-68bd5db674-vjq89   0/1     Pending   0          2m2s
nginx-taint-68bd5db674-vqz2z   0/1     Pending   0          2m38s
None of the nginx-taint Pods run, because the taint is set on the control node and it is the only node in the cluster. The control node now only accepts Pods that tolerate the storage=ssd taint. Let's create a toleration.
[root@k8s cka]# kubectl describe node k8s.example.pl | grep Taints
Taints:             storage=ssd:NoSchedule
[root@k8s cka]# cat taint-toleration.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-ssd
  labels:
    env: test
spec:
  containers:
  - name: nginx-ssd
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "storage"
    operator: "Equal"
    value: "ssd"
    effect: "NoSchedule"
[root@k8s cka]# kubectl apply -f taint-toleration.yaml
pod/nginx-ssd created
[root@k8s cka]# kubectl get pods nginx-ssd -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP            NODE            NOMINATED NODE   READINESS GATES
nginx-ssd   1/1     Running   0          24s   10.244.0.53   k8s.netico.pl   <none>           <none>
[root@k8s cka]# cat taint-toleration2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-hdd
  labels:
    env: test
spec:
  containers:
  - name: nginx-hdd
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "storage"
    operator: "Equal"
    value: "hdd"
    effect: "NoSchedule"
[root@k8s cka]# kubectl apply -f taint-toleration2.yaml
pod/nginx-hdd created
[root@k8s cka]# kubectl get pods nginx-hdd -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx-hdd   0/1     Pending   0          11s   <none>   <none>   <none>           <none>
The nginx-ssd Pod has a toleration configured for the storage=ssd:NoSchedule taint and is running on the control node. The nginx-hdd Pod only has a toleration for storage=hdd:NoSchedule, so it is not scheduled on any node and stays Pending.
LimitRange
- LimitRange is an API object that limits resource usage per container or Pod in a Namespace
- It uses three relevant options (see the sketch below):
  - type: specifies whether it applies to Pods or containers
  - defaultRequest: the default resources the application will request
  - default: the default limit, i.e. the maximum resources the application can use
Quota
- Quota is an API object that limits total resources available in a Namespace
- If a Namespace is configured with Quota, applications in that Namespace must be configured with resource settings in pod.spec.containers.resources
- Where the goal of the LimitRange is to set default restrictions for each application running in a Namespace, the goal of Quota is to define the maximum resources that can be consumed within a Namespace by all applications together (a sample manifest follows)
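The quota that is created imperatively in the commands below can also be expressed as a manifest; a sketch using the same values:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: qtest
  namespace: limited
spec:
  hard:
    pods: "3"
    cpu: 100m                  # total CPU requests allowed in the Namespace
    memory: 500Mi              # total memory requests allowed in the Namespace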
Managing Quota
kubectl create quota qtest --hard pods=3,cpu=100m,memory=500Mi --namespace limited
kubectl describe quota --namespace limited
kubectl create deploy nginx --image=nginx:latest --replicas=3 -n limited
kubectl get all -n limited
# no pods
kubectl describe rs/nginx-xxx -n limited
# it fails because no resource requests have been set on the deployment, which the quota requires
kubectl set resources deploy nginx --requests cpu=100m,memory=5Mi --limits cpu=200m,memory=20Mi -n limited
kubectl get pods -n limited
[root@k8s cka]# kubectl create ns limited
namespace/limited created
[root@k8s cka]# kubectl create quota qtest --hard pods=3,cpu=100m,memory=500Mi --namespace limited
resourcequota/qtest created
[root@k8s cka]# kubectl describe quota --namespace limited
Name:       qtest
Namespace:  limited
Resource    Used  Hard
--------    ----  ----
cpu         0     100m
memory      0     500Mi
pods        0     3
[root@k8s cka]# kubectl describe quota -n limited
Name:       qtest
Namespace:  limited
Resource    Used  Hard
--------    ----  ----
cpu         0     100m
memory      0     500Mi
pods        0     3
[root@k8s cka]# kubectl create deploy nginx --image=nginx:latest --replicas=3 -n limited
deployment.apps/nginx created
[root@k8s cka]# kubectl get all -n limited
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   0/3     0            0           17s

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-56fcf95486   3         0         0       17s
[root@k8s cka]# kubectl describe -n limited replicaset.apps/nginx-56fcf95486
Name:           nginx-56fcf95486
Namespace:      limited
Selector:       app=nginx,pod-template-hash=56fcf95486
Labels:         app=nginx
                pod-template-hash=56fcf95486
Annotations:    deployment.kubernetes.io/desired-replicas: 3
                deployment.kubernetes.io/max-replicas: 4
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/nginx
Replicas:       0 current / 3 desired
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=nginx
           pod-template-hash=56fcf95486
  Containers:
   nginx:
    Image:        nginx:latest
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type             Status  Reason
  ----             ------  ------
  ReplicaFailure   True    FailedCreate
Events:
  Type     Reason        Age               From                   Message
  ----     ------        ----              ----                   -------
  Warning  FailedCreate  83s               replicaset-controller  Error creating: pods "nginx-56fcf95486-6457s" is forbidden: failed quota: qtest: must specify cpu for: nginx; memory for: nginx
  Warning  FailedCreate  83s               replicaset-controller  Error creating: pods "nginx-56fcf95486-8pr6v" is forbidden: failed quota: qtest: must specify cpu for: nginx; memory for: nginx
  Warning  FailedCreate  83s               replicaset-controller  Error creating: pods "nginx-56fcf95486-szt9c" is forbidden: failed quota: qtest: must specify cpu for: nginx; memory for: nginx
  Warning  FailedCreate  83s               replicaset-controller  Error creating: pods "nginx-56fcf95486-lr5qn" is forbidden: failed quota: qtest: must specify cpu for: nginx; memory for: nginx
  Warning  FailedCreate  83s               replicaset-controller  Error creating: pods "nginx-56fcf95486-pgt4r" is forbidden: failed quota: qtest: must specify cpu for: nginx; memory for: nginx
  Warning  FailedCreate  83s               replicaset-controller  Error creating: pods "nginx-56fcf95486-8dvpm" is forbidden: failed quota: qtest: must specify cpu for: nginx; memory for: nginx
  Warning  FailedCreate  82s               replicaset-controller  Error creating: pods "nginx-56fcf95486-lwk76" is forbidden: failed quota: qtest: must specify cpu for: nginx; memory for: nginx
  Warning  FailedCreate  82s               replicaset-controller  Error creating: pods "nginx-56fcf95486-n84vk" is forbidden: failed quota: qtest: must specify cpu for: nginx; memory for: nginx
  Warning  FailedCreate  81s               replicaset-controller  Error creating: pods "nginx-56fcf95486-mt69h" is forbidden: failed quota: qtest: must specify cpu for: nginx; memory for: nginx
  Warning  FailedCreate  1s (x6 over 80s)  replicaset-controller  (combined from similar events): Error creating: pods "nginx-56fcf95486-mcfxv" is forbidden: failed quota: qtest: must specify cpu for: nginx; memory for: nginx
The Pods are not created because no resource requests or limits have been set on the deployment. This can easily be done using kubectl set resources:
[root@k8s cka]# kubectl set resources -h
[root@k8s cka]# kubectl set resources deploy nginx --requests cpu=100m,memory=5Mi --limits cpu=200m,memory=20Mi -n limited
deployment.apps/nginx resource requirements updated
[root@k8s cka]# kubectl get all -n limited
NAME                        READY   STATUS    RESTARTS   AGE
pod/nginx-77d7cdd4d-p5dhh   0/1     Pending   0          32s

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   0/3     1            0           8m47s

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-56fcf95486   3         0         0       8m47s
replicaset.apps/nginx-77d7cdd4d    1         1         0       32s
[root@k8s cka]# kubectl get pods -n limited
NAME                    READY   STATUS    RESTARTS   AGE
nginx-77d7cdd4d-p5dhh   0/1     Pending   0          55s
[root@k8s cka]# kubectl taint nodes k8s.netico.pl storage-
node/k8s.netico.pl untainted
[root@k8s cka]# kubectl get pods -n limited
NAME                    READY   STATUS              RESTARTS   AGE
nginx-77d7cdd4d-p5dhh   0/1     ContainerCreating   0          117s
[root@k8s cka]# kubectl get pods -n limited
NAME                    READY   STATUS    RESTARTS   AGE
nginx-77d7cdd4d-p5dhh   1/1     Running   0          2m15s
[root@k8s cka]# kubectl describe quota -n limited
Name:       qtest
Namespace:  limited
Resource    Used  Hard
--------    ----  ----
cpu         100m  100m
memory      5Mi   500Mi
pods        1     3
Only one Pod is running because of the quota: its CPU request of 100m already consumes the entire hard CPU limit. We can edit the quota and set the hard CPU value in spec to 1000m instead of 100m.
[root@k8s cka]# kubectl edit quota -n limited
apiVersion: v1
kind: ResourceQuota
metadata:
  creationTimestamp: "2024-02-06T12:50:47Z"
  name: qtest
  namespace: limited
  resourceVersion: "405684"
  uid: 88b14c91-7097-48f9-a71e-0b159ad49916
spec:
  hard:
    cpu: 1000m
    memory: 500Mi
    pods: "3"
status:
  hard:
    cpu: 100m
    memory: 500Mi
    pods: "3"
  used:
    cpu: 100m
    memory: 5Mi
    pods: "1"
[root@k8s cka]# kubectl describe quota -n limited
Name:       qtest
Namespace:  limited
Resource    Used  Hard
--------    ----  ----
cpu         300m  1
memory      15Mi  500Mi
pods        3     3
Now we can see that the three pods have been scheduled.
Defining LimitRange
kubectl explain limitrange.spec
kubectl create ns limited
kubectl apply -f limitrange.yaml -n limited
kubectl describe ns limited
kubectl run limitpod --image=nginx -n limited
kubectl describe pod limitpod -n limited
[root@k8s cka]# kubectl explain -h
[root@k8s cka]# kubectl explain limitrange.spec.limits
[root@k8s cka]# kubectl delete ns limited
namespace "limited" deleted
[root@k8s cka]# kubectl create ns limited
namespace/limited created
[root@k8s cka]# cat limitrange.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range
spec:
  limits:
  - default:
      memory: 512Mi
    defaultRequest:
      memory: 256Mi
    type: Container
[root@k8s cka]# kubectl apply -f limitrange.yaml -n limited
limitrange/mem-limit-range created
[root@k8s cka]# kubectl describe ns limited
Name:         limited
Labels:       kubernetes.io/metadata.name=limited
Annotations:  <none>
Status:       Active

No resource quota.

Resource Limits
 Type       Resource  Min  Max  Default Request  Default Limit  Max Limit/Request Ratio
 ----       --------  ---  ---  ---------------  -------------  -----------------------
 Container  memory    -    -    256Mi            512Mi          -
[root@k8s cka]# cat limitedpod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: limitedpod
spec:
  containers:
  - name: demo
    image: registry.k8s.io/pause:2.0
    resources:
      requests:
        cpu: 700m
      limits:
        cpu: 700m
[root@k8s cka]# kubectl run limited --image=nginx -n limited
pod/limited created
[root@k8s cka]# kubectl describe ns limited
Name:         limited
Labels:       kubernetes.io/metadata.name=limited
Annotations:  <none>
Status:       Active

No resource quota.

Resource Limits
 Type       Resource  Min  Max  Default Request  Default Limit  Max Limit/Request Ratio
 ----       --------  ---  ---  ---------------  -------------  -----------------------
 Container  memory    -    -    256Mi            512Mi          -
[root@k8s cka]# kubectl describe pod limited -n limited
Name:             limited
Namespace:        limited
Priority:         0
Service Account:  default
Node:             k8s.netico.pl/172.30.9.24
Start Time:       Tue, 06 Feb 2024 08:44:37 -0500
Labels:           run=limited
Annotations:      kubernetes.io/limit-ranger:
                    LimitRanger plugin set: memory request for container limited; memory limit for container limited
Status:           Running
IP:               10.244.0.61
IPs:
  IP:  10.244.0.61
Containers:
  limited:
    Container ID:   docker://a7f1549fb345c6200b35c5a9880fac91863b887df85a043f1c75088d6e75580c
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:31754bca89a3afb25c04d6ecfa2d9671bc3972d8f4809ff855f7e35caa580de9
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Tue, 06 Feb 2024 08:44:39 -0500
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  512Mi
    Requests:
      memory:  256Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vtc92 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-vtc92:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  3m45s  default-scheduler  Successfully assigned limited/limited to k8s.netico.pl
  Normal  Pulling    3m45s  kubelet            Pulling image "nginx"
  Normal  Pulled     3m43s  kubelet            Successfully pulled image "nginx" in 1.258s (1.258s including waiting)
  Normal  Created    3m43s  kubelet            Created container limited
  Normal  Started    3m43s  kubelet            Started container limited
Lab: Configuring Taints
- Create a taint on node worker2 that doesn't allow new Pods to be scheduled if they don't have an SSD hard disk, unless they have the appropriate toleration set
- Remove the taint after verifying that it works
[root@k8s cka]# kubectl describe node k8s.example.pl | grep Taint
Taints:             storage=ssd:NoSchedule
[root@k8s cka]# kubectl create deploy newtaint --image=nginx replicas=3
error: exactly one NAME is required, got 2
See 'kubectl create deployment -h' for help and examples
[root@k8s cka]# kubectl create deploy newtaint --image=nginx --replicas=3
deployment.apps/newtaint created
[root@k8s cka]# kubectl get all --selector app=newtaint
NAME                            READY   STATUS    RESTARTS   AGE
pod/newtaint-85fc66d575-bjlt5   0/1     Pending   0          26s
pod/newtaint-85fc66d575-h9ht7   0/1     Pending   0          26s
pod/newtaint-85fc66d575-lqfxm   0/1     Pending   0          26s

NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/newtaint   0/3     3            0           26s

NAME                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/newtaint-85fc66d575   3         3         0       26s
[root@k8s cka]#
[root@k8s cka]# kubectl edit deploy newtaint
deployment.apps/newtaint edited
In the editor, add a tolerations: section to the Pod template spec (at the same level as containers:):
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: "storage"
        operator: "Equal"
        value: "ssd"
        effect: "NoSchedule"
status:
And now:
[root@k8s cka]# kubectl get all --selector app=newtaint
NAME                           READY   STATUS    RESTARTS   AGE
pod/newtaint-bb94b7647-4bnzq   1/1     Running   0          4m39s
pod/newtaint-bb94b7647-cmsrf   1/1     Running   0          4m44s
pod/newtaint-bb94b7647-xnx5r   1/1     Running   0          4m41s

NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/newtaint   3/3     3            3           9m34s

NAME                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/newtaint-85fc66d575   0         0         0       9m34s
replicaset.apps/newtaint-bb94b7647    3         3         3       4m44s
[root@k8s cka]# kubectl get all --selector app=newtaint -o wide
NAME                           READY   STATUS    RESTARTS   AGE    IP            NODE             NOMINATED NODE   READINESS GATES
pod/newtaint-bb94b7647-4bnzq   1/1     Running   0          5m1s   10.244.0.64   k8s.example.pl   <none>           <none>
pod/newtaint-bb94b7647-cmsrf   1/1     Running   0          5m6s   10.244.0.62   k8s.example.pl   <none>           <none>
pod/newtaint-bb94b7647-xnx5r   1/1     Running   0          5m3s   10.244.0.63   k8s.example.pl   <none>           <none>

NAME                       READY   UP-TO-DATE   AVAILABLE   AGE     CONTAINERS   IMAGES   SELECTOR
deployment.apps/newtaint   3/3     3            3           9m56s   nginx        nginx    app=newtaint

NAME                                  DESIRED   CURRENT   READY   AGE     CONTAINERS   IMAGES   SELECTOR
replicaset.apps/newtaint-85fc66d575   0         0         0       9m56s   nginx        nginx    app=newtaint,pod-template-hash=85fc66d575
replicaset.apps/newtaint-bb94b7647    3         3         3       5m6s    nginx        nginx    app=newtaint,pod-template-hash=bb94b7647
[root@k8s cka]# kubectl scale deploy newtaint --replicas=5
deployment.apps/newtaint scaled
[root@k8s cka]# kubectl get all --selector app=newtaint -o wide
NAME                           READY   STATUS    RESTARTS   AGE     IP            NODE             NOMINATED NODE   READINESS GATES
pod/newtaint-bb94b7647-4bnzq   1/1     Running   0          6m20s   10.244.0.64   k8s.example.pl   <none>           <none>
pod/newtaint-bb94b7647-cmsrf   1/1     Running   0          6m25s   10.244.0.62   k8s.example.pl   <none>           <none>
pod/newtaint-bb94b7647-cvwn2   1/1     Running   0          7s      10.244.0.66   k8s.example.pl   <none>           <none>
pod/newtaint-bb94b7647-l4brf   1/1     Running   0          7s      10.244.0.65   k8s.example.pl   <none>           <none>
pod/newtaint-bb94b7647-xnx5r   1/1     Running   0          6m22s   10.244.0.63   k8s.example.pl   <none>           <none>

NAME                       READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES   SELECTOR
deployment.apps/newtaint   5/5     5            5           11m   nginx        nginx    app=newtaint

NAME                                  DESIRED   CURRENT   READY   AGE     CONTAINERS   IMAGES   SELECTOR
replicaset.apps/newtaint-85fc66d575   0         0         0       11m     nginx        nginx    app=newtaint,pod-template-hash=85fc66d575
replicaset.apps/newtaint-bb94b7647    5         5         5       6m25s   nginx        nginx    app=newtaint,pod-template-hash=bb94b7647
[root@k8s cka]# kubectl delete deploy newtaint
deployment.apps "newtaint" deleted
[root@k8s cka]# kubectl taint nodes k8s.example.pl storage-
node/k8s.example.pl untainted