Understanding the Scheduler
- 3 types of rules are applied for Pod scheduling
- Node labels
- Affinity rules
- Anti-affinity rules
- The Pod scheduler works through 3 steps:
- Node filtering: node availability, selectors, taints, resource availability, and more are evaluated
- Node prioritizing: nodes are prioritized based on affinity rules
- Selecting the best node: the best-scoring node is used; if multiple nodes score equally, round robin is used to select one of them
- When running in the cloud, scheduling by default happens within the boundaries of a region
Using Node Labels to Control Pod Placement
- Nodes can be configured with a label
- A label is an arbitrary key-value pair that is set with oc label
- Pods can then be configured with a nodeSelector property so that they will only run on a node that has the matching label (see the sketch below)
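As an illustration, a minimal Pod manifest with a nodeSelector might look like this (the Pod name is a hypothetical example; the image and label come from the demos below):

apiVersion: v1
kind: Pod
metadata:
  name: web                      # hypothetical example name
spec:
  nodeSelector:
    env: test                    # Pod is only scheduled on nodes labeled env=test
  containers:
  - name: nginx
    image: bitnami/nginx:latest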
Applying Labels to Nodes
- A label is an arbitrary key-value pair that can be used as a selector for Pod placement
- Use oc label node worker1.example.com env=test to set the label
- Use oc label node worker1.example.com env=prod --overwrite to overwrite it
- Use oc label node worker1.example.com env- to remove the label
- Use oc get ... --show-labels to show labels set on any type of resource
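To list only the nodes that carry a specific label, a label selector can be used; -L shows a label as an extra column (assuming the env label from the examples above):

oc get nodes -l env=test
oc get nodes -L env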
Applying Labels to Machine Sets
- A machine set is a group of machines that is created when installing OpenShift using full stack automation
- Machine sets can be labeled so that nodes generated from the machine set will automatically get a label
- To see which nodes are in which machine set, use oc get machines -n openshift-machine-api -o wide
- Use oc edit machineset ... to set a label in the machine set spec template (see the sketch below)
- Notice that nodes that were already generated from the machine set will not be updated with the new label
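A sketch of the relevant part of a MachineSet, assuming node labels are set under spec.template.spec.metadata.labels (the machine set name is illustrative):

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: mycluster-worker-a            # hypothetical machine set name
  namespace: openshift-machine-api
spec:
  template:
    spec:
      metadata:
        labels:
          env: test                   # machines created from this set get this node label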
Configuring NodeSelector on Pods
- Infrastructure-related Pods in an OpenShift cluster are configured to run on a controller node
- Use nodeSelector on the Deployment or DeploymentConfig to configure its Pods to run on a node that has a specific label
- Use oc edit to apply a nodeSelector to existing Deployments or DeploymentConfigs
- If a Deployment is configured with a nodeSelector that doesn’t match any node label, the Pods will show as Pending
- Fix this by setting the Deployment spec.template.spec.nodeSelector to the desired key-value pair (see the snippet below)
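For reference, a Deployment fragment with the nodeSelector in place, using the names from the demo further down:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple
spec:
  replicas: 4
  selector:
    matchLabels:
      app: simple
  template:
    metadata:
      labels:
        app: simple
    spec:
      nodeSelector:
        env: dev                 # Pods are only scheduled on nodes labeled env=dev
      containers:
      - name: nginx
        image: bitnami/nginx:latest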
Configuring NodeSelector on Projects
- nodeSelector can also be set on a project, so that resources created in the project are automatically placed on the right nodes:
oc adm new-project test --node-selector "env=test"
- To configure a default nodeSelector on an existing project, add an annotation to its underlying namespace resource:
oc annotate namespace test openshift.io/node-selector="env=test" --overwrite
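To verify, the annotation shows up on the namespace (a quick check, using the project name test from the example above):

oc get namespace test -o yaml | grep node-selector
oc describe namespace test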
Using NodeSelector
oc login -u developer -p password
oc create deployment simple --image=bitnami/nginx:latest
oc get all
oc scale --replicas 4 deployment/simple
oc get pods -o wide
oc login -u admin -p password
oc get nodes -L env
oc label node NODE_NAME env=dev
As normal user:
$ oc login -u developer -p developer
Login successful.

You have access to the following projects and can switch between them with 'oc project <projectname>':

    debug
    myproject
  * network-security

Using project "network-security".
[root@okd ~]# oc new-project nodesel
Now using project "nodesel" on server "https://172.30.9.22:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-25-centos7~https://github.com/sclorg/ruby-ex.git

to build a new example application in Ruby.
$ oc create deployment simple --image=bitnami/nginx:latest
deployment.apps/simple created
$ oc get all
NAME                          READY   STATUS    RESTARTS   AGE
pod/simple-776bd789d8-zmldg   1/1     Running   0          5s

NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/simple   1         1         1            1           5s

NAME                                DESIRED   CURRENT   READY   AGE
replicaset.apps/simple-776bd789d8   1         1         1       5s
$ oc scale --replicas 4 deployment/simple
deployment.extensions/simple scaled
$ oc get pods -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE
simple-776bd789d8-26zqb   1/1     Running   0          8s    172.17.0.25   localhost   <none>
simple-776bd789d8-nphph   1/1     Running   0          8s    172.17.0.24   localhost   <none>
simple-776bd789d8-tjltf   1/1     Running   0          8s    172.17.0.26   localhost   <none>
simple-776bd789d8-zmldg   1/1     Running   0          35s   172.17.0.22   localhost   <none>
As admin:
# oc login -u system:admin
Logged into "https://172.30.9.22:8443" as "system:admin" using existing credentials.

You have access to the following projects and can switch between them with 'oc project <projectname>':

    default
  * nodesel

Using project "nodesel".
$ oc get nodes --show-labels
NAME        STATUS   ROLES    AGE   VERSION           LABELS
localhost   Ready    <none>   4d    v1.11.0+d4cacc0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=localhost
$ oc label node localhost env=dev
node/localhost labeled
Back to the developer user:
$ oc login -u developer -p developer
Login successful.

You have access to the following projects and can switch between them with 'oc project <projectname>':

    debug
    myproject
    network-security
  * nodesel

Using project "nodesel".
$ oc edit deployment/simple
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: 2023-07-27T14:19:39Z
  generation: 2
  labels:
    app: simple
  name: simple
  namespace: nodesel
  resourceVersion: "1679815"
  selfLink: /apis/extensions/v1beta1/namespaces/nodesel/deployments/simple
  uid: 9f3a67e2-2c88-11ee-8f96-8e5760356a66
spec:
  progressDeadlineSeconds: 600
  replicas: 4
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: simple
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: simple
    spec:
      containers:
      - image: bitnami/nginx:latest
        imagePullPolicy: Always
        name: nginx
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 4
  conditions:
  - lastTransitionTime: 2023-07-27T14:19:39Z
    lastUpdateTime: 2023-07-27T14:19:42Z
    message: ReplicaSet "simple-776bd789d8" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: 2023-07-27T14:20:11Z
    lastUpdateTime: 2023-07-27T14:20:11Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 2
  readyReplicas: 4
  replicas: 4
  updatedReplicas: 4
Edit deployment/simple by adding a nodeSelector right after the dnsPolicy line:

      dnsPolicy: ClusterFirst
      nodeSelector:
        env: blah
And then:
$ oc get all
NAME                          READY   STATUS    RESTARTS   AGE
pod/simple-776bd789d8-26zqb   1/1     Running   0          1h
pod/simple-776bd789d8-nphph   1/1     Running   0          1h
pod/simple-776bd789d8-zmldg   1/1     Running   0          1h
pod/simple-77bd5f84cf-dhdch   0/1     Pending   0          16m
pod/simple-77bd5f84cf-hlzdv   0/1     Pending   0          16m

NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/simple   4         5         2            3           47m

NAME                                DESIRED   CURRENT   READY   AGE
replicaset.apps/simple-776bd789d8   3         3         3       47m
replicaset.apps/simple-77bd5f84cf   2         2         0       6m
$ oc edit deployment.apps/simple
deployment.apps/simple edited
Let’s describe one of the pods that is not running due to the bad nodeSelector env=blah:
$ oc get pods
NAME                      READY   STATUS    RESTARTS   AGE
simple-776bd789d8-26zqb   1/1     Running   0          1h
simple-776bd789d8-nphph   1/1     Running   0          1h
simple-776bd789d8-zmldg   1/1     Running   0          1h
simple-77bd5f84cf-dhdch   0/1     Pending   0          16m
simple-77bd5f84cf-hlzdv   0/1     Pending   0          16m
$ oc describe pod simple-77bd5f84cf-hlzdv
Name:               simple-77bd5f84cf-hlzdv
Namespace:          nodesel
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             app=simple
                    pod-template-hash=3368194079
Annotations:        openshift.io/scc=restricted
Status:             Pending
IP:
Controlled By:      ReplicaSet/simple-77bd5f84cf
Containers:
  nginx:
    Image:        bitnami/nginx:latest
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-pnms6 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  default-token-pnms6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-pnms6
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  env=blah
Tolerations:     <none>
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  2m (x93 over 17m)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match node selector.
Now change the nodeSelector in deployment/simple again:

      nodeSelector:
        env: dev
After the changes all pods are running again:
$ oc get all
NAME                          READY   STATUS    RESTARTS   AGE
pod/simple-6f55965d79-5d59d   1/1     Running   0          22s
pod/simple-6f55965d79-5dt56   1/1     Running   0          21s
pod/simple-6f55965d79-mklpc   1/1     Running   0          25s
pod/simple-6f55965d79-q8pq9   1/1     Running   0          25s

NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/simple   4         4         4            4           3h

NAME                                DESIRED   CURRENT   READY   AGE
replicaset.apps/simple-57f7866b4b   0         0         0       2h
replicaset.apps/simple-6f55965d79   4         4         4       25s
replicaset.apps/simple-776bd789d8   0         0         0       3h
replicaset.apps/simple-77bd5f84cf   0         0         0       2h
replicaset.apps/simple-8559698ddc   0         0         0       1h
Finally, remove the label that was added earlier:
$ oc login -u system:admin
Logged into "https://172.30.9.22:8443" as "system:admin" using existing credentials.

You have access to the following projects and can switch between them with 'oc project <projectname>':

  * default
    nodesel

Using project "default".
$ oc project nodesel
Now using project "nodesel" on server "https://172.30.9.22:8443".
$ oc get nodes --show-labels
NAME        STATUS   ROLES    AGE   VERSION           LABELS
localhost   Ready    <none>   4d    v1.11.0+d4cacc0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,env=dev,kubernetes.io/hostname=localhost
$ oc label node localhost env-
node/localhost labeled
$ oc get nodes --show-labels
NAME        STATUS   ROLES    AGE   VERSION           LABELS
localhost   Ready    <none>   4d    v1.11.0+d4cacc0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=localhost
Deployment or DeploymentConfig
- Deployment is the Kubernetes resource, DeploymentConfig is the OpenShift resource
- It doesn’t matter which one is used
- DeploymentConfig is created when working with the console
- Deployment is the standard; running oc new-app with --as-deployment-config creates a DeploymentConfig instead (see the example below)
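For example (the application name hello is an arbitrary placeholder):

oc new-app --name hello bitnami/nginx                            # creates a Deployment (the default)
oc new-app --name hello bitnami/nginx --as-deployment-config     # creates a DeploymentConfig instead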
Pod Affinity
- [Anti-]affinity defines relations between Pods
- podAffinity is a Pod property that tells the scheduler to locate a new Pod on the same node as other Pods
- podAntiAffinity tells the scheduler not to locate a new Pod on the same node as other Pods
- nodeAffinity tells a Pod (not) to schedule on nodes with specific labels
- [Anti]-affinity is applied based on Pod labels
- Required affinity rules must be met before a Pod can be scheduled on a node
- Preferred rules are not guaranteed (see the sketch after this list)
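A minimal sketch of a Pod using a required podAffinity rule, modeled on the anti-affinity example later in these notes (the Pod name and the app=frontend label are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: with-affinity             # hypothetical name
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - frontend
        topologyKey: kubernetes.io/hostname   # co-locate on the same node as matching Pods
  containers:
  - name: ocp
    image: docker.io/ocpqe/hello-pod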
matchExpressions
- In affinity rules, a matchExpression specifies the label key, an operator, and (optionally) values to match against
- In this matchExpression the operator can have the following values (an example follows the list):
- In
- NotIn
- Exists
- DoesNotExist
- Lt
- Gt
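For example, one matchExpression that matches nodes whose disktype label is either ssd or nvme (as in the node-affinity demo below), and another that only requires an env label to exist, whatever its value:

matchExpressions:
- key: disktype
  operator: In
  values:
  - ssd
  - nvme
- key: env
  operator: Exists        # matches any node that has an env label, regardless of value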
Node Affinity
- Node affinity can be used to only run a Pod on a node that meets specific requirements
- Node affinity works with labels that are set on the node, whereas Pod affinity matches labels set on other Pods
- Required rules must be met
- Preferred rules should be met, but are not guaranteed (see the sketch below)
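A sketch combining a required and a preferred node-affinity rule; the required part mirrors the node-affinity.yaml demo further down, while the preferred part (weight and env=prod label) is an illustrative assumption:

apiVersion: v1
kind: Pod
metadata:
  name: runonssd
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
            - nvme
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: env
            operator: In
            values:
            - prod
  containers:
  - name: onssd
    image: docker.io/ocpqe/hello-pod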
Using podAntiAffinity
oc login -u admin -p password
oc create -f anti-affinity.yaml
oc get pods
oc describe pod anti2
$ cat anti-affinity.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: anti1
  labels:
    love: ihateyou
spec:
  containers:
  - name: ocp
    image: docker.io/ocpqe/hello-pod
---
apiVersion: v1
kind: Pod
metadata:
  name: anti2
  labels:
    love: ihateyou
spec:
  containers:
  - name: ocp
    image: docker.io/ocpqe/hello-pod
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: love
            operator: In
            values:
            - ihateyou
        topologyKey: kubernetes.io/hostname
$ oc new-project love
Now using project "love" on server "https://172.30.9.22:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-25-centos7~https://github.com/sclorg/ruby-ex.git

to build a new example application in Ruby.
$ oc create -f anti-affinity.yaml
pod/anti1 created
pod/anti2 created
$ oc get pods
NAME    READY   STATUS    RESTARTS   AGE
anti1   1/1     Running   0          4s
anti2   0/1     Pending   0          4s
$ oc describe pod anti2
Name:               anti2
Namespace:          love
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             love=ihateyou
Annotations:        openshift.io/scc=anyuid
Status:             Pending
IP:
Containers:
  ocp:
    Image:        docker.io/ocpqe/hello-pod
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bcpgf (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  default-token-bcpgf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-bcpgf
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  1s (x4 over 15s)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't match pod anti-affinity rules.
$ oc get pods
NAME    READY   STATUS    RESTARTS   AGE
anti1   1/1     Running   0          37s
anti2   0/1     Pending   0          37s
Using nodeAffinity
oc login -u admin -p password
oc create -f node-affinity.yaml
oc describe pod runonssd
oc label node crc-[Tab] disktype=nvme
oc describe pod runonssd
$ oc whoami
system:admin
$ cat node-affinity*
apiVersion: v1
kind: Pod
metadata:
  name: runonssd
spec:
  affinity:
    nodeAffinity:
      reguiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
            - nvme
  containers:
  - name: onssd
    image: docker.io/ocpqe/hello-pod
$ oc create -f node-affinity.yaml
pod/runonssd created
$ oc get pods
NAME       READY   STATUS    RESTARTS   AGE
anti1      1/1     Running   0          1h
anti2      0/1     Pending   0          1h
runonssd   1/1     Running   0          7s
$ oc describe pod runonssd
Name:               runonssd
Namespace:          love
Priority:           0
PriorityClassName:  <none>
Node:               localhost/172.30.9.22
Start Time:         Sat, 29 Jul 2023 12:46:04 +0200
Labels:             <none>
Annotations:        openshift.io/scc=anyuid
Status:             Running
IP:                 172.17.0.37
Containers:
  onssd:
    Container ID:   docker://f0167be7cd30fdb106fe08c50df644d5a705a65ac1dc2ae0e07f35a4006ce7b4
    Image:          docker.io/ocpqe/hello-pod
    Image ID:       docker-pullable://ocpqe/hello-pod@sha256:04b6af86b03c1836211be2589db870dba09b7811c197c47c07fbbe33c7f80ef7
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Sat, 29 Jul 2023 12:46:06 +0200
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bcpgf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-bcpgf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-bcpgf
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type    Reason     Age   From                 Message
  ----    ------     ----  ----                 -------
  Normal  Scheduled  20s   default-scheduler    Successfully assigned love/runonssd to localhost
  Normal  Pulling    19s   kubelet, localhost   pulling image "docker.io/ocpqe/hello-pod"
  Normal  Pulled     18s   kubelet, localhost   Successfully pulled image "docker.io/ocpqe/hello-pod"
  Normal  Created    18s   kubelet, localhost   Created container
  Normal  Started    18s   kubelet, localhost   Started container
$ oc label node localhost disktype=nvme
node/localhost labeled
$ oc get pods
NAME       READY   STATUS    RESTARTS   AGE
anti1      1/1     Running   0          1h
anti2      0/1     Pending   0          1h
runonssd   1/1     Running   0          1m
Taints and Tolerations
- A taint allows a node to refuse a Pod unless the Pod has a matching toleration
- Taints are applied to nodes through the node spec
- Tolerations are applied to a Pod through the Pod spec
- Taints and tolerations consist of a key, a value, and an effect
- The effect is one of the following:
- NoSchedule: new Pods will not be scheduled
- PreferNoSchedule: the scheduler tries to avoid scheduling new Pods
- NoExecute: new Pods won't be scheduled and existing Pods will be removed
- An effect only applies to Pods that don't have a matching toleration
- Use tolerationSeconds to specify how long it takes before Pods are evicted when NoExecute is set (see the sketch below)
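A sketch of a toleration with tolerationSeconds in a Pod spec (key and value are arbitrary examples): once a matching NoExecute taint appears on the node, a Pod with this toleration is evicted after 60 seconds instead of immediately.

spec:
  tolerations:
  - key: key1
    operator: Equal
    value: value1
    effect: NoExecute
    tolerationSeconds: 60      # stay for 60s after a matching NoExecute taint appears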
Managing Taints (Fails on CRC!)
oc login -u admin -p password
oc adm taint nodes crc-[Tab] key1=value1:NoSchedule
oc run newpod --image=bitnami/nginx
oc get pods
oc describe pods newpod-[Tab]
oc edit pod newpod-[Tab]
spec:
  tolerations:
  - key: key1
    value: value1
    operator: Equal
    effect: NoExecute
oc adm taint nodes crc-[Tab] key1-
$ oc whoami
system:admin
[root@okd ex280]# oc adm taint nodes localhost keyl=valuel:NoSchedule
node/localhost tainted
$ oc run newpod --image=bitnami/nginx
deploymentconfig.apps.openshift.io/newpod created
$ oc get pods
NAME              READY   STATUS    RESTARTS   AGE
anti1             1/1     Running   0          1h
anti2             0/1     Pending   0          1h
newpod-1-deploy   0/1     Pending   0          6s
runonssd          1/1     Running   0          15m
$ oc describe pods newpod-1-deploy
Name:               newpod-1-deploy
Namespace:          love
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             openshift.io/deployer-pod-for.name=newpod-1
Annotations:        openshift.io/deployment-config.name=newpod
                    openshift.io/deployment.name=newpod-1
                    openshift.io/scc=restricted
Status:             Pending
IP:
Containers:
  deployment:
    Image:      openshift/origin-deployer:v3.11
    Port:       <none>
    Host Port:  <none>
    Environment:
      OPENSHIFT_DEPLOYMENT_NAME:       newpod-1
      OPENSHIFT_DEPLOYMENT_NAMESPACE:  love
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from deployer-token-cdhw9 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  deployer-token-cdhw9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  deployer-token-cdhw9
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  4s (x5 over 31s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
$ oc edit pod newpod-1-deploy
Editing:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/deployment-config.name: newpod
    openshift.io/deployment.name: newpod-1
    openshift.io/scc: restricted
  creationTimestamp: 2023-07-29T11:01:32Z
  labels:
    openshift.io/deployer-pod-for.name: newpod-1
  name: newpod-1-deploy
  namespace: love
  ownerReferences:
  - apiVersion: v1
    kind: ReplicationController
    name: newpod-1
    uid: 471096bf-2dff-11ee-8f96-8e5760356a66
  resourceVersion: "2333327"
  selfLink: /api/v1/namespaces/love/pods/newpod-1-deploy
  uid: 471355a8-2dff-11ee-8f96-8e5760356a66
spec:
  activeDeadlineSeconds: 21600
  containers:
  - env:
    - name: OPENSHIFT_DEPLOYMENT_NAME
      value: newpod-1
    - name: OPENSHIFT_DEPLOYMENT_NAMESPACE
      value: love
    image: openshift/origin-deployer:v3.11
    imagePullPolicy: IfNotPresent
    name: deployment
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000410000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: deployer-token-cdhw9
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: deployer-dockercfg-dpdps
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000410000
    seLinuxOptions:
      level: s0:c20,c15
  serviceAccount: deployer
  serviceAccountName: deployer
  terminationGracePeriodSeconds: 10
  volumes:
  - name: deployer-token-cdhw9
    secret:
      defaultMode: 420
      secretName: deployer-token-cdhw9
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2023-07-29T11:01:32Z
    message: '0/1 nodes are available: 1 node(s) had taints that the pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort
Add the following under spec (after terminationGracePeriodSeconds):

  tolerations:
  - key: key1
    value: value1
    operator: Equal
    effect: NoExecute
After editing:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/deployment-config.name: newpod
    openshift.io/deployment.name: newpod-1
    openshift.io/scc: restricted
  creationTimestamp: 2023-07-29T11:01:32Z
  labels:
    openshift.io/deployer-pod-for.name: newpod-1
  name: newpod-1-deploy
  namespace: love
  ownerReferences:
  - apiVersion: v1
    kind: ReplicationController
    name: newpod-1
    uid: 471096bf-2dff-11ee-8f96-8e5760356a66
  resourceVersion: "2333327"
  selfLink: /api/v1/namespaces/love/pods/newpod-1-deploy
  uid: 471355a8-2dff-11ee-8f96-8e5760356a66
spec:
  activeDeadlineSeconds: 21600
  containers:
  - env:
    - name: OPENSHIFT_DEPLOYMENT_NAME
      value: newpod-1
    - name: OPENSHIFT_DEPLOYMENT_NAMESPACE
      value: love
    image: openshift/origin-deployer:v3.11
    imagePullPolicy: IfNotPresent
    name: deployment
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000410000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: deployer-token-cdhw9
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: deployer-dockercfg-dpdps
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000410000
    seLinuxOptions:
      level: s0:c20,c15
  serviceAccount: deployer
  serviceAccountName: deployer
  terminationGracePeriodSeconds: 10
  tolerations:
  - key: key1
    value: value1
    operator: Equal
    effect: NoExecute
  volumes:
  - name: deployer-token-cdhw9
    secret:
      defaultMode: 420
      secretName: deployer-token-cdhw9
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2023-07-29T11:01:32Z
    message: '0/1 nodes are available: 1 node(s) had taints that the pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort
Now:
$ oc get pods
NAME              READY   STATUS    RESTARTS   AGE
anti1             1/1     Running   0          1h
anti2             0/1     Pending   0          1h
newpod-1-deploy   0/1     Pending   0          21m
runonssd          1/1     Running   0          37m
$ oc adm taint nodes localhost keyl-
node/localhost untainted
$ oc get pods
NAME             READY   STATUS    RESTARTS   AGE
anti1            1/1     Running   0          1h
anti2            0/1     Pending   0          1h
newpod-1-qgmpj   1/1     Running   0          12s
runonssd         1/1     Running   0          38m
After removing the taint, the pod that was pending before starts to run.