Pod Scheduling on OpenShift

Understanding the Scheduler

  • 3 types of rules are applied for Pod scheduling
    • Node labels
    • Affinity rules
    • Anti-affinity rules
  • The Pod Scheduler works through 3 steps:
    • Node filter: this is where node availability, selectors, taints, resource availability and more are evaluated
    • Node prioritizing: based on affinity rules, nodes are prioritized
    • Select the best node: the highest-scoring node is used, and if multiple nodes score equally, round robin is used to select one of them
  • When running in a cloud, scheduling by default happens within the boundaries of a region

 

Using Node Labels to Control Pod Placement

  • Nodes can be configured with a label
  • A label is an arbitrary key-value pair that is set with oc label
  • Pods can next be configured with a nodeSelector property on the Pod so that they’ll only run on the node that has the right label

 

Applying Labels to Nodes

  • A label is an arbitrary key-value pair that can be used as a selector for Pod placement
  • Use oc label node worker1.example.com env=test to set the label
  • Use oc label node worker1.example.com env=prod --overwrite to overwrite
  • Use oc label node worker1.example.com env- to remove the label
  • Use oc get ... --show-labels to show labels set on any type of resource

 

Applying Labels to Machine Sets

  • A machine set is a group of machines that is created when installing OpenShift using full stack automation
  • Machine sets can be labeled so that nodes generated from the machine set will automatically get a label
  • To see which nodes are in which machine set, use oc get machines -n openshift-machine-api -o wide
  • Use oc edit machineset … to set a label in the machine set under spec.template.spec.metadata.labels
  • Notice that nodes that were already generated from the machine set will not be updated with the new label
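
As a hedged illustration of where such a label sits in a machine set, the fragment below shows only the relevant path. The machine set name example-worker-a and the label env: test are placeholders, and required fields such as the selector and provider spec are omitted.

```yaml
# Fragment only: other required MachineSet fields are left out.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: example-worker-a            # hypothetical machine set name
  namespace: openshift-machine-api
spec:
  template:
    spec:
      metadata:
        labels:
          env: test                 # applied to nodes created from this machine set
```

Only nodes created after this change pick up the label; existing nodes would still need oc label node.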

 

Configuring NodeSelector on Pods

  • Infrastructure-related Pods in an OpenShift cluster are configured to run on a controller node
  • Use nodeSelector on the Deployment or DeploymentConfig to configure its Pods to run on a node that has a specific label
  • Use oc edit to apply nodeSelector to existing Deployments or DeploymentConfigs
  • If a Deployment is configured with a nodeSelector that doesn’t match any node label, the Pods will show as pending
  • Fix this by setting Deployment spec.template.spec.nodeSelector to the desired key-value pair
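
A minimal sketch of a Deployment that uses nodeSelector, assuming a node has been labeled env=test. The app: simple labels and the replica count are illustrative choices, not taken from an existing cluster.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple
spec:
  replicas: 2
  selector:
    matchLabels:
      app: simple
  template:
    metadata:
      labels:
        app: simple
    spec:
      # Pods are only scheduled on nodes that carry the label env=test.
      nodeSelector:
        env: test
      containers:
      - name: nginx
        image: bitnami/nginx:latest
```

If no node carries the label, the Pods stay in the Pending state until a matching node appears or the selector is changed.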

 

Configuring NodeSelector on Projects

  • nodeSelector can also be set on a project so that resources created in the project are automatically placed on the right nodes: oc adm new-project test --node-selector "env=test"
  • To configure a default nodeSelector on an existing project, add an annotation to its underlying namespace resource: oc annotate namespace test openshift.io/node-selector="env=test" --overwrite
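
For reference, a sketch of how the project-level node selector could appear on the namespace resource itself, assuming the project is named test and the selector is env=test as in the commands above.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: test
  annotations:
    # Default node selector for Pods created in this project.
    openshift.io/node-selector: "env=test"
```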

 

Using NodeSelector

  • oc login -u developer -p password
  • oc create deployment simple --image=bitnami/nginx:latest
  • oc get all
  • oc scale --replicas 4 deployment/simple
  • oc get pods -o wide
  • oc login -u admin -p password
  • oc get nodes -L env
  • oc label node NODE_NAME env=dev

As a normal user:

As admin:

Back to the developer user:

Edit deployment/simple by adding a nodeSelector after dnsPolicy:
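
A minimal sketch of what this edit might look like, assuming the label key is env (the key used in the commands above) and the deliberately non-matching value is blah. The dnsPolicy value shown is the usual default and is also an assumption.

```yaml
# Fragment of deployment/simple as seen in oc edit.
spec:
  template:
    spec:
      dnsPolicy: ClusterFirst
      nodeSelector:
        env: blah                   # no node has this label, so new Pods stay Pending
```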

And then:

Let’s describe the pod which is not running due to the bad nodeSelector=blah:

Now change nodeSelector in deployment/simple again:

After the changes all pods are running again:

And now we must remove the label that was added previously:

 

Deployment or DeploymentConfig

  • Deployment is the Kubernetes resource, DeploymentConfig is the OpenShift resource
  • It doesn’t matter which one is used
  • DeploymentConfig is created when working with the console
  • Deployment is the standard, but when using oc new-app --as-deployment-config it will create a DeploymentConfig instead

 

Pod Affinity

  • [Anti-]affinity defines relations between Pods
  • podAffinity is a Pod property that tells the scheduler to locate a new Pod on the same node as other Pods
  • podAntiAffinity tells the scheduler not to locate a new Pod on the same node as other Pods
  • nodeAffinity tells a Pod (not) to schedule on nodes with specific labels
  • [Anti-]affinity is applied based on Pod labels
  • Required affinity rules must be met before a Pod can be scheduled on a node
  • Preferred rules are not guaranteed
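
The rules above can be sketched in a single Pod manifest. This is an illustrative example, not a manifest from the course: the Pod name web-near-cache and the app=cache and app=web labels are hypothetical. It requires placement on a node that already runs a Pod labeled app=cache, and prefers to avoid nodes that already run a Pod labeled app=web.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-near-cache              # hypothetical name
  labels:
    app: web
spec:
  affinity:
    podAffinity:
      # Required rule: must be satisfied or the Pod stays Pending.
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - cache
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      # Preferred rule: the scheduler tries to honor it but gives no guarantee.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - web
          topologyKey: kubernetes.io/hostname
  containers:
  - name: nginx
    image: bitnami/nginx:latest
```

The topologyKey kubernetes.io/hostname makes "same location" mean "same node"; a zone label could be used instead to express the rule per zone.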

matchExpressions

  • In affinity rules, matchExpressions define the key, operator, and values that are matched against labels
  • In a matchExpression, the operator can have the following values:
    • In
    • NotIn
    • Exists
    • DoesNotExist
    • Lt (node affinity only)
    • Gt (node affinity only)

Node Affinity

  • Node affinity can be used to only run a Pod on a node that meets specific requirements
  • Node affinity, like Pod affinity, works with labels that are set on the node
  • Required rules must be met
  • Preferred rules should be met

 

Using nodeAffinity

  • oc login -u admin -p password
  • oc create -f nodeaffinity.yaml
  • oc describe pod runonssd
  • oc label node crc-[Tab] disktype=nvme
  • oc describe pod runonssd
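
The contents of nodeaffinity.yaml are not shown here. A minimal sketch of what such a file might contain, assuming the Pod is named runonssd and requires a node labeled disktype=nvme (the label applied in the steps above). The container image is an assumption.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: runonssd
spec:
  affinity:
    nodeAffinity:
      # Required rule: the Pod stays Pending until a node has disktype=nvme.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - nvme
  containers:
  - name: nginx
    image: bitnami/nginx:latest     # assumed image
```

With a file like this, the first oc describe pod would be expected to show a scheduling failure, and the second, after labeling the node, a successful placement.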

 

 

Taints and Tolerations

  • A taint allows a node to refuse a Pod unless the Pod has a matching toleration
  • Taints are applied to nodes through the node spec
  • Tolerations are applied to a Pod through the Pod spec
  • Taints and tolerations consist of a key, a value, and an effect
  • The effect is one of the following:
    • NoSchedule: new Pods will not be scheduled
    • PreferNoSchedule: the scheduler tries to avoid scheduling new Pods
    • NoExecute: new Pods won’t be scheduled and existing Pods will be removed
    • All effects are only applied if the Pod has no matching toleration
  • Use tolerationSeconds to specify how long it takes before Pods are evicted when NoExecute is set
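
As an illustration of tolerationSeconds, the sketch below assumes a node tainted with key1=value1:NoExecute. The Pod name tolerant-pod and the 3600-second value are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod                # hypothetical name
spec:
  tolerations:
  - key: key1
    operator: Equal
    value: value1
    effect: NoExecute
    # The Pod may keep running on the tainted node for one hour before eviction.
    tolerationSeconds: 3600
  containers:
  - name: nginx
    image: bitnami/nginx:latest
```

Without tolerationSeconds, a Pod that tolerates a NoExecute taint stays on the node indefinitely.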

 

Managing Taints (Fails on CRC!)

  • oc login -u admin -p password
  • oc adm taint nodes crc-[Tab] key1=value1:NoSchedule
  • oc run newpod --image=bitnami/nginx
  • oc get pods
  • oc describe pods newpod
  • oc edit pod newpod

spec:
  tolerations:
  - key: key1
    value: value1
    operator: Equal
    effect: NoSchedule   # must match the effect of the taint set above

  • oc adm taint nodes crc-[Tab] key1-

 

Editing:

add

After editing:

Now:

Let’s remove the taint; the pod that was pending before now starts to run: