Kubernetes cluster architecture

The Kubernetes architecture consists of many different components that communicate with each other in different ways, so each component needs to know where the other components are. Communication between them involves different modes of authentication, authorization, encryption, and security.

Kubernetes Ecosystem:

  • Cloud Native Computing Foundation (CNCF) hosts many projects related to
    cloud native computing
  • Kubernetes is among the most important projects, but many other projects are offered as well, implementing a wide range of functionality
    • Networking
    • Dashboard
    • Storage
    • Observability
    • Ingress
  • To get a completely working Kubernetes solution, products from the ecosystem also need to be installed
  • This can be done manually, or by using a distribution

Running Kubernetes Anywhere

  • Kubernetes is a platform for cloud native computing, and as such is commonly used in cloud
  • All major cloud providers have their own integrated Kubernetes distribution
  • Kubernetes can also be installed on premise, within the secure boundaries of your own datacenter
  • There are also all-in-one solutions, which are perfect for learning Kubernetes

Understanding Kubernetes Distributions

  • Kubernetes distributions add products from the ecosystem to vanilla Kubernetes and provide support
  • Normally, distributions run one or two Kubernetes versions behind
  • Some distributions are opinionated: they pick one product for a specific solution and support only that
  • Other distributions are less opinionated and integrate multiple products to offer specific solutions

Common Kubernetes Distributions

  • In Cloud
    • Amazon Elastic Kubernetes Services (EKS)
    • Azure Kubernetes Services (AKS)
    • Google Kubernetes Engine (GKE)
  • On Premise
    • OpenShift
    • Google Anthos
    • Rancher
    • Canonical Charmed Kubernetes
  • Minimal (learning) Solutions
    • Minikube
    • K3s

Kubernetes Node Roles

  • The control plane runs the Kubernetes core services and Kubernetes agents, and no user workloads
  • The worker plane runs user workloads and Kubernetes agents
  • All nodes are configured with a container runtime, which is required for running containerized workloads
  • The kubelet systemd service is responsible for running orchestrated containers as Pods on any node

Node Requirements

  • To install a Kubernetes cluster using kubeadm, you’ll need at least two nodes that meet the following requirements:
    • Running a recent version of Ubuntu or CentOS
    • 2GiB RAM or more
    • 2 CPUs or more on the control-plane node
    • Network connectivity between the nodes
  • Before setting up the cluster with kubeadm, install the following:
    • A container runtime
    • The Kubernetes tools
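
A minimal sketch of installing the Kubernetes tools on an Ubuntu node, assuming the official Kubernetes package repository has already been configured (see the kubeadm installation documentation for the repository setup):

    # Install kubeadm, kubelet, and kubectl from the Kubernetes repository
    sudo apt-get update
    sudo apt-get install -y kubelet kubeadm kubectl
    # Hold the packages so routine upgrades don't change cluster versions unexpectedly
    sudo apt-mark hold kubelet kubeadm kubectl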

Installing a Container Runtime

  • The container runtime is the component that allows you to run containers
  • Kubernetes supports different container runtimes
    • containerd
    • CRI-O
    • Docker Engine
    • Mirantis Container Runtime
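
A minimal sketch of installing containerd as the container runtime on an Ubuntu node (package names and required settings vary per distribution; the systemd cgroup driver shown here is what kubeadm-based clusters commonly expect):

    # Install containerd from the distribution repositories
    sudo apt-get update
    sudo apt-get install -y containerd
    # Generate a default configuration and switch runc to the systemd cgroup driver
    sudo mkdir -p /etc/containerd
    containerd config default | sudo tee /etc/containerd/config.toml
    sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
    sudo systemctl restart containerd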

Kubernetes Networking

Different types of network communication are used in Kubernetes

    • Node-to-node communication: handled by the physical network
    • External-to-Service communication: handled by Kubernetes Service resources
    • Pod-to-Service communication: handled by Kubernetes Services
    • Pod-to-Pod communication: handled by the network plugin

Network Add-on

  • To create the software defined Pod network, a network add-on is needed
  • Different network add-ons are provided by the Kubernetes ecosystem
  • Vanilla Kubernetes doesn’t come with a default add-on, as it doesn’t want to favor a specific solution
  • Kubernetes provides the Container Network Interface (CNI), a generic interface that allows different plugins to be used
  • Availability of specific features depends on the network plugin that is used
    • NetworkPolicy
    • IPv6
    • Role Based Access Control (RBAC)

Common Network Add-ons

  • Calico: probably the most common network plugin with support for all
    relevant features
  • Flannel: a generic network add-on that was used a lot in the past, but doesn’t support NetworkPolicy
  • Multus: a plugin that can work with multiple network plugins. Current default in OpenShift
  • Weave: a common network add-on that does support common features
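
As an illustration, a network add-on such as Calico is typically installed by applying its manifest with kubectl; the manifest file name below is a placeholder, as the exact location depends on the Calico version:

    # Apply the network add-on manifest downloaded from the Calico project
    kubectl apply -f calico.yaml
    # Verify that the add-on pods come up
    kubectl get pods -n kube-system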

 

ETCD

ETCD is a distributed, reliable key-value store that is simple, secure, and fast. The etcd data store stores information about the cluster, such as:

• Nodes
• PODs
• Configs
• Secrets
• Accounts
• Roles
• Bindings
• Others

All the information you see when you run the kubectl get command comes from the etcd server. Every change you make to your cluster, such as adding additional nodes or deploying pods or replica sets, is updated in the etcd server.

Only once it is updated in the etcd server is the change considered to be complete. Depending on how you set up your cluster, etcd is deployed differently.

There are two types of Kubernetes deployments:

  • deploying from scratch
  • deploying using the kubeadm tool

Setup – Manual

The advertise client URL is the address on which etcd listens. It happens to be the IP of the server and port 2379, which is the default port on which etcd listens. This is the URL that should be configured on the kube-apiserver when it tries to reach the etcd server.

etcd.service
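An illustrative fragment of an etcd.service unit file showing the advertise client URL; ETCD_NAME and INTERNAL_IP are placeholders to be replaced with the actual node name and IP:

    ExecStart=/usr/local/bin/etcd \
      --name ${ETCD_NAME} \
      --data-dir=/var/lib/etcd \
      --advertise-client-urls https://${INTERNAL_IP}:2379 \
      --listen-client-urls https://${INTERNAL_IP}:2379,https://127.0.0.1:2379 \
      ...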

 

Setup – kubeadm

If you set up your cluster using kubeadm, then kubeadm deploys the etcd server for you as a pod in the kube-system namespace. You can explore the etcd database using the etcdctl utility within this pod.

To list all keys stored by Kubernetes, run the etcdctl get command like this:
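
A sketch, assuming a kubeadm cluster where the etcd pod is named etcd-controlplane (the suffix follows the node name); in practice the certificate and API version options shown later under ETCD – Commands may also be required:

    kubectl exec etcd-controlplane -n kube-system -- etcdctl get / --prefix --keys-only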

Kubernetes stores data in a specific directory structure. The root directory is the registry, and under that you have the various Kubernetes constructs, such as minions (nodes), pods, replica sets, deployments, and so on.

Explore ETCD
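
For example, you can narrow the listing down to a single construct such as pods (the same assumptions as above apply):

    kubectl exec etcd-controlplane -n kube-system -- etcdctl get /registry/pods --prefix --keys-only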

ETCD in HA Environment

In a high-availability environment, you will have multiple master nodes in your cluster, and multiple etcd instances spread across those master nodes. In that case, make sure that the etcd instances know about each other by setting the right parameter in the etcd service configuration: the --initial-cluster option is where you must specify the different instances of the etcd service.
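
An illustrative fragment of the etcd.service file in an HA setup, where --initial-cluster lists every etcd instance; the controller names and IP placeholders are assumptions:

    ExecStart=/usr/local/bin/etcd \
      --name ${ETCD_NAME} \
      --initial-cluster controller-0=https://${CONTROLLER0_IP}:2380,controller-1=https://${CONTROLLER1_IP}:2380 \
      ...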

 

ETCD – Commands

ETCDCTL is the CLI tool used to interact with etcd. ETCDCTL can interact with the etcd server using two API versions: version 2 and version 3. By default it is set to use version 2, and each version has a different set of commands.

For example, ETCDCTL version 2 supports the following commands:
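
A representative (not exhaustive) subset of the version 2 commands:

    etcdctl backup
    etcdctl cluster-health
    etcdctl mk
    etcdctl mkdir
    etcdctl set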

Whereas the commands are different in version 3:
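
A representative subset of the version 3 commands:

    etcdctl snapshot save
    etcdctl endpoint health
    etcdctl get
    etcdctl put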


To set the right version of the API, set the ETCDCTL_API environment variable:
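
For example, to use version 3:

    export ETCDCTL_API=3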

When the API version is not set, it is assumed to be version 2, and the version 3 commands listed above do not work. When the API version is set to version 3, the version 2 commands listed above do not work.

Apart from that, you must also specify the path to the certificate files so that ETCDCTL can authenticate to the etcd API server. The certificate files are available on the etcd master at the following paths:
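
On a kubeadm-provisioned etcd master, the certificates are typically found under /etc/kubernetes/pki/etcd (paths may differ in other setups):

    --cacert /etc/kubernetes/pki/etcd/ca.crt
    --cert   /etc/kubernetes/pki/etcd/server.crt
    --key    /etc/kubernetes/pki/etcd/server.key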


So, for the commands shown earlier to work, you must specify both the ETCDCTL API version and the path to the certificate files. Below is the final form:
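
A sketch of the full command, again assuming a kubeadm cluster with the etcd pod named etcd-controlplane:

    kubectl exec etcd-controlplane -n kube-system -- sh -c \
      "ETCDCTL_API=3 etcdctl get / --prefix --keys-only --limit=10 \
         --cacert /etc/kubernetes/pki/etcd/ca.crt \
         --cert /etc/kubernetes/pki/etcd/server.crt \
         --key /etc/kubernetes/pki/etcd/server.key"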


 

kube-apiserver

The kube-apiserver is the primary management component in Kubernetes. When you run a kubectl command, the kubectl utility is in fact reaching out to the kube-apiserver.

The kube-apiserver first authenticates and validates the request. It then retrieves the data from the etcd cluster and responds with the requested information. The kube-apiserver is at the center of all the different tasks that need to be performed to make a change in the cluster. To summarize, the kube-apiserver is responsible for authenticating and validating requests, and for retrieving and updating data in the etcd data store. In fact, the kube-apiserver is the only component that interacts directly with the etcd data store.

The other components, such as the scheduler, kube-controller-manager, and kubelet, use the API server to perform updates in the cluster in their respective areas.

 

Installing kube-apiserver

If you are setting up the cluster from scratch, the kube-apiserver is available as a binary on the Kubernetes release page. Download it and configure it to run as a service on your Kubernetes master node.

The kube-apiserver runs with a lot of parameters, as you can see in the service file fragment below.

kube-apiserver.service
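An illustrative fragment of a kube-apiserver.service unit file; only a few of the many options are shown, and the address and range values are placeholders:

    ExecStart=/usr/local/bin/kube-apiserver \
      --advertise-address=${INTERNAL_IP} \
      --allow-privileged=true \
      --authorization-mode=Node,RBAC \
      --etcd-servers=https://127.0.0.1:2379 \
      --service-cluster-ip-range=10.96.0.0/12 \
      ...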

How to view the kube-apiserver options in an existing cluster depends on how you set up your cluster.

View api-server – kubeadm
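
If the cluster was set up with kubeadm, the kube-apiserver runs as a pod in the kube-system namespace (the pod name carries the node name as a suffix), so you can see it with:

    kubectl get pods -n kube-system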

 

View api-server options – kubeadm setup
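
In a kubeadm setup, the options are in the pod definition file in the /etc/kubernetes/manifests folder:

    cat /etc/kubernetes/manifests/kube-apiserver.yaml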

 

View api-server options – non-kubeadm setup

 

You can also see the running process and the effective options by listing the process on the master node and searching for kube-apiserver.
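
In a non-kubeadm setup, you can inspect the options in the service file (the exact path depends on how the service was installed), or in the running process:

    cat /etc/systemd/system/kube-apiserver.service
    ps -aux | grep kube-apiserver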

 

Kube Controller Manager

In Kubernetes terms, a controller is a process that continuously monitors the state of various components within the system and works towards bringing the whole system to the desired functioning state.

For example, the node controller is responsible for monitoring the status of the nodes and taking the necessary actions to keep the applications running. It does that through the kube-apiserver. The node controller checks the status of the nodes every five seconds; that way it can monitor the health of the nodes.

The replication controller is responsible for monitoring the status of replica sets and ensuring that the desired number of pods is available at all times within the set. If a pod dies, it creates another one.

There are many more such controllers available within Kubernetes. All of them are packaged into a single process known as the Kubernetes Controller Manager.

Installing kube-controller-manager

How to view the kube-controller-manager options depends on how you set up your cluster.

View kube-controller-manager options – kubeadm setup

If you set it up with the kubeadm tool, the options are in the pod definition file located in the /etc/kubernetes/manifests folder.
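
For example:

    cat /etc/kubernetes/manifests/kube-controller-manager.yaml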

 

View controller-manager options – non-kubeadm setup

To see the running process and the effective options, list the processes on the master node and search for kube-controller-manager.
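
For example:

    ps -aux | grep kube-controller-manager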

 

Kube Scheduler

The Kubernetes scheduler is responsible for scheduling pods on nodes.

Installing kube-scheduler

Download the kube-scheduler binary from the Kubernetes release page, extract it, and run it as a service. When you run it as a service, you specify the scheduler configuration file.
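
An illustrative fragment of a kube-scheduler.service unit file; the configuration file path is a placeholder:

    ExecStart=/usr/local/bin/kube-scheduler \
      --config=/etc/kubernetes/config/kube-scheduler.yaml \
      --v=2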

 

View kube-scheduler options – kubeadm setup

If you set it up with the kubeadm tool, you can see the options in the pod definition file located in the /etc/kubernetes/manifests folder.

You can also see the running process and the effective options by listing the process on the master node and searching for kube-scheduler.
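
For example, in a kubeadm setup:

    cat /etc/kubernetes/manifests/kube-scheduler.yaml
    ps -aux | grep kube-scheduler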

 

Kubelet

The kubelet in the Kubernetes worker node registers the node with a Kubernetes cluster. When it receives instructions to load a container or a pod on the node, it requests the container runtime engine, which may be Docker, to pull the required image and run an instance. The kubelet then continues to monitor the state of the pod and containers in it and reports to the kube API server on a timely basis.

Installing kubelet

If you use the kubeadm tool to deploy your cluster, it does not automatically deploy the kubelet; that is a difference from the other components. You must always manually install the kubelet on your worker nodes. Download the installer, extract it, and run it as a service.
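
An illustrative fragment of a kubelet.service unit file; the paths and options vary between setups and are placeholders here:

    ExecStart=/usr/local/bin/kubelet \
      --config=/var/lib/kubelet/kubelet-config.yaml \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \
      ...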

 

View kubelet options

You can view the running kubelet process and the effective options by listing the process on the worker node and searching for kubelet.
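
For example:

    ps -aux | grep kubelet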

 

kube-proxy

Kube-proxy is a process that runs on each node in the Kubernetes cluster. Its job is to look for new Services, and every time a new Service is created, it creates the appropriate rules on each node to forward traffic destined for those Services to the backend pods. One way it does this is by using iptables rules.

Within a Kubernetes cluster, every pod can reach every other pod. This is accomplished by deploying a pod networking solution to the cluster. A pod network is an internal virtual network that spans all the nodes in the cluster and to which all the pods connect. Through this network, the pods are able to communicate with each other.
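
As a sketch, when kube-proxy runs in iptables mode you can inspect the rules it programs in the KUBE-SERVICES chain of the nat table (chain names depend on the proxy mode in use):

    sudo iptables -t nat -L KUBE-SERVICES | head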

Installing kube-proxy

View kube-proxy – kubeadm

The kubeadm tool deploys kube-proxy as pods on each node.

In fact, it is deployed as a DaemonSet, so a single pod is always deployed on each node in the cluster.
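
You can verify this, for example, with:

    kubectl get daemonset kube-proxy -n kube-system
    kubectl get pods -n kube-system -o wide | grep kube-proxy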