Kubernetes is one of the most popular container orchestration projects, but interestingly, Kubernetes itself has no code to run or manage Linux/Windows containers. So, what is running the containers within your Kubernetes pods?
Yes… Kubernetes doesn’t run your containers
It’s just an orchestration platform sitting above container runtimes. It has no code of its own to run a container or manage a container’s lifecycle; instead, dockershim was implemented (in the kubelet) to talk to Docker as the container runtime. I will talk about dockershim in a later section of the blog.
Also, Docker has grown and matured over the last few years and has gained a stack of components such as runc (governed by the Open Container Initiative, est. June 2015) and containerd (a CNCF project). Docker was effectively split into two parts:
1) the Docker CLI and daemon, which handle user requests, and
2) the low-level container-running function, i.e. runc.
But wait… what is a container runtime?
A container runtime is responsible for executing containers and managing container images on nodes.
To understand the need for Container Runtime in Kubernetes, let’s start with a few basic concepts:
- The kubelet is a daemon that runs on every Kubernetes node. It is responsible for registering the worker node with the API server and working with the PodSpec.
- The kubelet acts as a client when connecting to the container runtime via gRPC. The Kubernetes Container Runtime Interface (CRI) defines the gRPC protocol for communication between the kubelet and the container runtime.
- The kubelet also acts as a controller: it watches for pod changes and uses the node’s container runtime to pull images, run containers, and so on.
- It also exposes an HTTP endpoint to stream logs and provide exec sessions for clients.
- It uses the CSI (Container Storage Interface) gRPC interface to configure block volumes.
- It uses the CNI plugin configured in the cluster to allocate the pod IP address and set up any necessary network routes and firewall rules for the pod. (As of Kubernetes 1.24, management of the CNI is no longer in scope for the kubelet.)
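The kubelet’s controller behavior described above can be sketched as a tiny sync loop. This is a toy illustration, not real kubelet code: `FakeRuntime` and every name here are invented stand-ins for a CRI runtime.

```python
# Toy sketch of the kubelet's sync-loop role: compare the desired pods
# against what the (fake) runtime reports, then converge toward it.

class FakeRuntime:
    """Stands in for a container runtime the kubelet would reach over CRI."""
    def __init__(self):
        self.images = set()
        self.running = set()

    def pull_image(self, image):
        self.images.add(image)

    def run_container(self, name, image):
        assert image in self.images, "image must be pulled first"
        self.running.add(name)

    def stop_container(self, name):
        self.running.discard(name)


def reconcile(desired_pods, runtime):
    """One pass of the loop: start missing containers, stop extra ones."""
    for name, image in desired_pods.items():
        if name not in runtime.running:
            runtime.pull_image(image)
            runtime.run_container(name, image)
    for name in list(runtime.running):
        if name not in desired_pods:
            runtime.stop_container(name)


runtime = FakeRuntime()
reconcile({"web": "nginx:1.25", "cache": "redis:7"}, runtime)
print(sorted(runtime.running))   # ['cache', 'web']
reconcile({"web": "nginx:1.25"}, runtime)
print(sorted(runtime.running))   # ['web']
```

The real kubelet does this continuously and over gRPC, but the shape of the loop is the same: observe, compare, act through the runtime.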
gRPC (an open-source RPC framework developed by Google in 2015) is faster than REST because it uses Protocol Buffers. Protobuf serializes and deserializes data into a compact binary format, thus reducing the size of the messages.
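The size difference can be illustrated with the standard library alone. This is not actual Protobuf (which would need the `protobuf` package and a schema); packing the same record into a fixed binary layout with `struct` shows the same principle of schema-implied, binary encoding.

```python
# Illustration only: JSON text vs. a fixed binary layout for one record.
import json
import struct

record = {"id": 42, "cpu_millis": 250, "memory_mb": 512}

# JSON carries field names and punctuation as text on every message.
json_bytes = json.dumps(record).encode("utf-8")

# Binary layout: three unsigned 32-bit little-endian integers; field
# names are implied by position, as a Protobuf schema would imply them.
binary_bytes = struct.pack("<III", record["id"], record["cpu_millis"],
                           record["memory_mb"])

print(len(json_bytes), len(binary_bytes))  # 47 12
```

A 47-byte JSON payload shrinks to 12 bytes here; across thousands of kubelet-to-runtime calls, that compactness (plus HTTP/2 multiplexing) is where gRPC’s speed advantage comes from.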
The node controller is a kubernetes control plane component that manages various aspects of nodes.
A pod is the smallest deployable unit in Kubernetes. Each pod runs one or more containers, which together form a single functional unit.
The kubelet reads pod specs from the API server, usually defined in YAML configuration files. The pod specs say which container images the pod should run, but provide no details as to how the containers should run; for that, Kubernetes needs a container runtime.
Other than PodSpecs from the API server, the kubelet can accept a PodSpec from a file or an HTTP endpoint. A good example of “PodSpec from a file” is Kubernetes static pods. Static pods are pods managed directly by the kubelet on its node, not by the API server.
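A minimal static pod manifest looks like an ordinary pod spec; dropped into the kubelet’s manifest directory (commonly `/etc/kubernetes/manifests`, set via `staticPodPath` or the `--pod-manifest-path` flag), the kubelet runs it without involving the API server. The names and image below are just examples.

```yaml
# Example static pod manifest (illustrative values).
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
    - name: web
      image: nginx:1.25
      ports:
        - containerPort: 80
```

This is exactly how control-plane components like kube-apiserver are typically run on kubeadm-built clusters.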
A Kubernetes node must have a container runtime installed. When the kubelet wants to process pod specs, it needs a container runtime to create the actual containers. The runtime is then responsible for managing the container lifecycle and communicating with the operating system kernel.
Network namespace creation is done by the container runtime. Just before the pod’s containers are created, it is the runtime’s responsibility to create the network namespace. Instead of someone running the ip netns command and creating the network namespace manually, the container runtime does this automatically, with the help of the pod infrastructure container, i.e. the pause container. The relevant kubelet command-line argument is
--pod-infra-container-image. With the pause container in place, an app container can die and come back (restart) and all of the network setup will still be there. Normally, if the last process in a network namespace dies, the namespace is destroyed, and creating a new app container would require recreating the whole network setup. With the pause container this is not the case: it always remains there, in a sleep state inside the network namespace, keeping the namespace alive.
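Roughly, the manual equivalent of what the runtime automates looks like this (illustrative only; these commands require root, and the names are made up):

```shell
# Create a named network namespace and bring up loopback inside it.
ip netns add mypod
ip netns exec mypod ip link set lo up
```

The runtime does this implicitly: it starts the pause container first, and that process holds the pod’s network namespace open, so app containers can die and restart while the pod’s IP, routes, and firewall rules survive.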
Is Docker the only container runtime?
The most widely known container runtime is Docker, but it is not the only one in this space. In fact, the container runtime space has been evolving rapidly. Early versions of Kubernetes worked only with one specific container runtime: Docker Engine.
Later, Kubernetes added support for other container runtimes via the Container Runtime Interface (CRI, introduced in December 2016). It enables interoperability between orchestrators (like Kubernetes) and many different container runtimes, such as containerd, CRI-O, cri-containerd, and Mirantis Container Runtime.
Docker Engine doesn’t implement that interface (CRI), so the Kubernetes project created special code to help with the transition, and made that dockershim code a part of Kubernetes itself.
Dockershim is not a standalone CRI implementation; it is a built-in adapter in the kubelet code base, dedicated to Docker as the container runtime only. The Kubernetes v1.24 release removed dockershim from Kubernetes.
Amazon EKS will be ending support for dockershim starting with the Kubernetes version 1.24 launch. Amazon EKS AMIs that are officially published will have containerd as the only runtime starting with version 1.24.
That’s how CRI was born
Since Kubernetes 1.5, a new API, the Container Runtime Interface (CRI), allows any container runtime to plug into the kubelet. This enables the kubelet to use a wide variety of container runtimes, without being recompiled, to provide container services to Kubernetes.
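The “plug in any runtime” idea can be sketched as programming against one interface. The class and method names below are invented stand-ins, not the real CRI gRPC services; the point is that the kubelet-side code never depends on a concrete runtime.

```python
# Sketch: the kubelet codes against one contract; implementations swap freely.
from abc import ABC, abstractmethod


class ContainerRuntime(ABC):
    """Stand-in for the CRI contract the kubelet speaks over gRPC."""

    @abstractmethod
    def version(self) -> str: ...

    @abstractmethod
    def run_pod_sandbox(self, pod_name: str) -> str: ...


class ContainerdLike(ContainerRuntime):
    def version(self) -> str:
        return "containerd-sketch"

    def run_pod_sandbox(self, pod_name: str) -> str:
        return f"sandbox-{pod_name}"


class CrioLike(ContainerRuntime):
    def version(self) -> str:
        return "cri-o-sketch"

    def run_pod_sandbox(self, pod_name: str) -> str:
        return f"sb_{pod_name}"


def kubelet_start_pod(runtime: ContainerRuntime, pod_name: str) -> str:
    # Depends only on the interface, never on a concrete runtime.
    return runtime.run_pod_sandbox(pod_name)


for rt in (ContainerdLike(), CrioLike()):
    print(rt.version(), kubelet_start_pod(rt, "web"))
```

Swapping runtimes is then a deployment decision (which CRI endpoint the kubelet points at), not a code change in Kubernetes.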
Containerd: A core runtime
Kubernetes can use any container runtime that implements the CRI to manage pods, containers, and container images. Docker has been the most common container runtime in production Kubernetes environments, but containerd (initiated by Docker Inc. and donated to the CNCF in March 2017) may prove to be a better option. For more details, you can refer to the official blog.
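In practice, switching to containerd mostly means pointing the kubelet (and tools like crictl) at containerd’s CRI socket. The socket path below is the common default but can differ per distribution:

```shell
# Tell the kubelet which CRI endpoint to use (common containerd default).
kubelet --container-runtime-endpoint=unix:///run/containerd/containerd.sock

# Inspect containers through the same endpoint with crictl.
crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps
```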
Standards allow ecosystems to grow and thrive, as we have seen in the case of OCI and CRI, and as this ecosystem grows, the standards will need to evolve to meet its needs. Since everyone is upgrading to the latest version of Kubernetes, I thought CRI was an important aspect to understand before talking about what is being removed from and added to newer versions. Hence, this blog covered the what, why, how, and evolution of CRI in Kubernetes since its inception. Thanks for reading. I’d really appreciate your suggestions and feedback.
Blog Pundits: Shweta Tyagi and Sandeep Rawat
OpsTree is an End-to-End DevOps Solution Provider.
Connect with Us