While deploying a manifest, we found that some of our critical pods were not getting scheduled, while other pods scheduled easily. I wanted to make sure the critical pods get scheduled ahead of the others, so I started exploring pod scheduling and came across a native solution: Pod Priority and PriorityClass. In this blog, we’ll talk about PriorityClass and Pod Priority and how we can use them for pod scheduling.
Pod Priority determines the importance of one pod relative to another. It is most helpful when critical pods are unable to schedule due to resource capacity issues.
PriorityClass is a non-namespaced (cluster-scoped) object used to define a priority. A PriorityClass can hold any 32-bit integer value less than or equal to 1 billion (1,000,000,000); values above that are reserved for system-critical pods. The higher the value, the higher the priority.
Preemption allows higher-priority pods to evict lower-priority pods so that the higher-priority pods can be scheduled. It is enabled by default when we create a PriorityClass.
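As a minimal sketch of how this fits together (the class name, value, and container image below are illustrative assumptions, not from a real cluster), a PriorityClass and a pod that references it look like this:

```yaml
# Hypothetical PriorityClass for business-critical workloads.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority            # assumed name
value: 1000000                   # user-defined classes must be <= 1,000,000,000
globalDefault: false             # only pods that reference this class get it
preemptionPolicy: PreemptLowerPriority   # the default: may evict lower-priority pods
description: "For critical pods that must schedule ahead of others."
---
# A pod opts in by name; the scheduler resolves priorityClassName
# to the integer priority at admission time.
apiVersion: v1
kind: Pod
metadata:
  name: critical-app             # assumed name
spec:
  priorityClassName: high-priority
  containers:
    - name: app
      image: nginx:1.25          # placeholder image
```

Setting `preemptionPolicy: Never` instead would give the pod a higher place in the scheduling queue without allowing it to evict running pods.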
In this blog, we will create an active-active infrastructure on Microsoft Azure using Terraform and Jenkins.
Prime reasons to have an active-active setup of your infrastructure
Disaster recovery (DR) is an organization’s method of regaining access to and functionality of its IT infrastructure after events like a natural disaster, a cyber attack, or a business disruption such as the COVID-19 pandemic.
Ensure business resilience: No matter what happens, a good DR plan can ensure that the business returns to full operations rapidly, without losing data or transactions.
Maintain competitiveness: Loyalty is rare, and when a business goes offline, customers turn to competitors to get the goods or services they require. A DR plan prevents this.
Avoid data loss: The longer a business’s systems are down, the greater the risk that data will be lost. A robust DR plan minimizes this risk.
Maintain reputation: A business that has trouble resuming operations after an outage can suffer brand damage. For that reason, a solid DR plan is critical.
Before we dive deep into the SRE world, let’s talk about where SRE comes from. SRE originated at Google in 2003, under Ben Treynor Sloss. At the time, when the cloud wasn’t yet a thing, Google was one of the most prominent web companies, with a massive, distributed infrastructure. It faced several challenges simultaneously: keeping the trust and reputation of its services, providing a smooth user experience with minimal downtime and latency, managing dozens of sprawling data centers, and more. Google needed to rely heavily on automation and therefore formulated strategies that led it to implement automation at large scale. Small companies at that time could bear the loss of a few hours of downtime, but a giant like Google could not afford it, as it was at the frontier of user experience. Come to think of it, building a team to help ensure the application’s availability and reliability was an obvious outcome.
Today’s world is entirely internet-driven; in almost any field, we can get a product of our choice with one click.
Talking about e-commerce in more DevOps-oriented terms, the entire application/website is based on a microservice architecture, i.e., splitting a bulky application into smaller services to increase scalability and manageability and to make the system more process-driven.
Hence, to maintain these smaller services, one important aspect is to enable monitoring for them.
One commonly known stack is the EFK stack (Elasticsearch, Fluentd, Kibana), used here along with Kafka.
Kafka is an open-source event-streaming platform and is currently used by many companies.
Question: Why use Kafka within EFK monitoring?
Answer: This is the first question that strikes many minds. Hence, in this blog we’ll focus on why to use Kafka, what its benefits are, and how to integrate it with the EFK stack.
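As a rough sketch of the integration point (broker addresses, tag pattern, and topic name are assumptions for illustration), Fluentd can publish logs to Kafka instead of writing directly to Elasticsearch, using the `kafka2` output from fluent-plugin-kafka; a separate consumer pipeline then ships records from Kafka to Elasticsearch:

```
# Receive logs from application nodes over the forward protocol.
<source>
  @type forward
  port 24224
</source>

# Publish every matching record to Kafka. Kafka acts as a durable
# buffer so log bursts don't overwhelm Elasticsearch directly.
<match app.**>
  @type kafka2
  brokers kafka-0.kafka:9092,kafka-1.kafka:9092   # assumed broker addresses
  default_topic app-logs                          # assumed topic name
  <format>
    @type json
  </format>
  <buffer topic>
    flush_interval 5s
  </buffer>
</match>
```

The key benefit this sketch shows: if Elasticsearch is slow or down, logs accumulate safely in Kafka rather than being dropped at the collector.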
Today, most organizations are moving to managed services like EKS (Elastic Kubernetes Service) and AKS (Azure Kubernetes Service) for easier handling of a Kubernetes cluster. With managed Kubernetes we do not have to take care of the master nodes; the cloud provider is responsible for them (and, with managed node pools, for worker node lifecycle too), freeing up our time. We just need to deploy our microservices on the worker nodes. You can pay extra to achieve an uptime SLA of 99.95%, and automatic node repair keeps the cluster healthy and reduces the chance of downtime. This is good in many cases, but it can become an expensive ordeal: the AKS uptime SLA, for example, costs $0.10 per cluster per hour. On EKS, you have to install upgrades for the VPC CNI yourself, and also install Calico CNI if you want it; there is no IDE extension for developing EKS code. Managed services also create a dependency on the particular cloud provider.
To avoid depending on any cloud provider, we have to create a vanilla Kubernetes cluster. This means we have to take care of all the components of the cluster, all the master and worker nodes, by ourselves.
We had a scenario in which one of our clients required a Kubernetes cluster to be set up on on-premises servers, with no Internet connectivity. So I chose to perform the setup of the Kubernetes cluster via Kubespray.
Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS/Kubernetes cluster configuration management tasks. Kubespray provides a highly available cluster, is composable (you can choose the network plugin, for instance), supports the most popular Linux distributions, and is backed by continuous integration tests.
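As an illustrative sketch (node names and IP addresses are assumptions for a hypothetical three-node on-prem cluster), a minimal Kubespray inventory file such as `inventory/mycluster/hosts.yaml` could look like:

```yaml
all:
  hosts:
    node1:
      ansible_host: 10.0.0.11   # assumed on-prem IPs
      ip: 10.0.0.11
    node2:
      ansible_host: 10.0.0.12
      ip: 10.0.0.12
    node3:
      ansible_host: 10.0.0.13
      ip: 10.0.0.13
  children:
    kube_control_plane:         # master node(s)
      hosts:
        node1:
    kube_node:                  # worker nodes
      hosts:
        node2:
        node3:
    etcd:
      hosts:
        node1:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
```

The cluster is then deployed from the Kubespray repository with `ansible-playbook -i inventory/mycluster/hosts.yaml --become cluster.yml`. In an air-gapped environment like ours, the required container images and OS packages must first be mirrored to an internal registry and package repository reachable from the nodes.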