Protected EFK Stack Setup for Kubernetes


In this blog, we will see how to deploy the Elasticsearch, Fluent-bit, and Kibana (EFK) stack on Kubernetes. The EFK stack’s prime objective is to reliably and securely collect log data from the K8s cluster in any format, and to make that data available for searching, analyzing, and visualizing at any time.

What is EFK Stack?

EFK stands for Elasticsearch, Fluent Bit, and Kibana.

Elasticsearch is a scalable and distributed search engine that is commonly used to store large amounts of log data. It is a NoSQL database. Its primary function here is to store the logs forwarded by Fluent Bit and make them searchable.

Fluent Bit is a logging and metrics processor and forwarder that is extremely fast, lightweight, and highly scalable. Because of its performance-oriented design, it is simple to collect events from various sources and ship them to various destinations without complexity.

Kibana is a graphical user interface (GUI) tool for data visualization, querying, and dashboards. It lets you explore your log data through a web interface, create visualizations for event logs, and filter data to detect problems. Kibana is used to query the data indexed in Elasticsearch.

Why do we need EFK Stack?

  • Using the EFK stack in your Kubernetes cluster can make it much easier to collect, store, and analyze log data from all the pods and nodes in your cluster, making it more manageable and more accessible for different users.
  • The kubectl logs command is useful for looking at logs from individual pods, but it can quickly become unwieldy when you have a large number of pods running in your cluster.
  • With the EFK stack, you can collect logs from all the nodes and pods in your cluster and store them in a central location. It allows you to quickly troubleshoot issues and identify patterns in your log data.
  • It also enables people who are not familiar with using the command line to check logs and keep track of the Kubernetes cluster and the applications that are deployed on it.
  • It also lets you easily create alerts, dashboards, and monitoring and reporting capabilities that give you an overview of your system’s health and performance and notify you in real time if something goes wrong.
In this tutorial, we will be deploying EFK components as follows:
  1. Elasticsearch is deployed as a StatefulSet, as it stores the log data.
  2. Kibana is deployed as a Deployment and connects to the Elasticsearch service endpoint.
  3. Fluent-bit is deployed as a DaemonSet to gather the container logs from every node. It connects to the Elasticsearch service endpoint to forward the logs.
So let’s get started with EFK stack deployment.

Creating a Namespace

It’s a good practice to create a separate namespace for every functional unit in Kubernetes. By creating a namespace for each functional unit, you can easily see which resources belong to which unit and manage them accordingly.

Let’s create a namespace:

kind: Namespace
apiVersion: v1
metadata:
  name: kube-logging

To apply the manifest file created above, run the following command:

$ kubectl apply -f namespace.yaml

Creating a Secret

Secrets in Kubernetes (K8s) are native resources for storing and managing sensitive data such as passwords, cloud access keys, or authentication tokens. You must distribute this information across your Kubernetes clusters while also protecting it.

Let’s create a secret holding the Elasticsearch password so that the Kibana login is protected.

apiVersion: v1
kind: Secret
metadata:
  name: kibana-password
  namespace: kube-logging
type: Opaque
data:
  password: cGFzc3dvcmQ=

To apply the manifest file created above, run the following command:

$ kubectl apply -f secrets.yaml
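The value under data.password must be base64-encoded (Kubernetes Secrets store data this way; note that base64 is an encoding, not encryption). A quick way to produce and verify the value used above:

```shell
# Encode the placeholder password used in secrets.yaml
# (substitute your own value in practice).
echo -n 'password' | base64

# Decode the stored value to double-check it.
echo 'cGFzc3dvcmQ=' | base64 -d
```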

Deploy Elasticsearch as Statefulset

Deploying Elasticsearch as a StatefulSet provides stable, unique network identities and stable storage. Elasticsearch stores a large amount of data, and it’s important that this data persists and remains available even if a pod is deleted or recreated.

Creating the Headless Service

Now let’s set up a Kubernetes headless service named elasticsearch, which will define a DNS domain for the pods. A headless service performs no load balancing and has no static IP address.

Let’s create a headless Service for Elasticsearch.

kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: kube-logging
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
    - port: 9200
      name: rest
    - port: 9300
      name: inter-node

To apply the Elasticsearch service file created above, run the following command:

$ kubectl apply -f es-svc.yaml
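Because the service is headless, each StatefulSet pod gets its own stable DNS record instead of a single load-balanced address. As a sketch (assuming the three-replica es-cluster StatefulSet created in the next step), the fully qualified names look like this:

```shell
# Stable per-pod DNS names provided by the headless "elasticsearch"
# service for the es-cluster StatefulSet pods:
for i in 0 1 2; do
  echo "es-cluster-$i.elasticsearch.kube-logging.svc.cluster.local"
done
```

Within the same namespace the short form es-cluster-0.elasticsearch also resolves, which is exactly the form discovery.seed_hosts uses in the StatefulSet below.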
Creating the Elasticsearch StatefulSet

When Elasticsearch is deployed as a StatefulSet, pods are created and deleted in a specific order and keep their identity and storage, ensuring that your data is not lost. This is especially useful for Elasticsearch, as it helps ensure that data is not lost during deployments and scaling events.

When you create a StatefulSet with a Persistent Volume Claim (PVC) template, the default storage class will be used if no custom storage class is specified. To use a custom storage class for the PVCs created by a StatefulSet, you can specify the storage class in the volumeClaimTemplates section of the StatefulSet definition.

Let’s create a StatefulSet for Elasticsearch.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: kube-logging
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
          resources:
            limits:
              cpu: 1000m
              memory: 2Gi
            requests:
              cpu: 500m
              memory: 1Gi
          ports:
            - containerPort: 9200
              name: rest
              protocol: TCP
            - containerPort: 9300
              name: inter-node
              protocol: TCP
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
          env:
            - name: cluster.name
              value: k8s-logs
            - name: node.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: discovery.seed_hosts
              value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
            - name: cluster.initial_master_nodes
              value: "es-cluster-0,es-cluster-1,es-cluster-2"
            - name: network.host
              value: "0.0.0.0"
            - name: xpack.security.enabled
              value: "true"
            - name: xpack.monitoring.collection.enabled
              value: "true"
            - name: ES_JAVA_OPTS
              value: "-Xms512m -Xmx512m"
            - name: ELASTIC_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: kibana-password
                  key: password
      initContainers:
        - name: fix-permissions
          image: busybox
          command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
        - name: increase-vm-max-map
          image: busybox
          command: ["sysctl", "-w", "vm.max_map_count=262144"]
          securityContext:
            privileged: true
        - name: increase-fd-ulimit
          image: busybox
          command: ["sh", "-c", "ulimit -n 65536"]
          securityContext:
            privileged: true
  volumeClaimTemplates:
    - metadata:
        name: data
        labels:
          app: elasticsearch
      spec:
        accessModes: [ "ReadWriteOnce" ]
        # storageClassName: do-block-storage
        resources:
          requests:
            storage: 10Gi

To apply the Elasticsearch StatefulSet manifest file created above, run the following command:

$ kubectl apply -f es-sts.yaml
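A note on the init containers: Elasticsearch requires vm.max_map_count of at least 262144, which is why the increase-vm-max-map init container runs sysctl -w. On a Linux host you can inspect the current kernel value like this (a read-only check; the privileged init container performs the actual change on each node):

```shell
# Read the current vm.max_map_count from the kernel; Elasticsearch
# needs at least 262144 (set by the increase-vm-max-map init container).
cat /proc/sys/vm/max_map_count
```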
Verify Elasticsearch Deployment

Let’s check that all Elasticsearch pods have come into the Running state.

$ kubectl get pods -n kube-logging

To check the health of the Elasticsearch cluster, forward port 9200 from one of the Elasticsearch pods to your local machine and query the cluster health API. Since xpack.security is enabled, the request must be authenticated as the elastic user with the password from the secret:

$ kubectl port-forward es-cluster-0 9200:9200 -n kube-logging
$ curl -u elastic:password http://localhost:9200/_cluster/health/?pretty

The status of the Elasticsearch cluster will be shown in the output. If all of the steps were performed properly, the status should be ‘green’.

Let’s move on to Kibana now that we have an Elasticsearch cluster up and running.

Deploy Kibana as Deployment

Deploying Kibana as a Deployment allows you to easily scale the number of replicas up or down to handle changes in load. This is especially useful when handling large amounts of data or periods of high traffic, and it lets you take advantage of Kubernetes features such as automatic self-healing and autoscaling, which can save resources and cost when running Kibana pods.

Creating the Kibana Service

Let’s create a service to access the Kibana UI. Exposing Kibana directly (for example via a NodePort) is fine for demonstration or testing, but it’s not considered a best practice for actual production use; a Kubernetes Ingress backed by a ClusterIP service is a more secure way to expose the Kibana UI. In this tutorial we will use kubectl port-forward to reach the UI.

Let’s create a Service for Kibana.

apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: kube-logging
  labels:
    app: kibana
spec:
  ports:
    - port: 5601
  selector:
    app: kibana

To apply the Kibana service file created above, run the following command:

$ kubectl apply -f kibana-svc.yaml
Creating the Kibana Deployment

Kibana can be set up as a simple Kubernetes Deployment. If you look at the Kibana deployment manifest file, you’ll notice that we define the env variable ELASTICSEARCH_URL to configure the Elasticsearch cluster endpoint, along with the credentials Kibana uses to authenticate. Kibana communicates with Elasticsearch via this endpoint URL.

Let’s create a Deployment for Kibana.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: kube-logging
  labels:
    app: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
        - name: kibana
          image: docker.elastic.co/kibana/kibana:7.2.0
          resources:
            limits:
              cpu: 1000m
              memory: 1Gi
            requests:
              cpu: 700m
              memory: 1Gi
          env:
            - name: ELASTICSEARCH_URL
              value: http://elasticsearch:9200
            - name: ELASTICSEARCH_USERNAME
              value: elastic
            - name: ELASTICSEARCH_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: kibana-password
                  key: password
          ports:
            - containerPort: 5601
              name: kibana
              protocol: TCP

Save this manifest as kibana-deployment.yaml, then apply it:

$ kubectl apply -f kibana-deployment.yaml

Verify Kibana Deployment

Let’s check that all Kibana pods have come into the Running state.

$ kubectl get pods -n kube-logging

Once the Kibana pod has entered the Running state, you can verify the deployment by accessing the Kibana UI.

To check the status via the UI, use the kubectl port-forward command to forward the Kibana pod’s port 5601 to your local machine:

$ kubectl port-forward <kibana-pod-name> 5601:5601 -n kube-logging

After that, use curl to send a request, or open the UI in a web browser and log in as the elastic user with the password stored in the secret.

$ curl http://localhost:5601

Now that we have a Kibana pod running, let’s move on to fluent-bit.

Deploy Fluent-bit as Daemonset

Since Fluent-bit must stream logs from every node in the cluster, it is deployed as a DaemonSet. A DaemonSet is a type of Kubernetes resource that ensures a copy of a specified pod runs on each node in the cluster.

Creating the Fluent-bit Service Account

A Service Account is a Kubernetes resource that allows you to control access to the Kubernetes API for a set of pods, determining what those pods are allowed to do. You can attach roles and role bindings to the service account to give it specific permissions to access the Kubernetes API; this is done through Kubernetes Role and RoleBinding (or ClusterRole and ClusterRoleBinding) resources.

Let’s create a Service Account for Fluent-bit.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit-sa
  namespace: kube-logging
  labels:
    app: fluent-bit-sa

Save this manifest as fluent-bit-sa.yaml, then apply it:

$ kubectl apply -f fluent-bit-sa.yaml

Creating the Fluent-bit ClusterRole

Next, we define a ClusterRole that grants the get, list, and watch permissions on Kubernetes resources such as nodes, pods, and namespaces to the fluent-bit Service Account.

Let’s create a ClusterRole for Fluent-bit.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-role
  labels:
    app: fluent-bit-role
rules:
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
      - pods/logs
    verbs: ["get", "list", "watch"]

Save this manifest as fluent-bit-role.yaml, then apply it:

$ kubectl apply -f fluent-bit-role.yaml

Creating the Fluent-bit Role Binding

Now we create a ClusterRoleBinding to bind this ClusterRole to the “fluent-bit-sa” Service Account, which gives that ServiceAccount the permissions defined in the ClusterRole.

Let’s create a Role Binding for Fluent-bit.

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluent-bit-rb
roleRef:
  kind: ClusterRole
  name: fluent-bit-role
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: fluent-bit-sa
    namespace: kube-logging

Save this manifest as fluent-bit-role-binding.yaml, then apply it:

$ kubectl apply -f fluent-bit-role-binding.yaml

Creating the Fluent-bit ConfigMap

This ConfigMap holds the Fluent-bit configuration; the pod mounts it so that, when the pod starts, it uses the configuration defined here. A ConfigMap can be updated, and the change is reflected in the pod without recreating the pod itself.

Let’s create a ConfigMap for Fluent-bit.

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: kube-logging
  labels:
    k8s-app: f-bit-pod
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-elasticsearch.conf

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off

  output-elasticsearch.conf: |
    [OUTPUT]
        Name            es
        Match           *
        Host            ${FLUENT_ELASTICSEARCH_HOST}
        Port            ${FLUENT_ELASTICSEARCH_PORT}
        HTTP_User       ${FLUENT_ELASTICSEARCH_USER}
        HTTP_Passwd     ${FLUENT_ELASTICSEARCH_PASSWORD}
        Logstash_Format On
        Replace_Dots    On
        Retry_Limit     False

  parsers.conf: |
    [PARSER]
        Name        apache
        Format      regex
        Regex       ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        apache2
        Format      regex
        Regex       ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        apache_error
        Format      regex
        Regex       ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$

    [PARSER]
        Name        nginx
        Format      regex
        Regex       ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        json
        Format      json
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        syslog
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S

Save this manifest as fluent-bit-configmap.yaml, then apply it:

$ kubectl apply -f fluent-bit-configmap.yaml

The configuration in this manifest file specifies several different sections, including:

  • [SERVICE]: specifies service-wide settings such as the flush interval and the log level.
  • [INPUT]: specifies the input plugin Fluent-bit uses to collect log data. In this case the plugin is “tail”, which reads the log files at the given path, “/var/log/containers/*.log”.
  • [FILTER]: specifies the filter plugin used to process the log data. Here it is “kubernetes”, which enriches each log line with Kubernetes-specific metadata.
  • [OUTPUT]: specifies the output plugin used to send log data to Elasticsearch, including the Elasticsearch endpoint host and port along with other settings.
  • [PARSER]: specifies the formats of the log data and how each should be parsed.
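To make the tail input and the “docker” parser concrete, here is a hypothetical container log line of the kind found under /var/log/containers/: Docker writes one JSON object per line, and the parser takes the timestamp from its time key. The sed command below is purely illustrative, just to show which fields make up the record; Fluent-bit itself uses a real JSON parser:

```shell
# A hypothetical Docker JSON log line, as the tail input would read it.
line='{"log":"0: Log message\n","stream":"stdout","time":"2023-01-15T10:30:00.000000000Z"}'

# Extract the "stream" field to illustrate the record's shape
# (illustration only; Fluent-bit parses the full JSON object).
printf '%s\n' "$line" | sed 's/.*"stream":"\([^"]*\)".*/\1/'
```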
Creating the Fluent-bit Daemonset

The Fluent-bit Daemonset will automatically start collecting logs from all the nodes and send them to Elasticsearch.

Let’s create a Daemonset for Fluent-bit.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: f-bit-pod
  namespace: kube-logging
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit-logging
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:1.3.11
          imagePullPolicy: Always
          ports:
            - containerPort: 2020
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch"
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
            - name: FLUENT_ELASTICSEARCH_USER
              value: "elastic"
            - name: FLUENT_ELASTICSEARCH_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: kibana-password
                  key: password
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
      terminationGracePeriodSeconds: 10
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
      serviceAccountName: fluent-bit-sa
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        - operator: "Exists"
          effect: "NoExecute"
        - operator: "Exists"
          effect: "NoSchedule"

Save this manifest as fluent-bit-ds.yaml, then apply it:

$ kubectl apply -f fluent-bit-ds.yaml
Verify Fluent-bit Deployment

By creating a pod that generates logs continuously, you can ensure that Fluent-bit is correctly collecting and forwarding the logs to Elasticsearch.

You can create a pod that generates logs continuously by using an image that has a script or a program that generates logs. For example, you can use the busybox image and run the logger command in a loop inside the pod.

Let’s create a test pod for Fluent-bit.

apiVersion: v1
kind: Pod
metadata:
  name: log-generator
spec:
  containers:
    - name: log-generator
      image: busybox
      command: ['sh', '-c', 'i=0; while true; do echo "$i: Log message"; i=$((i+1)); sleep 1; done']
To apply the manifest, run:

$ kubectl apply -f test-pod.yaml

By doing this, you are able to verify that Fluent-bit is working as expected, and it is collecting and forwarding the logs to Elasticsearch, where they can be analyzed and visualized in Kibana.
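Because the es output has Logstash_Format On, Fluent-bit writes into a daily Elasticsearch index named logstash-<YYYY.MM.DD>; in Kibana, create an index pattern such as logstash-* to see the forwarded logs. For example, today’s index name would be:

```shell
# Daily index name that Fluent-bit's es output (Logstash_Format On)
# uses for logs collected today.
date +logstash-%Y.%m.%d
```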

That’s it!

Hope you found this tutorial helpful, and that you will be able to set up the EFK stack for logging in Kubernetes with ease.

Blog Pundits: Bhupender Rawat and Sandeep Rawat

OpsTree is an End-to-End DevOps Solution Provider.

Connect with Us

Author: Priyanshi Chauhan

I am a DevOps Engineer
