Today’s world is entirely internet-driven; whatever the field, we can get a product of our choice with a single click.
Looking at e-commerce in more DevOps-oriented terms, the entire application/website is based on a microservice architecture, i.e. breaking a bulky application into smaller services to make it more scalable, manageable, and process-driven.
Hence, one of the important aspects of maintaining these smaller services is enabling their monitoring.
One commonly known stack for this is the EFK stack (Elasticsearch, Fluentd, Kibana), used here along with Kafka.
Kafka is basically an open-source event streaming platform and is currently used by many companies.
Question: Why use Kafka within EFK monitoring?
Answer: Well, this is the first question that strikes many minds, so in this blog we’ll focus on why to use Kafka, what its benefits are, and how to integrate it with the EFK stack.
Interesting, right? 🙂 Let’s begin -:

While traveling, we’ve all seen crossroads managed by traffic lights or traffic policemen to streamline the traffic, since traffic from four directions meets at a crossroad.
So, what do the traffic lights or policemen do? They streamline the traffic by letting vehicles move in one direction while stopping the other directions, which wait for their turn.
In technical terms, the incoming traffic in the above scenario is streamlined by withholding it for some time, or in other words by creating a small buffer, isn’t it?
Kafka does something similar. Imagine approximately 300 applications sending logs directly to Elasticsearch; this can choke it, and scaling up Elasticsearch or adding more data nodes during peak traffic isn’t a good solution, as the cluster becomes unstable due to re-sharding.
Introducing Kafka breaks up this incoming traffic: it acts as a buffer and sends streamlined chunks to Elasticsearch.
Let’s understand this with a Block Diagram-:

Not to worry, I’ll explain each and every block with configurations 🙂
Block 1 -: This block refers to the containers or instances within which application logs are populated, along with the running td-agent service (to export the desired log path to Kafka). Td-agent is the stable distribution package of Fluentd, maintained by Treasure Data; Fluentd itself is a Cloud Native Computing Foundation project. Basically, it’s a data collection daemon: it collects logs from various data sources (in our case, the application) and exports them to a configured destination (in our case, Kafka).
Installation guide
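As a quick sketch (assuming an Ubuntu 16.04/Xenial host and td-agent 3; pick the install script matching your distro from the installation guide), td-agent and the Kafka output plugin used in the config below can be installed like this -:

# Install td-agent (assumption: Ubuntu Xenial; use the script for your distro/version)
curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent3.sh | sh
# Install the plugin that provides the kafka_buffered output used below
sudo td-agent-gem install fluent-plugin-kafka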
Within the td-agent conf, the below configuration is done -:
<source>
  @type tail
  read_from_head true
  path <path_of_log_file>
  tag <tag_name>
  format json
  keep_time_key true
  time_format <time_format_of_logs>
  pos_file <pos_file_location>
</source>

<match <tag_name>>
  @type kafka_buffered
  output_include_tag true
  brokers <kafka_hostname:port>
  default_topic <kafka_topic_name>
  output_data_type json
  buffer_type file
  buffer_path <buffer_path_location>
  buffer_chunk_limit 10m
  buffer_queue_limit 256
  buffer_queue_full_action drop_oldest_chunk
</match>
The <source> block is dedicated to the log configuration, such as -:
path – Path of the log file
tag – Tag name for the logs; user-defined
format – Log format, e.g. json, text, etc.
keep_time_key – Whether to keep the time key in the logs, e.g. true or false
time_format – Time format of the logs, e.g. %d/%b/%Y:%H:%M:%S
pos_file – Position file location; user-defined
Similarly, the <match> block is dedicated to the destination, i.e. where to send these logs -:
@type – Type of output plugin, e.g. kafka_buffered or elasticsearch
output_include_tag – Whether to include the tag_name mentioned in the source block
brokers – Kafka DNS name with port
default_topic – Kafka topic into which logs will be exported
output_data_type – Output log format, e.g. json
buffer_type – Buffer type, e.g. file
buffer_path – Path of the buffer file; user-defined
buffer_chunk_limit – Size limit of each buffer chunk
buffer_queue_limit – Length limit of the buffer queue
buffer_queue_full_action – Action to take when the buffer queue is full, e.g. drop_oldest_chunk
For more config. parameters please refer – link
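After updating the td-agent configuration, restart the service and watch its log to confirm that the files are being tailed and pushed to Kafka (the paths below are the standard td-agent defaults) -:

sudo systemctl restart td-agent
tail -f /var/log/td-agent/td-agent.log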
Block 2 -: The Kafka server, where the Kafka service is set up
Kafka uses ZooKeeper for coordination and self-balancing; depending upon your infra, ZooKeeper can run on the same server or on a separate one (separate in the case of a production setup).
wget http://mirror.fibergrid.in/apache/kafka/0.10.2.0/kafka_2.12-0.10.2.0.tgz
tar -xzf kafka_2.12-0.10.2.0.tgz
Starting Zookeeper -:
ZooKeeper needs to be started first. A convenience script that ships with the Kafka package starts a single-node standalone ZooKeeper instance, and further configuration can be added in the zookeeper.properties file.
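For reference, the zookeeper.properties shipped with the Kafka package already works for a single-node standalone setup; a minimal version looks like this (for anything beyond testing, move dataDir off /tmp) -:

# config/zookeeper.properties (defaults shipped with the Kafka package)
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=0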
vi .bashrc
export KAFKA_HEAP_OPTS="-Xmx500M -Xms500M"
The value needs to be 50% of the total RAM on the instance.
source .bashrc
Start ZooKeeper with the following command, running it in the background using nohup and diverting its logs to the zookeeper-logs file -:
cd kafka_2.12-0.10.2.0
nohup bin/zookeeper-server-start.sh config/zookeeper.properties > ~/zookeeper-logs &
Starting Kafka -:
cd kafka_2.12-0.10.2.0
nohup bin/kafka-server-start.sh config/server.properties > ~/kafka-logs &
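To verify that the broker is healthy, you can create a test topic, list topics (the topic named in default_topic should also appear once td-agent starts shipping logs), and read from it with the console consumer; app-logs below is just a placeholder topic name -:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic app-logs
bin/kafka-topics.sh --list --zookeeper localhost:2181
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic app-logs --from-beginning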
To stop either of them, use the below commands -:
bin/kafka-server-stop.sh
bin/zookeeper-server-stop.sh
Refer to official documentation
Block 3 -: td-agent, used as a forwarder
Now we have logs within Kafka topics, but we need a mechanism to pull these logs and export them to Elasticsearch.
So, just as td-agent was used to pick up application logs and send them to Kafka, here td-agent will be configured as a forwarder, i.e. to pull logs from Kafka and send them to Elasticsearch.
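The forwarder needs the input/output plugins referenced in the config below; a sketch of the install (fluent-plugin-kafka provides the kafka_group input, fluent-plugin-forest the forest output wrapper, and fluent-plugin-elasticsearch the Elasticsearch output) -:

sudo td-agent-gem install fluent-plugin-kafka fluent-plugin-forest fluent-plugin-elasticsearch

With the plugins in place, the forwarder configuration looks like this -: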
<source>
  @type kafka_group
  brokers <kafka_dns:port>
  consumer_group <consumer_group_kafka>
  topics <kafka_topic_name>
</source>

<match <kafka_topic_name>>
  @type forest
  subtype elasticsearch
  <template>
    host <ElasticSearch IP>
    port <Elasticsearch Port>
    user <ES_username>
    password <ES_password>
    logstash_prefix <prefix name>
    logstash_format true
    include_tag_key true
    tag_key tag_name
  </template>
</match>
Again, the source and match block will be updated with similar values as stated before.
This time, the source is configured to take logs from Kafka, and the match block forwards them to Elasticsearch -:
consumer_group – A group of consumers that share the same group id. When a topic is consumed by consumers in the same group, every record is delivered to only one consumer in the group.
forest – Creates a sub-plugin instance of an output plugin dynamically per tag, from the template configuration
logstash_prefix – The index name under which logs will be stored and viewed inside Kibana
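Once the forwarder is running, you can confirm it is consuming from Kafka and writing to Elasticsearch. A quick check (substitute your own group name, credentials, and host; on Kafka 0.10.x the consumer-groups tool needs the --new-consumer flag, which newer versions drop) -:

# On the Kafka server: lag should stay low if the forwarder keeps up
bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 --describe --group <consumer_group_kafka>
# Against Elasticsearch: the logstash_prefix index should start appearing
curl -u <ES_username>:<ES_password> "http://<ElasticSearch_IP>:9200/_cat/indices?v"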
Block 4 -: Elasticsearch
Setup can be done by following the below official document from Elastic -:
Elasticsearch setup over ubuntu
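Once Elasticsearch is up, a quick sanity check (assuming it listens on the default port 9200) is the cluster health API -:

curl -X GET "http://localhost:9200/_cluster/health?pretty"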
Block 5 -: Kibana setup
You can configure Nginx to make Kibana available over port 80 or 443.
Refer link for Nginx configuration.
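As a minimal sketch (assuming Kibana runs on its default port 5601 on the same host, and kibana.example.com is a placeholder server name), an Nginx reverse-proxy block could look like this -:

server {
    listen 80;
    server_name kibana.example.com;   # placeholder domain

    location / {
        proxy_pass http://localhost:5601;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}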
So, yes, now the entire EFK stack is set up with Kafka. In the same way, it can be configured in standalone mode (for self-learning) or across different servers for a production setup.
NOTE -: The Elasticsearch and Kibana setup remains the same; the td-agent (collector & forwarder) and Kafka configuration is where the magic happens.
Happy Learning …
Blog Pundits: Naveen Verma and Sandeep Rawat
OpsTree is an End-to-End DevOps Solution Provider.
Connect with Us