Before we dive deep into the topic, let’s look at why we need this tool and this feature in our toolbox. In the world of errors and bugs, there are always errors to debug in order to keep our system stable. Many applications need monitoring to analyze the performance of the running application, but what if:
we are not getting a 100% analysis
we only capture the handled exceptions
our applications have anonymous errors that weren’t tracked by our system’s status codes and that steadily increased load or downtime, and many more.
Would you actually be able to debug that kind of error? How difficult is it to identify what caused an application crash? Some organizations set one custom status code for multiple look-alike error strings, but what if those errors are not actually similar, and you end up saying, “Ignore it, that’s a handled one; we throw that status code ourselves”?
In the current era, organizations demand high-quality data and data management systems that scale, deploy quickly, and are robust, highly available, and secure against unfortunate incidents. Traditionally, applications used relational databases as their primary data stores, but with today’s need for data-driven applications, developers lean towards alternative databases like NoSQL (Not Only SQL).
NoSQL databases enable speed, flexibility, and scalability in this era of growing cloud development. Moreover, many NoSQL databases support JSON-like documents, a format commonly used to share data in modern web applications.
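To make the idea of a JSON-like document concrete, here is a minimal sketch in Python. The record, its field names, and the two helper functions are hypothetical and only for illustration; they are not tied to any particular NoSQL database.

```python
import json

# Hypothetical order record; the field names are illustrative only.
# Nested objects and lists live inside one document, with no joins needed.
order = {
    "order_id": "A-1001",
    "customer": {"name": "Asha", "email": "asha@example.com"},
    "items": [
        {"sku": "kb-01", "qty": 1},
        {"sku": "ms-02", "qty": 2},
    ],
}


def to_document(record: dict) -> str:
    """Serialize a record to JSON text, the form a web API would send."""
    return json.dumps(record, sort_keys=True)


def from_document(text: str) -> dict:
    """Parse JSON text back into a Python dictionary."""
    return json.loads(text)


if __name__ == "__main__":
    print(to_document(order))
```

The same structure can travel from the browser to the API to the data store without being split across tables, which is a large part of why document stores feel natural for web applications.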
Are you searching for a service discovery or service mesh tool for a distributed environment?
Did you find one that is easy to install? Not yet? Think fast... it’s a piece of cake. Yes? No? Calm down, because I’ve got it!
A few days back we got a requirement to set up multiple services on multiple servers in cluster mode. So the questions arose: how will the services be auto-discovered? How will we know the health of each service? And above all, how do we restrict user access to the different services? After a lot of research, I came across a tool named Consul. But then another stumbling block appeared: how do we set it up?
Your answer might be to just go ahead and download the binary on every server. If that’s what you’re thinking, then STOP! Doing it manually on many servers is time-consuming and inefficient. So I thought of using a configuration management tool, namely Ansible. There were roles already available, but some had a hard-coded encryption key, some did not generate the bootstrap token, and they were not easy to understand. None of them fulfilled the requirement.
So I decided to create an Ansible role with features such as enabling ACLs, generating a bootstrap token, and generating an encryption key, all written in easy-to-understand language.
In this blog, I explain the OT-OSM Consul Ansible role.
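To show why a hard-coded encryption key is avoidable, here is a minimal Python sketch that generates a fresh gossip encryption key of the same shape that `consul keygen` prints: random bytes, base64-encoded. This is not how the role itself is implemented. The function names are hypothetical, and the accepted key lengths (16, 24, or 32 bytes) are an assumption to verify against the documentation for your Consul version.

```python
import base64
import secrets


def generate_gossip_key(num_bytes: int = 32) -> str:
    """Return a base64-encoded random key (32 bytes by default)."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def is_valid_gossip_key(key: str) -> bool:
    """Check that a key is valid base64 and decodes to an AES key length.

    Assumption: 16, 24, or 32 decoded bytes are acceptable lengths.
    """
    try:
        raw = base64.b64decode(key, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return len(raw) in (16, 24, 32)


if __name__ == "__main__":
    key = generate_gossip_key()
    print(key, is_valid_gossip_key(key))
```

Generating the key once per cluster and distributing it to every node keeps the secret out of the role’s source code.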
Whenever we discuss monitoring and alerting systems in DevOps, we often come across the TICK Stack. What is the TICK Stack? What is so special about it? Is it different from the ELK Stack, Prometheus, Grafana, CloudWatch, and New Relic? I will try to answer all of these questions briefly, but my motivation for writing this blog is the alert flooding issue I faced while testing my TICK Stack.
Note: This blog is not about the detailed working of TICK or its setup.
What is TICK? What is special about it?
TICK is a complete collection of services provided by the InfluxData community to capture, store, stream, process, and visualize data, giving us a highly available and robust solution for monitoring and alerting. TICK is an abbreviation for:
Telegraf – A very lightweight server agent for scraping metrics from the system it runs on. It can also pull metrics from various third-party services such as Kafka, StatsD, etc.
InfluxDB – Known as the heart of the TICK stack, it is an efficient, high-performance database for handling high volumes of time-series data. It is open source and uses an SQL-like query language.
Prometheus has long been the undisputed open-source leader for monitoring and alerting in Kubernetes systems, and it has become a go-to solution. While Prometheus offers some general guidance for achieving high availability, it has limitations when it comes to data retention, historical data retrieval, and multi-tenancy. This is where Thanos comes into play. In this blog post, we will discuss how to integrate Thanos with Prometheus in Kubernetes environments and why one should choose a particular approach. So let’s get started.