Azure HA Kubernetes Monitoring using Prometheus and Thanos

Introduction

Ever since Prometheus took on the role of monitoring Kubernetes systems, it has been the undisputed open-source leader for monitoring and alerting, and the go-to solution. While Prometheus offers some general guidance for achieving high availability, it has limitations when it comes to data retention, historical data retrieval, and multi-tenancy. This is where Thanos comes into play. In this blog post, we will discuss how to integrate Thanos with Prometheus in Kubernetes environments and why you might choose this approach. So let’s get started.

Why Thanos

Prometheus deployments rely on persistent volumes and are typically scaled using federation. Federation, however, does not suit every kind of data, and you will often need a separate tool to manage Prometheus configurations across instances. In this post, we will use Thanos to handle these issues. With Thanos, Prometheus can run as multiple instances, data can be deduplicated across them, and data can be archived in long-term object storage such as GCS, Azure Storage, or S3.

Thanos Architecture


Thanos Components

  1. Thanos Sidecar
  2. Thanos Store
  3. Thanos Query
  4. Thanos Compact
  5. Thanos Ruler

Thanos Implementation

We’ll need an Azure Storage Account; you can create one using the Azure Portal or the Azure CLI. You will also need the storage account access key, which can likewise be retrieved using the Azure CLI.

  1. (a) Create a storage account.
az storage account create --name <name> --resource-group <resource-group>
  1. (b) Create a storage container called metrics.
az storage container create --name metrics --account-name <name>
  1. (c) Retrieve the storage account access key for later use.
az storage account keys list --account-name <name> --resource-group <resource-group> -o tsv --query "[0].value"

2. Implementing the Ingress controller and Ingress objects (we will use the Nginx Ingress Controller)

3. Creating credentials to be used by the Thanos components to access the object store (in this case, an Azure storage container)
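The thanos.yaml file consumed below is Thanos’s object store configuration. A minimal sketch for Azure Blob Storage — the account name, key, and container are placeholders to substitute with the values from step 1:

```yaml
type: AZURE
config:
  storage_account: "<name>"            # from step 1(a)
  storage_account_key: "<access-key>"  # from step 1(c)
  container: "metrics"                 # from step 1(b)
```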

a. Create a Kubernetes secret from the credentials file, as shown in the following snippet:

kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=thanos.yaml

Deployment

Create a monitoring namespace, service accounts, cluster role, and cluster role bindings for Prometheus using the following manifest.

kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/sa.yaml

Creating the Prometheus Configuration configmap

This configmap contains the Prometheus configuration file template that is read by the Thanos sidecar component, which generates the actual configuration file. That file is then consumed by the Prometheus container running in the same pod. It is important to include the external_labels section in the configuration file so that the querier can use it to deduplicate data.

kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/prometheus-configmap.yaml
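The external_labels section mentioned above might look like the following sketch; the label names and values are illustrative, with the replica value substituted per pod when the template is rendered:

```yaml
global:
  external_labels:
    cluster: azure-ha      # illustrative cluster label
    replica: $(POD_NAME)   # unique per replica; used by the querier to deduplicate
```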

Creating Prometheus Rules configmap

This will create alert rules that will be relayed to Alertmanager for delivery.

kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/prometheus-rules.yaml
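As an illustration, a rule in this configmap could look like the following (a hypothetical example in the standard Prometheus rule format, not taken from the repository):

```yaml
groups:
  - name: example-alerts
    rules:
      - alert: InstanceDown
        expr: up == 0        # target has stopped responding to scrapes
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```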

Creating Prometheus Stateful Set

kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/prometheus-deployment.yaml

It is important to understand the following about the above manifest:

  1. Prometheus is deployed as a stateful set with three replicas. Each replica provisions its own persistent volume dynamically.
  2. Prometheus configuration is generated by the Thanos Sidecar container using the template file created above.
  3. Thanos handles data compaction, and therefore we need to set --storage.tsdb.min-block-duration=2h and --storage.tsdb.max-block-duration=2h
  4. Prometheus stateful set is labeled as thanos-store-api: “true” so that each pod gets discovered by the headless service. This headless service will be used by Thanos Query to query data across all the Prometheus instances.
  5. We apply the same label to the Thanos Store and Thanos Ruler components so that they are also discovered by the querier and can be used for querying metrics.
  6. The path to the Azure storage credentials is provided via the AZURE_APPLICATION_CREDENTIALS environment variable; the configuration file from the secret created as part of the prerequisites is mounted at that path.
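The block-duration flags from point 3 would appear in the Prometheus container spec roughly as follows (a sketch of the relevant arguments only; the file paths are assumptions, not the exact manifest):

```yaml
args:
  - --config.file=/etc/prometheus-shared/prometheus.yaml  # file generated by the sidecar
  - --storage.tsdb.path=/prometheus
  - --storage.tsdb.min-block-duration=2h  # equal min/max block durations so Prometheus
  - --storage.tsdb.max-block-duration=2h  # never compacts; compaction is left to Thanos
```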

Creating Prometheus Services

kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/prometheus-service.yaml

Creating Thanos Query

Thanos Query is one of the key components of a Thanos deployment.

  1. The container argument --store=dnssrv+thanos-store-gateway:10901 assists in discovering all the components from which metric data should be queried.
  2. Thanos Querier provides a web-based interface for running PromQL queries. It also has the option to deduplicate data across multiple Prometheus instances.
  3. We will use it as the data source for all Grafana dashboards.
kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/thanos-querier.yaml
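The querier’s container arguments would look roughly like this (a sketch; it assumes the deduplication label in external_labels is named replica):

```yaml
args:
  - query
  - --query.replica-label=replica              # label collapsed during deduplication
  - --store=dnssrv+thanos-store-gateway:10901  # DNS SRV discovery of all store APIs
```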

Creating Thanos Store Gateway

This will create the store component which serves metrics from the object storage to the querier.

kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/thanos-store-gateway.yaml
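The store gateway’s arguments follow the same pattern (a sketch; the mount path for the secret is an assumption):

```yaml
args:
  - store
  - --objstore.config-file=/etc/thanos/thanos.yaml  # Azure credentials from the secret
  - --data-dir=/data                                # local cache for index data
```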

Creating Thanos Compact

kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/thanos-compact.yaml
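The compactor downsamples and applies retention to the blocks in object storage. Its arguments could be sketched as follows (the retention values are hypothetical; tune them to your needs):

```yaml
args:
  - compact
  - --objstore.config-file=/etc/thanos/thanos.yaml
  - --retention.resolution-raw=30d  # keep raw data for 30 days
  - --retention.resolution-5m=90d   # keep 5m-downsampled data for 90 days
  - --retention.resolution-1h=1y    # keep 1h-downsampled data for a year
  - --wait                          # run continuously instead of one-shot
```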

Creating Thanos Ruler

kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/thanos-ruler.yaml

Creating Alertmanager

kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/alertmanger.yaml

Creating Kube-state-metrics

A kube-state-metrics deployment is required to relay some important container metrics, which are not exposed natively by the kubelet and are therefore not directly accessible to Prometheus.

kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/kube-state-metrics.yaml

Creating Node-exporter Daemon set

The daemon set runs a node-exporter pod on each node, which exposes important node-level metrics that can be scraped by the Prometheus instances.

kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/node-exporter.yaml

Deploying Grafana

We will create our Grafana deployment and service, which will be exposed through our Ingress object.

kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/grafana.yaml

Next, add thanos-querier as the data source for our Grafana deployment. In order to do so:

  1. Click on Add DataSource
  2. Set Name: DS_PROMETHEUS
  3. Set Type: Prometheus
  4. Set URL: http://thanos-querier:9090
  5. Save and Test. You can now build your custom dashboards or simply import dashboards from grafana.net. Dashboards #315 and #1471 are very good places to start.
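The same data source can alternatively be provisioned declaratively rather than through the UI (a sketch using Grafana’s provisioning file format, mounted under /etc/grafana/provisioning/datasources/):

```yaml
apiVersion: 1
datasources:
  - name: DS_PROMETHEUS
    type: prometheus
    access: proxy
    url: http://thanos-querier:9090  # the querier service inside the cluster
    isDefault: true
```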

Creating the Ingress Object

Here is the final piece of the journey: the Ingress object, which lets us access all of our services from outside the Kubernetes cluster.

Replace <yourdomain> with your domain name.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: monitoring-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: grafana.<yourdomain>.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000
  - host: prometheus-0.<yourdomain>.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-0-service
            port:
              number: 8080
  - host: prometheus-1.<yourdomain>.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-1-service
            port:
              number: 8080
  - host: prometheus-2.<yourdomain>.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-2-service
            port:
              number: 8080
  - host: alertmanager.<yourdomain>.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: alertmanager
            port:
              number: 9093
  - host: thanos-querier.<yourdomain>.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: thanos-querier
            port:
              number: 9090
  - host: thanos-ruler.<yourdomain>.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: thanos-ruler
            port:
              number: 9090

Access Thanos Querier at http://thanos-querier.<yourdomain>.com

Make sure deduplication is selected.

If you click on Stores, you will be able to see all the active endpoints discovered by thanos-store-gateway.

Grafana Dashboards

Thanos Querier is then added as the data source in Grafana, and you can start creating dashboards.

Kubernetes Cluster Monitoring Dashboard:

Conclusion

In this blog, we have seen the limitations of Prometheus high availability and how Thanos helps overcome them.

If you have any other ideas or suggestions around this approach, please let us know in the comments section. Thanks for reading; I’d really appreciate your suggestions and feedback.

References:- 
https://thanos.io/


Blog Pundit:  Naveen Verma, Sanjeev Pandey and Sandeep Rawat

Opstree is an End to End DevOps solution provider
