Ever since Prometheus took on the role of monitoring our systems, it has been the undisputed open-source leader for monitoring and alerting in Kubernetes. While Prometheus offers some general guidance for achieving high availability, it has limitations when it comes to data retention, historical data retrieval, and multi-tenancy. This is where Thanos comes into play. In this blog post, we will discuss how to integrate Thanos with Prometheus in Kubernetes environments and why you would choose this approach. So let's get started.
Prometheus deployments rely on persistent volumes and are traditionally scaled using federated set-ups, but federation does not suit every type of data, and you will often need yet another tool to manage Prometheus configurations. Thanos addresses these issues: it lets you run Prometheus on multiple instances, deduplicates the data across them, and archives the data in long-term object storage such as GCS, Azure Blob Storage, or S3.
Thanos consists of the following components:
- Thanos Sidecar
- Thanos Store
- Thanos Query
- Thanos Compact
- Thanos Ruler
- (a) Create a storage account.
az storage account create --name <name> --resource-group <resource-group>
- (b) Create a storage container called metrics.
az storage container create --name metrics --account-name <name>
- (c) Retrieve the storage account access key for later use.
az storage account keys list --account-name <name> --resource-group <resource-group> -o tsv --query "[0].value"
2. Implementing Ingress Controller and Ingress objects (We will use Nginx Ingress Controller)
3. Creating credentials to be used by Thanos components to access object store (in this case, Azure bucket)
a. Put the credentials in a thanos.yaml object store configuration file and create a Kubernetes secret from it, as shown in the following snippet:
kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=thanos.yaml
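The thanos.yaml referenced above is the Thanos object store configuration. A minimal sketch for Azure Blob Storage could look like this, where the account name, key, and container are the values from the prerequisite steps:

```yaml
type: AZURE
config:
  storage_account: "<name>"       # storage account created in step (a)
  storage_account_key: "<key>"    # access key retrieved in step (c)
  container: "metrics"            # container created in step (b)
```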
Create a monitoring namespace, service accounts, cluster role, and cluster role bindings for Prometheus using the following manifest.
Creating the Prometheus Configuration configmap
This configmap holds the Prometheus configuration file template that is read by the Thanos sidecar component, which renders it into the actual configuration file consumed by the Prometheus container running in the same pod. It is important to include the external_labels section in the configuration file so that the querier can use it to deduplicate data.
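As a sketch, the relevant part of such a template could look like the following; the label names and values here are illustrative assumptions, not the exact contents of the repository's configmap:

```yaml
global:
  scrape_interval: 30s
  external_labels:
    cluster: prometheus-ha
    # unique per replica; the querier collapses this label when deduplicating
    replica: $(POD_NAME)
```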
kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/prometheus-configmap.yaml
Creating Prometheus Rules configmap
This will create alert rules that will be relayed to Alertmanager for delivery.
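For illustration, a rules file in this configmap could contain an alerting rule such as the following; the alert name and expression are hypothetical examples, not the rules shipped in the repository:

```yaml
groups:
  - name: example-alerts
    rules:
      - alert: InstanceDown
        expr: up == 0          # target has failed its scrapes
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```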
kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/prometheus-rules.yaml
Creating Prometheus Stateful Set
kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/prometheus-deployment.yaml
It is important to understand the following about the above manifest:
- Prometheus is deployed as a stateful set with three replicas. Each replica provisions its own persistent volume dynamically.
- Prometheus configuration is generated by the Thanos Sidecar container using the template file created above.
- Thanos handles data compaction, and therefore we need to disable Prometheus's local compaction by setting --storage.tsdb.min-block-duration=2h and --storage.tsdb.max-block-duration=2h.
- Prometheus stateful set is labeled as thanos-store-api: “true” so that each pod gets discovered by the headless service. This headless service will be used by Thanos Query to query data across all the Prometheus instances.
- We apply the same label to the Thanos Store and Thanos Ruler components so that they are also discovered by the querier and can be used for querying metrics.
- The path to the Azure bucket credentials is provided via the AZURE_APPLICATION_CREDENTIALS environment variable; the configuration file itself is mounted at that path from the secret created as part of the prerequisites.
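To make the bullet points above concrete, the Thanos sidecar container inside the stateful set could be declared roughly as follows; the image tag, paths, and volume names are assumptions for illustration, not the exact manifest contents:

```yaml
- name: thanos-sidecar
  image: quay.io/thanos/thanos:v0.28.0     # assumed version
  args:
    - sidecar
    - --tsdb.path=/prometheus              # volume shared with the Prometheus container
    - --prometheus.url=http://127.0.0.1:9090
    - --objstore.config-file=/etc/thanos/thanos.yaml
  ports:
    - name: grpc
      containerPort: 10901                 # gRPC Store API consumed by Thanos Query
  volumeMounts:
    - name: prometheus-storage
      mountPath: /prometheus
    - name: thanos-objstore-config
      mountPath: /etc/thanos
```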
Creating Prometheus Services
kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/prometheus-service.yaml
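Alongside the per-replica services, a headless service selecting the thanos-store-api: "true" label gives Thanos Query a DNS SRV record per store endpoint. A sketch, with names and ports chosen to be consistent with the flags shown later:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: thanos-store-gateway
  namespace: monitoring
spec:
  clusterIP: None          # headless: DNS returns one record per backing pod
  ports:
    - name: grpc
      port: 10901
      targetPort: grpc
  selector:
    thanos-store-api: "true"
```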
Creating Thanos Query
Thanos Query is one of the key components of the Thanos deployment.
- The container argument --store=dnssrv+thanos-store-gateway:10901 assists in discovering all the components from which metric data should be queried.
- Thanos Querier provides a web-based interface for running PromQL queries. It also has the option to deduplicate data across multiple Prometheus replicas and clusters.
- It will later serve as the data source for all of our Grafana dashboards.
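The querier's container arguments could look roughly like this; the replica label name is an assumption and must match the external_labels used by the Prometheus replicas:

```yaml
args:
  - query
  - --http-address=0.0.0.0:9090
  - --store=dnssrv+thanos-store-gateway:10901   # discover all Store API endpoints via DNS SRV
  - --query.replica-label=replica               # label collapsed during deduplication
```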
kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/thanos-querier.yaml
Creating Thanos Store Gateway
This will create the store component which serves metrics from the object storage to the querier.
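Its container arguments could be sketched as follows; the data directory is only a local cache of index data, and the paths are illustrative:

```yaml
args:
  - store
  - --data-dir=/var/thanos/store                # local cache, safe to lose
  - --objstore.config-file=/etc/thanos/thanos.yaml
  - --grpc-address=0.0.0.0:10901                # Store API served to the querier
```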
kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/thanos-store-gateway.yaml
Creating Thanos Compact
The compactor compacts and downsamples the blocks stored in the object storage bucket, which keeps long-range queries over historical data efficient.
kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/thanos-compact.yaml
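A sketch of the compactor's arguments; the retention values are illustrative assumptions, and exactly one compactor should run against a given bucket:

```yaml
args:
  - compact
  - --data-dir=/var/thanos/compact
  - --objstore.config-file=/etc/thanos/thanos.yaml
  - --retention.resolution-raw=30d    # assumed retention policy per resolution
  - --retention.resolution-5m=90d
  - --retention.resolution-1h=1y
  - --wait                            # run continuously instead of exiting after one pass
```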
Creating Thanos Ruler
Thanos Ruler evaluates recording and alerting rules against Thanos Query and can upload the resulting blocks to object storage.
kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/thanos-ruler.yaml
Creating Alertmanager
kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/alertmanger.yaml
Creating Kube State Metrics
The kube-state-metrics deployment is required to relay some important container metrics that are not natively exposed by the kubelet and hence are not directly accessible to Prometheus.
Creating Node-exporter Daemon set
The daemon set runs a node-exporter pod on each node of the cluster; it exposes important node-level metrics that can be pulled by the Prometheus instances.
kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/node-exporter.yaml
We will create our Grafana deployment and service, which will be exposed via our ingress object. We will add Thanos Querier as the data source for our Grafana deployment. To do so:
- Click on Add DataSource
- Set Name: DS_PROMETHEUS
- Set Type: Prometheus
- Set URL: http://thanos-querier:9090
- Save and Test. You can now build your custom dashboards or simply import dashboards from grafana.net. Dashboards #315 and #1471 are very good places to start.
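Instead of clicking through the UI, the same data source can be provisioned declaratively. A sketch of a Grafana data source provisioning file, where the values simply mirror the steps above (its placement in the Grafana image or configmap is an assumption):

```yaml
apiVersion: 1
datasources:
  - name: DS_PROMETHEUS
    type: prometheus
    access: proxy
    url: http://thanos-querier:9090   # the querier service inside the cluster
    isDefault: true
```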
kubectl apply -f https://raw.githubusercontent.com/v5arcus/azure-ha-prometheus-thanos/main/grafana.yaml
Creating the Ingress Object
This is the final component of the journey; it will let us access all of our services from outside the Kubernetes cluster.
Replace <yourdomain> with your own domain name.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: monitoring-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
    - host: grafana.<yourdomain>.com
      http:
        paths:
          - path: /
            backend:
              serviceName: grafana
              servicePort: 3000
    - host: prometheus-0.<yourdomain>.com
      http:
        paths:
          - path: /
            backend:
              serviceName: prometheus-0-service
              servicePort: 8080
    - host: prometheus-1.<yourdomain>.com
      http:
        paths:
          - path: /
            backend:
              serviceName: prometheus-1-service
              servicePort: 8080
    - host: prometheus-2.<yourdomain>.com
      http:
        paths:
          - path: /
            backend:
              serviceName: prometheus-2-service
              servicePort: 8080
    - host: alertmanager.<yourdomain>.com
      http:
        paths:
          - path: /
            backend:
              serviceName: alertmanager
              servicePort: 9093
    - host: thanos-querier.<yourdomain>.com
      http:
        paths:
          - path: /
            backend:
              serviceName: thanos-querier
              servicePort: 9090
    - host: thanos-ruler.<yourdomain>.com
      http:
        paths:
          - path: /
            backend:
              serviceName: thanos-ruler
              servicePort: 9090
Access Thanos Querier at http://thanos-querier.<yourdomain>.com
Make sure deduplication is selected.
If you click on Stores, you will be able to see all the active endpoints discovered by thanos-store-gateway.
Thanos Querier is then added as the data source in Grafana, and you can start creating dashboards.
Kubernetes Cluster Monitoring Dashboard:
So in this blog, we have seen the limitations of Prometheus's high-availability setup and how we can overcome them by using Thanos.
If you have any other ideas or suggestions around this approach, please share them in the comments section. Thanks for reading, I'd really appreciate your feedback.
Opstree is an End to End DevOps solution provider