Deploying Prometheus Multi-Cluster monitoring using Prometheus Agent Mode

In the previous post, I wrote about Prometheus multi-cluster monitoring and how running Prometheus in agent mode helps create a single pane of glass for monitoring multiple Kubernetes clusters. If you haven’t read it yet, please do so before starting this hands-on post. Here we are going to deploy Prometheus in agent mode alongside a Prometheus global view and test how the two work together in action. For this tutorial, you need a Kubernetes cluster and two separate namespaces, monitoring-global and monitoring. Are you ready to run?!

Photo by Andrea Piacquadio from Pexels

Deploy the Global view Prometheus

First, make sure that you have already created both namespaces. If not, create them with the following command:

kubectl create ns monitoring-global && kubectl create ns monitoring
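
Running `kubectl create ns` a second time fails because the namespace already exists. If you expect to re-run the steps, an idempotent variant may be handy. The following is a minimal sketch, assuming a kubectl version that supports `--dry-run=client`; `ensure_namespace` is a hypothetical helper name and is not part of the repository:

```shell
# ensure_namespace NAME
# Renders the Namespace manifest client-side and lets `kubectl apply`
# reconcile it, so the call succeeds whether or not the namespace exists.
ensure_namespace() {
  kubectl create namespace "$1" --dry-run=client -o yaml | kubectl apply -f -
}

# Usage (needs a working kubeconfig):
#   ensure_namespace monitoring-global
#   ensure_namespace monitoring
```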

All the files I am using to deploy on the test cluster are available to download or clone from my GitHub.

1- Deploy a config map to be used by the Prometheus deployment. This config map creates prometheus.rules, which defines the alerting rules, and prometheus.yml, which is the Prometheus configuration file.

kubectl apply -f prometheus-global-view/config-map-global.yml

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-global-conf
  labels:
    name: prometheus-global-conf
  namespace: monitoring-global
data:
  prometheus.rules: |-
    groups:
    - name: devopscube demo alert
      rules:
      - alert: High Pod Memory
        expr: sum(container_memory_usage_bytes) > 1
        for: 1m
        labels:
          severity: slack
        annotations:
          summary: High Memory Usage
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    rule_files:
      - /etc/prometheus/prometheus.rules
    alerting:
      alertmanagers:
      - scheme: http
        static_configs:
        - targets:
          - "alertmanager.monitoring.svc:9093"
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['127.0.0.1:9090']

2- Deploy the Prometheus deployment. Since this is only a test and not much data is going to be stored, we use an emptyDir volume for storage. We also enable the remote-write receiver with the --web.enable-remote-write-receiver flag, which allows Prometheus to accept remote-write requests from other Prometheus servers.

kubectl apply -f prometheus-global-view/prometheus-global-deployment.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: monitoring-global
  labels:
    app: prometheus-global
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-global
  template:
    metadata:
      labels:
        app: prometheus-global
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          args:
            - "--storage.tsdb.retention.time=12h"
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
            - "--web.enable-remote-write-receiver"
            - "--web.enable-lifecycle"
          ports:
            - containerPort: 9090
          resources:
            requests:
              cpu: 500m
              memory: 500M
            limits:
              cpu: 1
              memory: 1Gi
          volumeMounts:
            - name: prometheus-global-config-volume
              mountPath: /etc/prometheus/
            - name: prometheus-storage-volume
              mountPath: /prometheus/
      volumes:
        - name: prometheus-global-config-volume
          configMap:
            defaultMode: 420
            name: prometheus-global-conf
        - name: prometheus-storage-volume
          emptyDir: {}

If you list the running pods, you will see the newly created pod.

$ kubectl get pods -n monitoring-global
NAME                                     READY   STATUS    RESTARTS   AGE
prometheus-deployment-6d84cb9b8b-5r2zb   1/1     Running   0          2m35s
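
Instead of re-running `kubectl get pods` until the pod shows Running, you can block until the Deployment has finished rolling out. This is a small sketch built on `kubectl rollout status`; the `wait_for_rollout` helper name and the 120s default timeout are my own assumptions:

```shell
# wait_for_rollout NAMESPACE DEPLOYMENT [TIMEOUT]
# Blocks until the Deployment's pods are ready, or fails once TIMEOUT expires.
wait_for_rollout() {
  kubectl rollout status "deployment/$2" -n "$1" --timeout="${3:-120s}"
}

# Usage:
#   wait_for_rollout monitoring-global prometheus-deployment
```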

3- Create a headless service in the monitoring-global namespace. The agent-mode Prometheus in the monitoring namespace will use it as its remote-write endpoint.

kubectl apply -f prometheus-global-view/headless-service.yml

apiVersion: v1
kind: Service
metadata:
  name: prometheus-global-headless-service
  namespace: monitoring-global
spec:
  clusterIP: None
  selector:
    app: prometheus-global
  ports:
    - protocol: TCP
      port: 9090
      targetPort: 9090
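
Inside the cluster, a Service is reachable under the DNS name `<service>.<namespace>.svc.<cluster-domain>`, and Prometheus accepts remote-write traffic on the `/api/v1/write` path. The sketch below assembles the endpoint that the agent will be configured with later in this post. `remote_write_url` is a hypothetical helper, and `cluster.local` is only the default cluster domain, so adjust it if your cluster uses a different one:

```shell
# remote_write_url SERVICE NAMESPACE [PORT] [CLUSTER_DOMAIN]
# Prints the in-cluster remote-write URL for a Prometheus Service.
remote_write_url() {
  printf 'http://%s.%s.svc.%s:%s/api/v1/write\n' \
    "$1" "$2" "${4:-cluster.local}" "${3:-9090}"
}

remote_write_url prometheus-global-headless-service monitoring-global
# prints http://prometheus-global-headless-service.monitoring-global.svc.cluster.local:9090/api/v1/write
```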

Now that the headless service is created, you can forward a local port to it and access the running Prometheus global view.

$ kubectl port-forward svc/prometheus-global-headless-service 9090:9090 -n monitoring-global
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090

Call the health check endpoint to make sure that everything is working, or open [http://localhost:9090](http://localhost:9090) in your browser.

$ curl http://localhost:9090/-/healthy
Prometheus Server is Healthy.
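
Right after a deployment or a fresh port-forward, the first request can fail while things are still starting. A minimal retry loop around the same health endpoint could look like the sketch below. The `wait_healthy` name, the 10 attempts, and the 2-second delay are assumptions of mine, not values from the repository:

```shell
# wait_healthy BASE_URL [ATTEMPTS] [DELAY_SECONDS]
# Polls BASE_URL/-/healthy until it answers with a successful HTTP status.
# Returns 0 on the first success, 1 if every attempt failed.
wait_healthy() {
  url="${1%/}/-/healthy"
  attempts="${2:-10}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors, -sS hides the progress meter
    if curl -fsS "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (with the port-forward still running in another terminal):
#   wait_healthy http://localhost:9090 && echo "global view is up"
```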

Deploy the Agent-mode Prometheus

Now it is time to deploy Prometheus in agent mode and remote-write its metrics to the global view.

1- Create a ClusterRole and a ClusterRoleBinding for Prometheus to be able to scrape some Kubernetes metrics.

kubectl apply -f prometheus-agent/clusterrole.yml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  # Ingress moved to networking.k8s.io; extensions is kept for older clusters
  - extensions
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: default
  namespace: monitoring
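
The binding above grants the role to the default ServiceAccount of the monitoring namespace, which Kubernetes identifies as the user `system:serviceaccount:monitoring:default`. To check that the permissions took effect, you can impersonate that account with `kubectl auth can-i`. This is only a sketch: `sa_subject` and `can_scrape` are hypothetical helper names, and impersonation requires that your own user is allowed to use `--as`:

```shell
# sa_subject NAMESPACE NAME
# Prints the user name Kubernetes assigns to a ServiceAccount.
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# can_scrape NAMESPACE SERVICEACCOUNT
# Asks the API server whether the ServiceAccount may list pods and nodes
# and read /metrics. Stops at the first check that is denied.
can_scrape() {
  subject="$(sa_subject "$1" "$2")"
  kubectl auth can-i list pods --all-namespaces --as="$subject" &&
    kubectl auth can-i list nodes --as="$subject" &&
    kubectl auth can-i get /metrics --as="$subject"
}

# Usage:
#   can_scrape monitoring default
```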

2- We need to create a config map that holds the agent-mode Prometheus configuration file. This is where we add the remote_write endpoint that points at the global view.

kubectl apply -f prometheus-agent/config-map.yml

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: monitoring
data:
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https
    remote_write:
    - url: 'http://prometheus-global-headless-service.monitoring-global.svc.cluster.local:9090/api/v1/write'
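
The relabel rule in this config deserves a closer look. With `action: keep`, Prometheus joins the values of the source labels with `;` (the default separator) and keeps a target only if the whole joined string matches the regex. The sketch below imitates that behaviour with `grep -Ex`. It is an approximation, since Prometheus uses RE2 rather than POSIX extended regular expressions, and `keep_target` is a hypothetical name:

```shell
# keep_target REGEX VALUE...
# Joins the label values with ';' and succeeds only if the entire joined
# string matches REGEX, mirroring the anchored match of a keep action.
keep_target() {
  regex="$1"
  shift
  # "$*" joins the arguments with the first character of IFS;
  # the subshell keeps the IFS change local.
  joined="$(IFS=';'; printf '%s' "$*")"
  printf '%s\n' "$joined" | grep -Eqx -- "$regex"
}

# namespace, service name, endpoint port name of two discovered endpoints
keep_target 'default;kubernetes;https' default kubernetes https && echo kept
keep_target 'default;kubernetes;https' kube-system kube-dns metrics || echo dropped
```

Only the API server endpoint (namespace default, service kubernetes, port https) passes the filter, so the first call prints kept and the second prints dropped.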

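3- Now that all the prerequisites are in place, we can finally deploy the agent-mode Prometheus. We enable agent mode by passing the --enable-feature=agent argument. Note that the manifest uses the untagged prom/prometheus image. As far as I know, Prometheus 3.x replaced this feature flag with a dedicated --agent flag, so adjust the argument if you run a 3.x image.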

kubectl apply -f prometheus-agent/prometheus-deployment.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: monitoring
  labels:
    app: prometheus-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--web.enable-lifecycle"
            - "--enable-feature=agent"
          ports:
            - containerPort: 9090
          resources:
            requests:
              cpu: 500m
              memory: 500M
            limits:
              cpu: 1
              memory: 1Gi
          volumeMounts:
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/
            - name: prometheus-storage-volume
              mountPath: /prometheus/
      volumes:
        - name: prometheus-config-volume
          configMap:
            defaultMode: 420
            name: prometheus-server-conf
        - name: prometheus-storage-volume
          emptyDir: {}

With the last piece deployed, Prometheus is now running in agent mode. You can confirm it by forwarding the pod's port to your localhost.

$ kubectl get pods -n monitoring
NAME                                    READY   STATUS    RESTARTS   AGE
prometheus-deployment-fd7f6557c-tvsjj   1/1     Running   0

$ kubectl port-forward prometheus-deployment-fd7f6557c-tvsjj 9080:9090 -n monitoring
Forwarding from 127.0.0.1:9080 -> 9090
Forwarding from [::1]:9080 -> 9090

If you now browse to localhost:9080, you get a page which shows that Prometheus is running in agent mode.

Finally, if you port-forward the global view again and query for the available metrics, you will see the metrics shipped by the agent.
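
If you prefer the command line over the web UI, the same check can be done against the Prometheus HTTP API, which serves instant queries on `/api/v1/query`. The sketch below assumes the global view is forwarded to localhost:9090; `query_prometheus` is a hypothetical helper name:

```shell
# query_prometheus BASE_URL PROMQL
# Runs an instant query against the HTTP API and prints the raw JSON response.
# -G sends the URL-encoded query as a GET parameter.
query_prometheus() {
  curl -fsS -G "${1%/}/api/v1/query" --data-urlencode "query=$2"
}

# Usage: look for series scraped by the agent's kubernetes-apiservers job
#   query_prometheus http://localhost:9090 'up{job="kubernetes-apiservers"}'
```

A non-empty result list for that job suggests the agent's samples are arriving through remote write, because the global view itself only scrapes its own prometheus job.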

Conclusion

Prometheus in agent mode is useful when you want to monitor multiple clusters through a single pane of glass. But that is not the only use case. These days many companies are moving towards edge computing. This is the era of IoT, self-driving cars, and many other scenarios in which you can deploy a Kubernetes cluster on a resource-constrained device.

Who am I?

I am Ehsan, a passionate site reliability engineer and cloud solutions architect working for Techspire Netherlands. I am dedicated to helping businesses smooth out their system engineering and security operations and improve the availability, scalability, and QoS of their services, using infrastructure-as-code concepts, a wide range of monitoring and logging tools, Linux-based operating systems, and my deep knowledge of networking concepts and big-data analytical tools.