Setting up monitoring with Grafana and Prometheus in Kubernetes: A practical guide

Malay Hazarika · 4 minutes read · September 13, 2024


Introduction: The importance of monitoring for startups

Zuck said, "Move fast and break things". But you have to know when and how things broke before you can act on it.

You cannot do that without reliable monitoring. Imagine this: you’ve just rolled out a much-anticipated feature, and initial user feedback is overwhelmingly positive. But then, suddenly, your application crashes under the load, leaving customers frustrated and searching for alternatives. Such downtime costs you not only money but also the brand value you have painstakingly built.

In this blog post, we’ll explore how to set up monitoring with Grafana and Prometheus in Kubernetes, allowing your startup to maintain speed, break things in the process and know as soon as it happens.

The setup

Prometheus will be at the heart of the system, collecting and storing metrics. Grafana will sit on top of Prometheus for you to build dashboards. The whole thing will run on Kubernetes.

This article also assumes you are running your apps on Kubernetes. In later installments, we will cover how to monitor applications that are running outside Kubernetes environments.

Why Prometheus and Grafana

Prometheus + Grafana is a widely adopted combination for monitoring Kubernetes workloads. Prometheus is open source and has a large community, so documentation, exporters, and support are easy to find.

Grafana is so popular that you'll be able to find prebuilt dashboards for standard components, saving you a ton of engineering effort.

How to set up Prometheus using the Prometheus operator in Kubernetes

Ready to get started? Let’s set up Prometheus using the Prometheus Operator and Helm charts for efficient deployment. If you haven’t installed Helm yet, you can follow the official Helm installation guide.

Install the Prometheus Operator

We will use the Prometheus Operator to manage our Prometheus installation. The kube-prometheus-stack chart installs the operator along with a few useful CRDs that will be helpful as we extend our setup in the future.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prom-operator prometheus-community/kube-prometheus-stack --namespace=observability --create-namespace --set grafana.enabled=false

Note: We are not installing the Grafana that comes bundled with the chart; we will install it separately.

Create Prometheus instance

Using the following manifest, create a Prometheus instance:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: my-prometheus
  namespace: observability
spec:
  retention: 7d
  shards: 1
  replicas: 1
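  # serviceAccountName: <service-account>  # needs RBAC to discover services, endpoints, and pods
  # Select ScrapeConfig objects labeled prometheus: my-prometheus.
  scrapeConfigSelector:
    matchLabels:
      prometheus: my-prometheus
  # An empty selector picks up every ServiceMonitor in this namespace;
  # leaving the field unset selects none.
  serviceMonitorSelector: {}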
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: <storage-class>
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: <size>
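
The scrapeConfigSelector in this manifest means the instance picks up ScrapeConfig objects that carry the label prometheus: my-prometheus. As a rough sketch, assuming your operator version ships the ScrapeConfig CRD (it is still a v1alpha1 API), a matching object could look like the following. The name external-app and the target address are made up for illustration.

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: external-app              # hypothetical name
  namespace: observability        # same namespace as the Prometheus instance
  labels:
    prometheus: my-prometheus     # must match scrapeConfigSelector
spec:
  metricsPath: /metrics
  staticConfigs:
    - targets:
        - external-app.example.com:9100   # placeholder host:port
```

Because the Prometheus manifest does not set scrapeConfigNamespaceSelector, only ScrapeConfig objects in the observability namespace should be considered.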

Create ServiceMonitor to monitor your services

ServiceMonitor is a custom resource (defined by one of the operator's CRDs) that tells Prometheus which services to scrape metrics from. You can create a ServiceMonitor for every service, or a single default one that matches all the services you expose.

The following is a common way to match all the services running in your Kubernetes cluster:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: default-service-monitor
  namespace: observability
spec:
  endpoints:
    - interval: 10s
      path: /metrics
      port: metrics
      scheme: http
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    any: true
  sampleLimit: 1000
  selector:
    matchLabels:
      monitoring: prometheus

This ServiceMonitor lets Prometheus discover every service labeled monitoring: prometheus, in any namespace. Create it, add the label to your services, and make sure each one exposes a port named metrics; those services will then be scraped every 10 seconds.
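
To make that concrete, here is a minimal sketch of a Service that this ServiceMonitor would match. The name my-app, the default namespace, and port 8080 are placeholders, and it assumes your application already serves Prometheus metrics at /metrics on that port.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                        # hypothetical application name
  namespace: default
  labels:
    monitoring: prometheus            # matched by the ServiceMonitor selector
    app.kubernetes.io/name: my-app    # used as the job label (jobLabel)
spec:
  selector:
    app.kubernetes.io/name: my-app    # assumes the pods carry this label
  ports:
    - name: metrics                   # must match "port: metrics" in the ServiceMonitor
      port: 8080                      # placeholder port
      targetPort: 8080
```

The port name matters more than the number: the ServiceMonitor refers to the port by name, so a service with an unnamed or differently named port will not be scraped.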

Monitor your infrastructure

Monitoring your services is crucial, but don’t overlook the infrastructure itself. Essential metrics to track include CPU usage, memory consumption, and disk I/O. The kube-prometheus-stack chart we installed includes a component called node-exporter, which makes these node-level metrics available to Prometheus.
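
If you want a starting point for those three metrics, the PromQL below is one common way to express them using standard node-exporter series, wrapped in a PrometheusRule (one of the operator's CRDs) as recording rules. Treat it as a sketch: the object name, the rule names, and the 5m window are arbitrary choices, and the rules are only loaded if your Prometheus resource selects this object through spec.ruleSelector, which the manifest earlier in this post does not set.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-basics                   # hypothetical name
  namespace: observability
  labels:
    prometheus: my-prometheus         # assumes a ruleSelector matching this label
spec:
  groups:
    - name: node-basics.rules
      rules:
        # Fraction of CPU time spent non-idle, per node.
        - record: instance:node_cpu_utilisation:ratio
          expr: '1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))'
        # Fraction of memory in use, per node.
        - record: instance:node_memory_utilisation:ratio
          expr: '1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes'
        # Fraction of time each disk was busy with I/O.
        - record: instance_device:node_disk_io_time:rate5m
          expr: 'rate(node_disk_io_time_seconds_total[5m])'
```

The same expressions can be pasted directly into a Grafana panel or the Prometheus UI if you would rather not manage rules yet.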

Set up Grafana

Install Grafana as follows:

helm repo add grafana https://grafana.github.io/helm-charts
helm install grafana grafana/grafana --namespace=observability -f values.yaml

The values.yaml file looks like this:

persistence:
  enabled: true
  storageClassName: <storage-class>
  size: <size>
admin:
  existingSecret: "observability-grafana-admin-credentials"
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://my-prometheus.observability.svc.cluster.local:9090

Note:

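  • You have to create a secret named observability-grafana-admin-credentials, in the same namespace as Grafana, containing the keys admin-user and admin-password.
  • The URL should point to the Prometheus instance you created above. The operator normally exposes its instances through a service named prometheus-operated, so if my-prometheus does not resolve in your cluster, use that service name in the URL or create a Service for the instance.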
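
A minimal sketch of that secret is shown below. It assumes Grafana runs in the observability namespace; if you install the chart elsewhere, create the secret there instead. The username and password values are placeholders to replace before applying.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: observability-grafana-admin-credentials
  namespace: observability            # assumption: same namespace as the Grafana release
type: Opaque
stringData:
  admin-user: admin                   # placeholder username
  admin-password: <strong-password>   # placeholder, replace before applying
```

Using stringData lets you write the values in plain text; Kubernetes stores them base64-encoded. Create the secret before running helm install so the Grafana pod can start with it.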

Tips on building dashboards

How to build dashboards in Grafana is beyond the scope of this article. In upcoming blog posts, we will dive deeper into that topic.

For now, here is how to get started with Grafana:

  1. Log into Grafana (usually found at http://<your-grafana-url>:3000).
  2. Check the Data Source: The Prometheus data source defined in values.yaml should already be listed. If it is not, add one and choose Prometheus as the type.
  3. Create a New Dashboard: Click on “+” and select “Dashboard.”
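
If you would rather start from a prebuilt dashboard than an empty one, the Grafana Helm chart can import dashboards from grafana.com at install time. The sketch below could be appended to values.yaml. It assumes the data source is named Prometheus, as configured earlier, and uses dashboard ID 1860 (Node Exporter Full) as an example. The provider name default and the folder path are conventional values, not requirements.

```yaml
dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: default
        orgId: 1
        folder: ""
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/default
dashboards:
  default:                        # must match the provider name above
    node-exporter:                # arbitrary key, used as the dashboard file name
      gnetId: 1860                # Node Exporter Full on grafana.com
      datasource: Prometheus      # data source name from values.yaml
      # optionally add "revision: <n>" to pin a specific published revision
```

After a helm upgrade with the new values, the dashboard should appear in Grafana without any manual import.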

When building your dashboard, align your metrics with business objectives. For instance, tracking user engagement metrics alongside response times can provide actionable insights. To measure the success of your dashboards, gather feedback from your team to see how well the dashboard aids in decision-making. Be sure to avoid clutter—clarity is key. Use consistent color coding and clear labels to enhance usability.

Conclusion

Implementing effective monitoring solutions like Grafana and Prometheus is essential for the success of your startup. By addressing performance issues proactively and optimizing your infrastructure, you’ll create a reliable experience for your customers.

So, what’s next? Stay tuned for more such articles from us. Explore the documentation for Prometheus and Grafana for more advanced configurations, and consider joining community forums for additional support. Your journey towards enhanced operational efficiency starts now—let’s get monitoring!