Malay Hazarika · 4 minutes read · September 13, 2024
Zuck said, "Move fast and break things." But you gotta know when and how things broke before you can act on it.
You cannot do that without reliable monitoring. Imagine this: you’ve just rolled out a much-anticipated feature, and initial user feedback is overwhelmingly positive. But then, suddenly, your application crashes under the load, leaving customers frustrated and searching for alternatives. Such downtime costs you not only money but also the brand value you have painstakingly built.
In this blog post, we’ll explore how to set up monitoring with Grafana and Prometheus in Kubernetes, so your startup can keep its speed, break things along the way, and know as soon as it happens.
Prometheus will be at the heart of the system. Grafana will sit on top of Prometheus for you to build dashboards. The whole thing will run on Kubernetes.
This article also assumes you are running your apps on Kubernetes. In later installments, we will cover how to monitor applications that are running outside Kubernetes environments.
Prometheus + Grafana is a de facto standard for monitoring on Kubernetes. Prometheus is open-source with a large community, so resources and support are easy to find.
Grafana is popular enough that you'll find prebuilt dashboards for most standard components, saving you a ton of engineering effort.
Ready to get started? Let’s set up Prometheus using the Prometheus Operator and Helm charts for efficient deployment. If you haven’t installed Helm yet, you can follow the official Helm installation guide.
We will be using the Prometheus Operator to manage our Prometheus installation. The operator also comes with a few useful CRDs that will be helpful as we extend our setup in the future.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prom-operator prometheus-community/kube-prometheus-stack --namespace=observability --create-namespace --set grafana.enabled=false
Note: We are not installing the Grafana that comes bundled with the kube-prometheus-stack chart; we will install it separately.
Create a Prometheus instance using the following manifest:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: my-prometheus
  namespace: observability
spec:
  retention: 7d
  shards: 1
  replicas: 1
  # An empty selector picks up every ServiceMonitor in this namespace;
  # leaving the field unset selects none.
  serviceMonitorSelector: {}
  scrapeConfigSelector:
    matchLabels:
      prometheus: my-prometheus
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: <storage-class>
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: <size>
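One thing to watch: as far as we know, the operator only creates a headless governing Service called prometheus-operated for Prometheus instances, not one named after the instance. The Grafana datasource later in this post points at my-prometheus.observability.svc.cluster.local, so a Service with that name has to exist. Here is a minimal sketch, assuming the operator labels the pods with prometheus: my-prometheus and names the container port web. Both are worth verifying on your cluster (for example with kubectl get pods -n observability --show-labels) before relying on it.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-prometheus              # matches the hostname used in the Grafana datasource URL
  namespace: observability
spec:
  type: ClusterIP
  selector:
    prometheus: my-prometheus      # assumed pod label set by the operator
  ports:
    - name: web
      port: 9090
      targetPort: web              # assumed name of the Prometheus container port
```

With this in place, http://my-prometheus.observability.svc.cluster.local:9090 should resolve from inside the cluster.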
ServiceMonitor to monitor your services
ServiceMonitor is a custom resource, defined by a CRD (custom resource definition), that tells Prometheus which services to scrape metrics from. You can have a ServiceMonitor for every service, or you can have a default one that matches all the exposed services.
The following is a common way to match all the services running in your Kubernetes cluster:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: default-service-monitor
  namespace: observability
spec:
  endpoints:
    - interval: 10s
      path: /metrics
      port: metrics
      scheme: http
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    any: true
  sampleLimit: 1000
  selector:
    matchLabels:
      monitoring: prometheus
This ServiceMonitor will allow Prometheus to discover all services with the label monitoring: prometheus. Create this ServiceMonitor and update your services to include the label, and these services will be scraped for metrics at the defined interval.
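For illustration, here is what a matching Service might look like. The name my-app, its namespace, and port 8080 are hypothetical. What matters is the monitoring: prometheus label and a port named metrics, since the ServiceMonitor above refers to the port by name rather than by number.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                      # hypothetical service name
  namespace: default                # any namespace works (namespaceSelector.any: true)
  labels:
    monitoring: prometheus          # matched by the ServiceMonitor selector
    app.kubernetes.io/name: my-app  # picked up as the job label via jobLabel
spec:
  selector:
    app.kubernetes.io/name: my-app  # hypothetical pod label
  ports:
    - name: metrics                 # must match `port: metrics` in the ServiceMonitor
      port: 8080                    # hypothetical port that serves /metrics
      targetPort: 8080
```

If a service exposes metrics on a differently named port or path, it will not be scraped by this default monitor and needs its own ServiceMonitor.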
Monitoring your services is crucial, but don’t overlook the infrastructure itself. Essential metrics to track include CPU usage, memory consumption, and disk I/O. With the kube-prometheus-stack we have installed a component called node-exporter, which will make all the essential metrics available to Prometheus.
Install Grafana as follows
helm repo add grafana https://grafana.github.io/helm-charts
helm install grafana grafana/grafana -f values.yaml
The values.yaml file will be as follows:
persistence:
  enabled: true
  storageClassName: <storage-class>
  size: <size>
admin:
  existingSecret: "observability-grafana-admin-credentials"
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://my-prometheus.observability.svc.cluster.local:9090
Note: You need to create a secret named observability-grafana-admin-credentials that contains the keys admin-user and admin-password.
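A minimal sketch of that secret follows. The username and password values are placeholders, and the namespace is deliberately left out: the secret has to live in the same namespace as the Grafana release, so apply it there (for example with kubectl apply -n <namespace> -f secret.yaml, where secret.yaml is a hypothetical file name) before installing the chart.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: observability-grafana-admin-credentials
type: Opaque
stringData:                          # stringData lets you write plain values; Kubernetes encodes them
  admin-user: admin                  # placeholder username
  admin-password: "<admin-password>" # placeholder, replace with a strong password
```

Keeping the credentials in a secret rather than in values.yaml means the values file can be committed to version control without leaking the password.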
How to build dashboards in Grafana is beyond the scope of this article. In upcoming blog posts, we will dive deeper into that topic.
For now, here is how to get started with Grafana: open it in your browser (http://<your-grafana-url>:3000).
When building your dashboard, align your metrics with business objectives. For instance, tracking user engagement metrics alongside response times can provide actionable insights. To measure the success of your dashboards, gather feedback from your team on how well they aid decision-making. Avoid clutter; clarity is key. Use consistent color coding and clear labels to enhance usability.
Implementing effective monitoring solutions like Grafana and Prometheus is essential for the success of your startup. By addressing performance issues proactively and optimizing your infrastructure, you’ll create a reliable experience for your customers.
So, what’s next? Stay tuned for more such articles from us. Explore the documentation for Prometheus and Grafana for more advanced configurations, and consider joining community forums for additional support. Your journey towards enhanced operational efficiency starts now—let’s get monitoring!