Setup Kubernetes to reliably self host open source tools

Malay Hazarika | May 13, 2025 | 14 minute read


How to Set Up Kubernetes for Reliable Self-Hosting in 2025: A Deep Dive

The convenience of SaaS is undeniable: quick setup and managed infrastructure. However, as your company grows, you might find your SaaS bills skyrocketing, consuming a significant portion of your operational budget. You know there are powerful open source alternatives for many of the tools you pay dearly for, but the path to adopting them seems too hard. This is a common dilemma for many tech companies, especially those aiming for frugal yet robust operations.

The primary hurdles?

  • Complexity of Setup and Maintenance: Getting open source tools up and running correctly, and then keeping them updated and secure, can feel like a full-time job.
  • Ensuring Reliability and Security: Downtime isn't an option for critical tools. How do you guarantee your self-hosted solutions are as dependable and secure as their SaaS counterparts?
  • Ongoing Maintenance Costs: While the software itself might be free, the engineering hours required for upkeep can quickly accumulate, sometimes negating the initial savings.

If these challenges resonate with you, you're in the right place. This guide is designed to show you how to leverage the power of Kubernetes to build a reliable and cost-effective self-hosting platform for your essential open source applications in 2025. We'll get into the Kubernetes architecture for maximum reliability.

Why Kubernetes for Your Self-Hosting Journey?

While Docker Compose is great for simple setups, Kubernetes is the clear choice for robust self-hosting when reliability, scalability, and manageable complexity are key. It offers built-in resilience, automatically handling failures and ensuring applications stay available. Kubernetes scales applications efficiently, optimizing resource use and costs. Its declarative configuration and automation reduce manual effort, making your setup reproducible. With a vibrant open source ecosystem and portability across environments, Kubernetes provides flexibility and avoids vendor lock-in. It also enhances security with features like network policies and RBAC. Although the learning curve exists, the long-term benefits for reliability, scalability, and operational efficiency make Kubernetes a powerful platform for self-hosting diverse open source tools.

How to configure Kubernetes for maximum utility, reliability, and maintainability

Achieving maximum reliability in your self-hosted Kubernetes environment means getting several pieces right. Our goal is to build a system that not only runs your open source applications effectively but also withstands failures, scales gracefully, and remains manageable in the long run.

This involves attention to detail in several key areas:

  1. Robust and manageable provisioning: Using OpenTofu from day one.
  2. Resilient Networking & Durable Storage: Getting the basics right.
  3. Smart Ingress Management: Efficiently routing traffic to your services.
  4. Automated Certificate Management: Securing communications effortlessly.
  5. Leveraging Operators: Automating complex application lifecycle management.
  6. Choosing Helm Charts: Standardizing deployments while maintaining control.
  7. Advanced Autoscaling (Optional but Powerful): Optimizing resource utilization and cost.

Let's dive into each of these areas.

Provisioning Your Kubernetes Cluster

While you could manually set up each component, this approach quickly becomes unmanageable, error-prone, and difficult to replicate or recover.

We recommend using OpenTofu or Terraform from day one. Even though it is one more thing to learn, it pays off in the long run: your entire infrastructure is defined in code, which can be version-controlled, making it easier to understand, manage, and recover.

Choosing cloud providers

For most teams, especially those prioritizing reliability and reduced operational overhead, a managed Kubernetes service from a cloud provider is the recommended starting point. We will talk about Elastic Kubernetes Service (EKS), but the principles apply to other cloud providers (and bare metal clusters) as well.

Networking setup

For AWS EKS, here are our key recommendations for a reliable network setup:

  • Availability Zones (AZs): Set up across two AZs, and no more than three, because you want to minimize inter-AZ data transfer costs. Two AZs provide enough redundancy for most applications.
  • VPC CIDR: Use a /16 CIDR block for your Virtual Private Cloud (VPC) that doesn't overlap with your application network or any other connected networks. A /16 provides a large enough address space for your pods.
  • Subnets: Create a pair of public and private subnets in each AZ, using /19 subnet masks.
  • Internet connectivity: Connect the public subnets to an internet gateway and the private subnets to a NAT gateway.
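
To tie the provisioning and networking advice together, here is a minimal OpenTofu sketch of this layout using the community terraform-aws-modules VPC and EKS modules. Treat it as a starting point under stated assumptions: the names, region, CIDR ranges, instance types, and version pins are illustrative, and it assumes the AWS provider is configured elsewhere in your code.

# Minimal OpenTofu/Terraform sketch (illustrative names, region, and versions).
# A /16 VPC across two AZs with /19 public and private subnets, a single NAT
# to keep costs down, and an EKS cluster whose nodes live in the private subnets.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "selfhost-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.0.0/19", "10.0.32.0/19"]
  public_subnets  = ["10.0.64.0/19", "10.0.96.0/19"]

  enable_nat_gateway   = true
  single_nat_gateway   = true # one NAT gateway for the whole VPC
  enable_dns_hostnames = true
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "selfhost"
  cluster_version = "1.32"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    default = {
      instance_types = ["m6a.large"]
      min_size       = 2
      max_size       = 4
      desired_size   = 2
    }
  }
}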

NAT Gateway vs NAT Instance

NAT gateway is AWS's managed NAT service. It's highly available and scalable but can be relatively expensive (around $40 per month per NAT Gateway, plus data processing charges).

For a cluster that needs to be cost-effective, you can consider using a NAT instance. We found that a tiny NAT instance is good enough for self-hosting purposes, because most of your traffic will flow through the ingress controller and NAT will only be used for downloading things from the internet.

Storage setup

AWS offers mature storage services that integrate seamlessly with EKS via the EBS CSI (Container Storage Interface) driver. Here are our recommendations for a performant and reliable storage setup:

  • Use gp3 over gp2: gp2 is the default for new EKS clusters, but we recommend creating a new storage class that uses gp3, as gp3 is roughly 20% cheaper and significantly faster than gp2.
  • Use xfs over ext4: xfs handles larger files and sustains more IOPS.

Here is an example of a storage class you can use:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: gp3
  csi.storage.k8s.io/fstype: xfs
  iops: "3000"
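
As a quick usage example, a workload then requests a volume from this class through a PersistentVolumeClaim like the one below (the name, namespace, and size are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: metabase-data   # illustrative name
  namespace: metabase
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-gp3  # the class defined above
  resources:
    requests:
      storage: 20Gi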

Storage for bare metal clusters

If you're managing your own hardware for self-hosting, you need a robust distributed storage solution.

Here are our recommendations:

  • Rook-Ceph for Distributed Storage: Rook is an open source storage orchestrator for Kubernetes. Ceph is a highly scalable and reliable distributed storage system that provides block, file, and object storage. Rook-ceph is fault tolerant and has everything you'd need to run your applications. (How to setup Rook-Ceph is outside the scope of this article)
  • Avoid host path storage: While simple for development, using hostPath volumes directly ties your data to a specific node. If that node fails, your data is at high risk unless you have robust external backup and recovery processes. It severely limits kubernetes' ability to run your pods reliably.
  • Avoid NFS Servers: While NFS can provide shared storage, we generally don't recommend it as the primary backend for high-performance or I/O-intensive workloads. It can become a single point of failure if not architected for high availability itself. It might be acceptable for less demanding workloads or specific use cases.
  • Avoid Longhorn: Longhorn is often praised for its ease of setup compared to Rook-Ceph. However, we have found it hard to debug and hard to get stable. So based on our experience and community feedback, we strongly recommend Rook-Ceph over Longhorn for production workloads. Again, reliability is the top priority here.

Ingress: Managing access to your hosted applications

Once your open source applications are running in Kubernetes, you need a way to expose them to the outside world (or your internal network) securely and efficiently. This is where Ingress controllers come into play. An Ingress controller is the gatekeeper for all incoming traffic to the cluster, providing routing, SSL/TLS termination, load balancing, and more.

We recommend using the NGINX Ingress Controller for this. We chose it because it is popular, easily scalable, and has been stable for a long time.

You can install it using the ingress-nginx Helm chart. Read the official documentation for more details on how to install and configure it.
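
For reference, a typical Helm installation looks like the following; the namespace is just a common convention, and you should check the chart's documentation for values worth overriding (replica count, service type, etc.):

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace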

DNS setup

Once the NGINX Ingress controller is running, it will typically provision an external LoadBalancer Service. You need to point your domain names to this load balancer. Here is how to do it:

  1. Obtain the Load Balancer's Address: Run kubectl get svc -n ingress-nginx (or the namespace where you installed it). Look for the EXTERNAL-IP or HOSTNAME of the ingress-nginx-controller service.
    • On AWS, this will often be a long DNS name for an ELB/NLB.
    • On other platforms or bare metal, it might be an IP address.
  2. Create a Wildcard DNS Entry (Recommended for Simplicity):
    • In your domain registrar (e.g., GoDaddy, Namecheap, AWS Route 53), create a DNS entry like *.internal and map it to the load balancer's address.
      • If the load balancer address is a name (common in AWS), create a CNAME record.
      • If the load balancer address is an IP address, create an A record.
    • *.internal allows you to access your apps as grafana.internal.yourdomain.com, metabase.internal.yourdomain.com, etc. This is a great way to avoid having to create individual DNS entries for each app.

Here is an example of an Ingress resource that lets you access your Metabase instance as metabase.internal.yourdomain.com. It also specifies TLS settings, indicating that cert-manager should procure a certificate for this host and store it in the metabase-tls-cert secret.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: "metabase"
  namespace: "metabase"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-nginx" # Name of the ClusterIssuer (discussed next)
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - "metabase.internal.yourdomain.com"
    secretName: "metabase-tls-cert"
  rules:
  - host: "metabase.internal.yourdomain.com"
    http:
      paths:
      - pathType: Prefix
        path: /
        backend:
          service:
            name: metabase
            port:
              number: 3000

Automated certificate management with cert-manager

In today's web, HTTPS is non-negotiable. It encrypts traffic between your users and your applications, ensuring privacy and data integrity. For your self-hosted services exposed via Ingress, you need SSL/TLS certificates. Manually issuing, configuring, and renewing these certificates is a tedious and error-prone process, especially when managing multiple services.

cert-manager is a native Kubernetes certificate management controller. It automates the issuing and renewal of SSL/TLS certificates. You can integrate it with Let's Encrypt, a free and widely trusted certificate authority. This means you can get SSL/TLS certificates for your services without any cost, and cert-manager will handle the renewal process automatically.

Follow these instructions to install cert-manager on your cluster: https://cert-manager.io/docs/installation/helm/
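
As a quick reference, the Helm installation typically looks like the sketch below. Flag names vary between chart versions (older charts use installCRDs instead of crds.enabled), so defer to the linked documentation:

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true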

Here is how to create a ClusterIssuer that issues certificates from Let's Encrypt. This is a prerequisite for using cert-manager to manage your SSL/TLS certificates.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-nginx # this name is used in the Ingress resource
spec:
  acme:
    server: "https://acme-v02.api.letsencrypt.org/directory"
    email: "yourname@yourcompany.com"
    privateKeySecretRef:
      name: letsencrypt-keys
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx

Using operators for application lifecycle management

Think of an Operator as a "DevOps engineer in a box" – a piece of software running in your Kubernetes cluster that encodes the operational knowledge and domain-specific expertise required to manage a particular application. Operators extend the Kubernetes API by creating Custom Resource Definitions (CRDs) that represent the application they manage. You then interact with these custom resources just like standard Kubernetes objects (kubectl get postgresclusters). The Operator continuously watches these resources and takes action to ensure the application's actual state matches the desired state you've defined in the CRD.

Why use operators?

  • Automated Lifecycle Management: Operators automate complex tasks like:
    • Initial deployment and configuration
    • Seamless upgrades and versioning
    • Backup and restore procedures
    • High availability (HA) setup, including replication and failover
    • Scaling (both up/down and out/in)
    • Health checks and self-healing
  • Reduced Operational Burden: By encapsulating expert knowledge, Operators free up your team from manual, repetitive, and risky operational tasks.
  • Best Practices Baked In: Well-designed Operators often implement the best operational practices for the specific application they manage, ensuring your deployments are secure, performant, and reliable from the start.

Never run databases in k8s without an operator

This is a golden rule for reliable self-hosting. Databases are critical, complex systems. Managing them manually in a dynamic environment like Kubernetes is a recipe for disaster (data loss, extended downtime). Always use a mature, well-supported Operator for any database you plan to self-host. At Osuite we regularly use CloudNativePG for PostgreSQL, Percona XtraDB for MySQL, the MongoDB Community Operator for MongoDB, the community OpenSearch operator, and many more to run applications reliably.
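
To show what managing a database through an operator looks like in practice, here is a minimal sketch of a CloudNativePG Cluster resource. The name, namespace, instance count, and storage class are illustrative, and the operator supports many more options (backups, bootstrap settings, PostgreSQL tuning) that you should review in its documentation:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db           # illustrative name
  namespace: databases   # illustrative namespace
spec:
  instances: 3           # one primary plus two replicas, with automated failover
  storage:
    size: 20Gi
    storageClass: ebs-gp3

The operator watches this resource and handles replication, failover, and rolling upgrades; you inspect it with kubectl like any other Kubernetes object.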

Finding the operators

  • OperatorHub.io: A good starting point to discover available Operators.
  • Vendor/Project Websites: Often, the official project or vendor behind an open source tool will provide or recommend an Operator.
  • Community Support & Maturity: Look for Operators with active development, good documentation, a strong community, and a track record of reliability in production. Check GitHub issues and stars, and look for user testimonials or case studies.

Choosing Helm charts

Throughout this guide, we've mentioned Helm for installing components like the Nginx Ingress Controller and Cert-Manager. Helm is the de facto package manager for Kubernetes. It allows you to define, install, and upgrade even the most complex Kubernetes applications as "charts." Charts are collections of files that describe a related set of Kubernetes resources.

But not all Helm charts are created equal. Setting up essential tools with unnecessarily complicated charts is a recipe for trouble, because it limits your ability to understand the components you are installing and makes it harder to customize them and debug issues.

How to find a chart to use

  1. Look for Official Charts: Check if the project itself maintains an official Helm chart. These are usually the most up-to-date and well-maintained.
  2. Explore Community Charts on Github and Artifact Hub: Artifact Hub (artifacthub.io) is an excellent central repository for discovering Helm charts.
  3. Inspect the values.yaml: This file contains the default configuration options. Understand what each option does and customize it according to your environment and security policies. Pay close attention to resource requests/limits, persistence settings, image versions, and any security-related configurations (see the example after this list).
  4. Write your own chart: If you find a chart that is too complex or doesn't meet your needs, consider writing your own. This can be a great learning experience and will give you complete control over the deployment. (Tip: Look at a docker-compose file for the application you're deploying. It often provides a good starting point for what resources you need in Kubernetes.)
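
For step 3, you can pull a chart's default values locally before installing. For example, with the ingress-nginx chart used earlier in this guide:

# Dump the chart's default values for review
helm show values ingress-nginx/ingress-nginx > values.yaml
# Trim values.yaml down to the options you actually change, then install with your overrides
helm install ingress-nginx ingress-nginx/ingress-nginx -f values.yaml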

Controversial opinion: At Osuite, we don't recommend Bitnami charts. We do use them, but with caution. They are often over-engineered even for simple use cases. If you are absolutely sure that you won't need to modify the chart later on, you can use them. But we recommend using official charts or simpler community charts that are easier to understand.

Autoscaling with Karpenter (Optional)

Karpenter is an open source, flexible, high-performance Kubernetes cluster autoscaler built by AWS. Unlike the traditional Cluster Autoscaler that manages EC2 Auto Scaling Groups (ASGs), Karpenter works directly with the EC2 Fleet API to provision new nodes "just-in-time" based on the aggregate resource requests of unschedulable pods.

Why use Karpenter?

  • Just-in-Time Node Provisioning: Karpenter observes unschedulable pods and makes optimized decisions to launch new nodes that precisely fit the pods' requirements (CPU, memory, architecture, GPU, etc.). This is often faster and more efficient than waiting for an ASG to scale up.
  • Improved Bin Packing & Resource Utilization: By launching nodes tailored to pending workloads, Karpenter can achieve better bin packing, leading to higher resource utilization and potentially lower costs.
  • Excellent Spot Instance Integration: This is a major advantage for cost optimization in your self-hosting strategy. Karpenter excels at leveraging EC2 Spot Instances, which can offer significant savings (up to 72% off On-Demand prices) for fault-tolerant workloads.
    • Karpenter can be configured to provision Spot Instances for "durable" workloads like web servers or stateless applications that can handle interruptions.
    • It intelligently handles Spot Instance interruptions by cordoning and draining nodes before termination, allowing Kubernetes to reschedule pods gracefully. This contributes to the reliability of workloads running on Spot. It never feels like you are using Spot Instances because the pods get rescheduled before the EC2 instance is terminated (see the NodePool sketch after this list).
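
To make this concrete, here is a rough sketch of a Karpenter NodePool that allows both Spot and On-Demand capacity. It assumes a recent Karpenter release (where NodePool and EC2NodeClass replaced the older Provisioner CRD) and an EC2NodeClass named default that you have created separately; field names shift between Karpenter versions, so verify against the documentation for the version you install:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose     # illustrative name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # allow Spot with On-Demand fallback
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default        # assumed to exist; defines AMI, subnets, IAM role
  limits:
    cpu: "64"                # cap the total CPU this pool may provision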

Why not use Karpenter?

While powerful, Karpenter introduces another layer of complexity to your Kubernetes architecture.

  • Day 1 Optionality: We don't recommend implementing Karpenter on day one of your self-hosting journey unless you have a clear, immediate need for its advanced capabilities (especially aggressive Spot Instance usage). Start with the standard EKS managed node groups and the Cluster Autoscaler if needed. Get your core applications running reliably first.
  • Setup Complexity: Setting up Karpenter involves configuring IAM roles, instance profiles, security groups, and Karpenter's own NodePool and EC2NodeClass CRDs (formerly Provisioner). The documentation is okay, but lacks some clarity around how to set up the AWS resources with Terraform.

When to use Karpenter?

Unless you meet a few of the following conditions, we recommend starting with standard managed node groups.

  • You are running on AWS EKS and want more granular control over node provisioning.
  • You want to aggressively optimize costs by leveraging Spot Instances for a significant portion of your workloads.
  • You have diverse workload requirements (different CPU/memory ratios, GPUs, etc.) and find managing many ASGs cumbersome.
  • You need faster node scale-up times than the standard Cluster Autoscaler typically provides.

Conclusion

This guide should give you a solid blueprint to start your self-hosting journey. You don't need to do all of it on day one. Start with the basics, get a few applications running, and evolve your setup over time.

Happy self-hosting!
