Kubernetes, Helm, ArgoCD, cert-manager, MetalLB, Calico, Prometheus, Grafana, Vagrant, Ansible

Kubernetes Homelab Cluster

Production-grade multi-node K8s cluster with GitOps (ArgoCD), observability (kube-prometheus-stack), automated TLS (cert-manager), RBAC, network policies, and MetalLB

Overview

This project is a multi-node Kubernetes homelab cluster that simulates a real platform engineering environment. It is deployed on VMs (VirtualBox or Proxmox) with kubeadm and runs GitOps-managed workloads via ArgoCD, full observability with the kube-prometheus-stack, automated TLS with cert-manager, RBAC policies, network policies, and MetalLB for bare-metal LoadBalancer services.

This cluster serves as the foundation for the entire portfolio: the observability stack monitors it, the logging platform collects logs from it, and the IaC pipeline provisions its infrastructure.

Key Features

Cluster Infrastructure

  • Multi-node architecture — 1 control plane + 2 worker nodes via Vagrant/Proxmox
  • kubeadm bootstrap — Production-grade cluster initialization with custom config
  • Calico CNI — Network policy support for pod-to-pod traffic control
  • MetalLB — Layer 2 LoadBalancer for bare-metal environments (IP pool: 192.168.1.200-250)
  • Containerd runtime — Systemd cgroup driver, optimized configuration
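
The custom kubeadm config itself is not shown in this write-up. A minimal sketch of what such a file might look like follows. The file name, the advertise address, and the pod CIDR are assumptions made for illustration, and the API version (v1beta3 here) should match the installed kubeadm release. Calico's default pool, 192.168.0.0/16, would overlap a 192.168.56.0/24 node network, so a different pod CIDR is assumed.

YAML
# kubeadm-config.yaml (hypothetical file name)
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.56.10   # assumed control-plane IP on the private network
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 10.244.0.0/16          # assumed; must not overlap the node or MetalLB ranges
  serviceSubnet: 10.96.0.0/12       # kubeadm default
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd               # matches the containerd systemd cgroup driver

A file like this would be passed to the bootstrap step as kubeadm init --config kubeadm-config.yaml.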

GitOps with ArgoCD

  • Declarative cluster state — All components defined as ArgoCD Applications
  • Auto-sync — Applications automatically reconcile with Git state
  • Multi-project isolation — Platform components and user apps in separate ArgoCD Projects
  • Health monitoring — ArgoCD tracks deployment health and reports drift
  • Self-managed — ArgoCD manages its own configuration via Helm
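
Project isolation can be illustrated with an ArgoCD AppProject. The sketch below assumes ArgoCD runs in the argocd namespace and scopes a project named platform to a single chart repository and destination namespace; the real project definitions are not shown in this write-up and are likely broader.

YAML
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: platform
  namespace: argocd                  # assumed ArgoCD install namespace
spec:
  description: Cluster platform components
  sourceRepos:
    - https://metallb.github.io/metallb
  destinations:
    - server: https://kubernetes.default.svc
      namespace: metallb-system
  clusterResourceWhitelist:          # platform charts install CRDs and cluster-scoped RBAC
    - group: "*"
      kind: "*"

A project for user apps would typically omit clusterResourceWhitelist so that those Applications cannot create cluster-scoped resources.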

Observability

  • kube-prometheus-stack — Prometheus, Grafana, Alertmanager, Node Exporter, kube-state-metrics
  • Custom alert rules — Node down, CrashLoopBackOff, OOMKilled, pending pods, high 5xx rate, cert expiry
  • Pre-built dashboards — Cluster overview, node metrics, pod resources, NGINX ingress, SLO tracking
  • Metrics-server — HPA support and kubectl top functionality
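
As an illustration of the custom alert rules, here is a sketch of a PrometheusRule covering three of the listed cases. The resource name, namespace, release label, thresholds, and durations are assumptions; the metric names are the ones exposed by node-exporter scraping, kube-state-metrics, and cert-manager.

YAML
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: homelab-alerts               # hypothetical name
  namespace: monitoring              # assumed namespace of the monitoring stack
  labels:
    release: kube-prometheus-stack   # assumed Helm release name; must match the rule selector
spec:
  groups:
    - name: nodes
      rules:
        - alert: NodeDown
          expr: up{job="node-exporter"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node exporter on {{ $labels.instance }} is unreachable"
    - name: pods
      rules:
        - alert: PodCrashLooping
          # max_over_time keeps the alert pending while the container flips
          # between the waiting and running states
          expr: max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}[5m]) >= 1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.namespace }}/{{ $labels.pod }} is in CrashLoopBackOff"
    - name: certificates
      rules:
        - alert: CertificateExpiringSoon
          # fires when fewer than 7 days remain before expiry
          expr: certmanager_certificate_expiration_timestamp_seconds - time() < 7 * 24 * 3600
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} expires in under 7 days"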

Security

  • RBAC policies — Dev team (read + exec in dev namespace), SRE team (broad read + limited write), CI pipeline (deploy-only)
  • Network policies — Default deny all, allow DNS egress, allow ingress controller, allow monitoring scrape
  • Pod Security Admission — Restricted security level enforcement via namespace labels
  • Service accounts — Dedicated SAs for GitHub Actions CI/CD and ArgoCD
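
The dev team policy and the Pod Security Admission labels might be expressed roughly as follows. This is a sketch: the namespace name dev, the role name, the group name dev-team, and the exact resource list are assumptions, not the manifests used in the cluster.

YAML
apiVersion: v1
kind: Namespace
metadata:
  name: dev
  labels:
    # Pod Security Admission: reject pods that violate the restricted profile
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dev-read-exec                # hypothetical name
  namespace: dev
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "services", "configmaps"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    # exec is a POST (create); WebSocket-based clients may also need get
    verbs: ["create", "get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-read-exec
  namespace: dev
subjects:
  - kind: Group
    name: dev-team                   # hypothetical group from the cluster's authenticator
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: dev-read-exec
  apiGroup: rbac.authorization.k8s.io

Because the Role is namespaced, the binding grants nothing outside dev, which keeps the exec permission away from platform namespaces.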

Automated TLS

  • cert-manager — Automatic certificate issuance and renewal via Let's Encrypt
  • Staging + Production issuers — Safe testing with staging before production certs
  • Ingress integration — Automatic cert provisioning for Ingress resources
  • Alert on expiry — PrometheusRule alerts when certs are < 7 days from expiry
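
A staging issuer and an Ingress that requests a certificate from it could look like the sketch below. The issuer name, email address, host name, and backend service are placeholders, and the HTTP-01 solver is an assumption since the write-up does not say which challenge type is used.

YAML
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging            # hypothetical name
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@example.com           # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo                           # hypothetical application
  namespace: dev
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-staging
spec:
  ingressClassName: nginx
  tls:
    - hosts: [demo.example.com]
      secretName: demo-tls             # cert-manager writes the issued certificate here
  rules:
    - host: demo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo
                port:
                  number: 80

Switching to production certificates means pointing a second issuer at https://acme-v02.api.letsencrypt.org/directory and changing the annotation. HTTP-01 validation only succeeds if Let's Encrypt can reach the ingress from the internet; a DNS-01 solver is the usual alternative for clusters on private addresses.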

Architecture

┌─────────────────────────────────────────────────────────┐
│                    METALLB (L2 Mode)                    │
│               IP Pool: 192.168.1.200-250                │
└──────────────────────┬──────────────────────────────────┘
              ┌────────┼────────┐
              ▼        ▼        ▼
            ┌─────────┐ ┌─────────┐
            │  NGINX  │ │  NGINX  │
            │ Ingress │ │ Ingress │  (2 replicas)
            │  + TLS  │ │  + TLS  │
            └────┬────┘ └────┬────┘
                 │           │
      ┌──────────▼───────────▼───────────┐
      │        KUBERNETES CLUSTER        │
      │                                  │
      │  Control Plane    Worker 1       │
      │  (API, etcd,      (apps,         │
      │   scheduler)       monitoring)   │
      │                                  │
      │  ArgoCD           Worker 2       │
      │  cert-manager     (apps, logs)   │
      │  kube-prometheus                 │
      │                                  │
      │  Network Policies (Calico)       │
      │  RBAC Policies                   │
      │  Pod Security Admission          │
      └──────────────────────────────────┘
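
The MetalLB layer at the top of the diagram could be declared roughly as follows, assuming MetalLB 0.13 or later (CRD-based configuration). The resource names are hypothetical; the address range is the pool stated above and has to sit on the same Layer 2 segment as the nodes.

YAML
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool                 # hypothetical name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.200-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2                   # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
    - homelab-pool                   # announce only this pool via ARP

With these in place, a Service of type LoadBalancer (such as the NGINX ingress controller's) receives an address from the pool.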

Technical Implementation

Vagrant VM Provisioning

Three Ubuntu 22.04 VMs are provisioned with specific resource allocations:

  • Control plane — 2 vCPU, 4GB RAM, 40GB disk
  • Worker 1 — 2 vCPU, 4GB RAM, 40GB disk
  • Worker 2 — 2 vCPU, 4GB RAM, 40GB disk
  • Private network on 192.168.56.0/24 for inter-node communication

Ansible Cluster Bootstrap

The Ansible playbook sequence:

  1. prereqs.yml — Kernel modules (br_netfilter, overlay), sysctl params, disable swap
  2. containerd.yml — Install containerd with systemd cgroup driver
  3. kubeadm-install.yml — Install kubeadm, kubelet, kubectl
  4. control-plane.yml — kubeadm init with Calico CNI, generate join token
  5. workers.yml — kubeadm join on all worker nodes
  6. post-install.yml — Copy kubeconfig, install metrics-server
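
As an illustration of the first step, a prereqs.yml playbook might look like the sketch below. The host pattern, file paths, and task layout are assumptions; the modules used come from ansible.builtin, ansible.posix, and community.general.

YAML
# playbooks/prereqs.yml (sketch, not the project's actual playbook)
- name: Kubernetes node prerequisites
  hosts: all                         # assumed: every inventory host is a cluster node
  become: true
  tasks:
    - name: Load required kernel modules now
      community.general.modprobe:
        name: "{{ item }}"
        state: present
      loop:
        - overlay
        - br_netfilter

    - name: Load kernel modules on boot
      ansible.builtin.copy:
        dest: /etc/modules-load.d/k8s.conf
        content: |
          overlay
          br_netfilter
        mode: "0644"

    - name: Set sysctl parameters for Kubernetes networking
      ansible.posix.sysctl:
        name: "{{ item }}"
        value: "1"
        sysctl_file: /etc/sysctl.d/k8s.conf
        state: present
        reload: true
      loop:
        - net.bridge.bridge-nf-call-iptables
        - net.bridge.bridge-nf-call-ip6tables
        - net.ipv4.ip_forward

    - name: Disable swap for the running system
      ansible.builtin.command: swapoff -a
      changed_when: false

    - name: Comment out swap entries so swap stays off after reboot
      ansible.builtin.replace:
        path: /etc/fstab
        regexp: '^([^#].*\sswap\s.*)$'
        replace: '# \1'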

Network Policy Strategy

Starting from a default-deny-all baseline, policies are layered:

  1. Default deny all ingress and egress
  2. Allow DNS egress (port 53 UDP/TCP) to kube-system
  3. Allow ingress from ingress-nginx namespace
  4. Allow Prometheus scrape from monitoring namespace
  5. Application-specific rules for inter-service communication
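
The first four layers can be sketched as standard NetworkPolicy objects. The target namespace dev is an assumption, and the namespace selectors rely on the kubernetes.io/metadata.name label that Kubernetes sets automatically on every namespace.

YAML
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: dev
spec:
  podSelector: {}                    # selects every pod in the namespace
  policyTypes: [Ingress, Egress]     # no rules listed, so all traffic is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: dev
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-nginx
  namespace: dev
spec:
  podSelector: {}
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: dev
spec:
  podSelector: {}
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring

Network policies are additive, so each allow rule opens a narrow path through the default deny without weakening the others.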

GitOps Application Management

All cluster components are managed as ArgoCD Applications:

YAML
# Example: MetalLB via ArgoCD
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: metallb
spec:
  project: platform
  source:
    repoURL: https://metallb.github.io/metallb
    chart: metallb
  destination:
    namespace: metallb-system
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Deployment

Quick Start (Vagrant)

Bash
cd infrastructure/vagrant
vagrant up
cd ../../
./scripts/bootstrap.sh

Production (Ansible)

Bash
cd infrastructure/ansible
ansible-playbook -i inventory/hosts.ini playbooks/prereqs.yml
ansible-playbook -i inventory/hosts.ini playbooks/containerd.yml
ansible-playbook -i inventory/hosts.ini playbooks/kubeadm-install.yml
ansible-playbook -i inventory/hosts.ini playbooks/control-plane.yml
ansible-playbook -i inventory/hosts.ini playbooks/workers.yml
ansible-playbook -i inventory/hosts.ini playbooks/post-install.yml

Impact

  • 3-node cluster with HA-ready control plane configuration
  • 12+ ArgoCD Applications managing all cluster components declaratively
  • 4 RBAC roles enforcing least-privilege access patterns
  • 5 network policies implementing defense-in-depth
  • 6 alert rule categories covering infrastructure, pods, ingress, and certificates
  • Automated TLS with Let's Encrypt and expiry alerting
  • 3 operational runbooks for common cluster incidents

Future Plans

  • Add Cilium CNI as an alternative, with Hubble for network visibility
  • Implement Velero for cluster backup and disaster recovery
  • Add OPA Gatekeeper for admission control policies
  • Deploy Harbor as private container registry
  • Add Cluster API for declarative cluster lifecycle management
  • Implement Crossplane for cloud resource management within K8s