Kubernetes Homelab Cluster
Production-grade multi-node K8s cluster with GitOps (ArgoCD), observability (kube-prometheus-stack), automated TLS (cert-manager), RBAC, network policies, and MetalLB
Overview
This project is a production-grade, multi-node Kubernetes homelab cluster that simulates a real platform engineering environment. It is deployed on VMs (VirtualBox/Proxmox) using kubeadm, with GitOps-managed workloads via ArgoCD, full observability with the kube-prometheus-stack, automated TLS with cert-manager, RBAC policies, network policies, and MetalLB for bare-metal LoadBalancer services.
This cluster serves as the foundation for the entire portfolio — the observability stack monitors it, the logging platform collects from it, and the IaC pipeline provisions its infrastructure.
Key Features
Cluster Infrastructure
- Multi-node architecture — 1 control plane + 2 worker nodes via Vagrant/Proxmox
- kubeadm bootstrap — Production-grade cluster initialization with custom config
- Calico CNI — Network policy support for pod-to-pod traffic control
- MetalLB — Layer 2 LoadBalancer for bare-metal environments (IP pool: 192.168.1.200-250)
- Containerd runtime — Systemd cgroup driver, optimized configuration
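The custom kubeadm configuration itself is not shown here. A minimal sketch of what such a file might contain follows. The file name, advertise address, and pod subnet are assumptions for illustration (the pod subnet is chosen so it does not overlap the 192.168.x.x node networks), and newer kubeadm releases may expect `v1beta4` instead of `v1beta3`.

```yaml
# kubeadm-config.yaml (hypothetical file name; all values are illustrative assumptions)
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.56.10   # assumed control plane IP on the private network
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 10.244.0.0/16          # assumed; must match the Calico IP pool
  serviceSubnet: 10.96.0.0/12       # kubeadm default
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd               # matches containerd's systemd cgroup driver
```

A file like this would be passed to `kubeadm init --config kubeadm-config.yaml` on the control plane node.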
GitOps with ArgoCD
- Declarative cluster state — All components defined as ArgoCD Applications
- Auto-sync — Applications automatically reconcile with Git state
- Multi-project isolation — Platform components and user apps in separate ArgoCD Projects
- Health monitoring — ArgoCD tracks deployment health and reports drift
- Self-managed — ArgoCD manages its own configuration via Helm
Observability
- kube-prometheus-stack — Prometheus, Grafana, Alertmanager, Node Exporter, kube-state-metrics
- Custom alert rules — Node down, CrashLoopBackOff, OOMKilled, pending pods, high 5xx rate, cert expiry
- Pre-built dashboards — Cluster overview, node metrics, pod resources, NGINX ingress, SLO tracking
- Metrics-server — HPA support and `kubectl top` functionality
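As an illustration of the custom alert rules, here is a sketch of a PrometheusRule covering two of the listed cases (CrashLoopBackOff and certificate expiry). The resource name, namespace, `release` label, severities, and `for` durations are assumptions; the metric names come from kube-state-metrics and cert-manager.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: homelab-alerts               # hypothetical name
  namespace: monitoring              # assumed namespace
  labels:
    release: kube-prometheus-stack   # assumed; must match the Prometheus ruleSelector
spec:
  groups:
    - name: pods
      rules:
        - alert: PodCrashLooping
          # kube-state-metrics reports 1 while a container waits in CrashLoopBackOff
          expr: kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
    - name: certificates
      rules:
        - alert: CertificateExpiringSoon
          # cert-manager exports each certificate's expiry as a Unix timestamp
          expr: certmanager_certificate_expiration_timestamp_seconds - time() < 7 * 24 * 3600
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} expires in under 7 days"
```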
Security
- RBAC policies — Dev team (read + exec in dev namespace), SRE team (broad read + limited write), CI pipeline (deploy-only)
- Network policies — Default deny all, allow DNS egress, allow ingress controller, allow monitoring scrape
- Pod Security Admission — Restricted security level enforcement via namespace labels
- Service accounts — Dedicated SAs for GitHub Actions CI/CD and ArgoCD
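The dev-team access and the Pod Security Admission enforcement described above could look roughly like the following. This is a sketch, not the project's actual manifests: the namespace name `dev`, the role name, the resource list, and the `dev-team` group are assumptions.

```yaml
# Namespace with Pod Security Admission enforcing the "restricted" profile
apiVersion: v1
kind: Namespace
metadata:
  name: dev                          # assumed namespace name
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
---
# Dev team: read-only access plus exec, scoped to the dev namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dev-read-exec                # hypothetical name
  namespace: dev
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "services", "configmaps"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    # create covers POST-based exec; newer WebSocket-based clients may also need get
    verbs: ["create", "get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-read-exec
  namespace: dev
subjects:
  - kind: Group
    name: dev-team                   # assumed group name from the cluster's authenticator
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: dev-read-exec
  apiGroup: rbac.authorization.k8s.io
```

Access can be checked with `kubectl auth can-i`, for example `kubectl auth can-i delete pods -n dev --as=jane --as-group=dev-team`, where `jane` is a placeholder user.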
Automated TLS
- cert-manager — Automatic certificate issuance and renewal via Let's Encrypt
- Staging + Production issuers — Safe testing with staging before production certs
- Ingress integration — Automatic cert provisioning for Ingress resources
- Alert on expiry — PrometheusRule alerts when certs are < 7 days from expiry
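A sketch of the staging issuer and an Ingress that requests a certificate from it is shown below. The issuer name, email, hostname, service name, and the choice of an HTTP-01 solver are assumptions, since the source does not say which ACME challenge type is used.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging          # hypothetical name
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@example.com         # placeholder
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:                      # assumed challenge type
          ingress:
            class: nginx
---
# Ingress that asks cert-manager for a certificate via annotation
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo                         # hypothetical application
  namespace: dev
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-staging
spec:
  ingressClassName: nginx
  tls:
    - hosts: [demo.example.com]
      secretName: demo-tls           # cert-manager stores the issued certificate here
  rules:
    - host: demo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo
                port:
                  number: 80
```

A production issuer would be the same resource pointed at `https://acme-v02.api.letsencrypt.org/directory`, with the Ingress annotation switched once staging issuance succeeds.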
Architecture
```
┌─────────────────────────────────────────────────────────┐
│                    METALLB (L2 Mode)                    │
│               IP Pool: 192.168.1.200-250                │
└──────────────────────┬──────────────────────────────────┘
                       │
              ┌────────┼────────┐
              ▼        ▼        ▼
         ┌─────────┐ ┌─────────┐
         │  NGINX  │ │  NGINX  │
         │ Ingress │ │ Ingress │  (2 replicas)
         │  + TLS  │ │  + TLS  │
         └────┬────┘ └────┬────┘
              │           │
   ┌──────────▼───────────▼───────────┐
   │        KUBERNETES CLUSTER        │
   │                                  │
   │  Control Plane     Worker 1      │
   │  (API, etcd,       (apps,        │
   │   scheduler)        monitoring)  │
   │                                  │
   │  ArgoCD            Worker 2      │
   │  cert-manager      (apps, logs)  │
   │  kube-prometheus                 │
   │                                  │
   │  Network Policies (Calico)       │
   │  RBAC Policies                   │
   │  Pod Security Admission          │
   └──────────────────────────────────┘
```
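The MetalLB layer at the top of the diagram can be expressed with two small resources. This is a sketch using the pool range stated above; the resource names are hypothetical.

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool                 # hypothetical name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.200-192.168.1.250
---
# Announce addresses from the pool via ARP (Layer 2 mode)
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2                   # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
    - homelab-pool
```

With these applied, a Service of type LoadBalancer (such as the NGINX ingress controller's) receives an external IP from the pool.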
Technical Implementation
Vagrant VM Provisioning
Three Ubuntu 22.04 VMs are provisioned with specific resource allocations:
- Control plane — 2 vCPU, 4GB RAM, 40GB disk
- Worker 1 — 2 vCPU, 4GB RAM, 40GB disk
- Worker 2 — 2 vCPU, 4GB RAM, 40GB disk
- Private network on 192.168.56.0/24 for inter-node communication
Ansible Cluster Bootstrap
The Ansible playbook sequence:
- prereqs.yml — Kernel modules (br_netfilter, overlay), sysctl params, disable swap
- containerd.yml — Install containerd with systemd cgroup driver
- kubeadm-install.yml — Install kubeadm, kubelet, kubectl
- control-plane.yml — `kubeadm init` with Calico CNI, generate join token
- workers.yml — `kubeadm join` on all worker nodes
- post-install.yml — Copy kubeconfig, install metrics-server
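To make the first step concrete, here is a sketch of what a prerequisites play might contain. It is not the project's actual playbook: the host group, file paths, and task layout are assumptions, and it relies on the `community.general` and `ansible.posix` collections being installed.

```yaml
# playbooks/prereqs.yml (sketch; host group and file names are assumptions)
- name: Prepare nodes for kubeadm
  hosts: all
  become: true
  tasks:
    - name: Load required kernel modules now
      community.general.modprobe:
        name: "{{ item }}"
        state: present
      loop: [overlay, br_netfilter]

    - name: Load the same modules on every boot
      ansible.builtin.copy:
        dest: /etc/modules-load.d/k8s.conf
        content: |
          overlay
          br_netfilter
        mode: "0644"

    - name: Set sysctl parameters required by Kubernetes networking
      ansible.posix.sysctl:
        name: "{{ item }}"
        value: "1"
        sysctl_file: /etc/sysctl.d/k8s.conf
        state: present
        reload: true
      loop:
        - net.bridge.bridge-nf-call-iptables
        - net.bridge.bridge-nf-call-ip6tables
        - net.ipv4.ip_forward

    - name: Disable swap for the running system
      ansible.builtin.command: swapoff -a
      changed_when: false

    - name: Comment out swap entries in /etc/fstab so swap stays off after reboot
      ansible.builtin.replace:
        path: /etc/fstab
        regexp: '^([^#].*\sswap\s.*)$'
        replace: '# \1'
```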
Network Policy Strategy
Starting from a default-deny-all baseline, policies are layered:
- Default deny all ingress and egress
- Allow DNS egress (port 53 UDP/TCP) to kube-system
- Allow ingress from ingress-nginx namespace
- Allow Prometheus scrape from monitoring namespace
- Application-specific rules for inter-service communication
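The first four layers can be sketched as standard NetworkPolicy resources. The `dev` namespace is an assumption, and the ingress-controller and monitoring rules are combined into one policy here for brevity, whereas the project may keep them separate.

```yaml
# 1. Default deny: selects every pod in the namespace and allows nothing
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: dev                     # assumed application namespace
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
---
# 2. Allow DNS lookups to kube-system on port 53 (UDP and TCP)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: dev
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
---
# 3 and 4. Allow traffic from the ingress controller and Prometheus scrapes
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-and-monitoring
  namespace: dev
spec:
  podSelector: {}
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
```

Because NetworkPolicies are additive, each allow rule widens what the default-deny policy blocks; there is no ordering between them.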
GitOps Application Management
All cluster components are managed as ArgoCD Applications:
```yaml
# Example: MetalLB via ArgoCD
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: metallb
spec:
  project: platform
  source:
    repoURL: https://metallb.github.io/metallb
    chart: metallb
  destination:
    namespace: metallb-system
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```
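The `platform` project that the Application refers to would be defined as an AppProject. The sketch below assumes ArgoCD runs in the `argocd` namespace and restricts the project to the MetalLB chart repository and namespace only; a real platform project would list every repository and destination it manages.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: platform
  namespace: argocd                  # assumed ArgoCD install namespace
spec:
  description: Cluster platform components
  sourceRepos:
    - https://metallb.github.io/metallb
  destinations:
    - server: https://kubernetes.default.svc
      namespace: metallb-system
  # Platform charts install CRDs and ClusterRoles, so cluster-scoped kinds are allowed
  clusterResourceWhitelist:
    - group: "*"
      kind: "*"
```

A separate project for user applications would typically omit `clusterResourceWhitelist`, which keeps those apps from creating cluster-scoped resources.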
Deployment
Quick Start (Vagrant)
```bash
cd infrastructure/vagrant
vagrant up
cd ../../
./scripts/bootstrap.sh
```
Production (Ansible)
```bash
cd infrastructure/ansible
ansible-playbook -i inventory/hosts.ini playbooks/prereqs.yml
ansible-playbook -i inventory/hosts.ini playbooks/containerd.yml
ansible-playbook -i inventory/hosts.ini playbooks/kubeadm-install.yml
ansible-playbook -i inventory/hosts.ini playbooks/control-plane.yml
ansible-playbook -i inventory/hosts.ini playbooks/workers.yml
```
Impact
- 3-node cluster (1 control plane, 2 workers) with HA-ready control plane configuration
- 12+ ArgoCD Applications managing all cluster components declaratively
- 4 RBAC roles enforcing least-privilege access patterns
- 5 network policies implementing defense-in-depth
- 6 alert rule categories covering infrastructure, pods, ingress, and certificates
- Automated TLS with Let's Encrypt and expiry alerting
- 3 operational runbooks for common cluster incidents
Future Plans
- Add Cilium CNI as alternative with Hubble network visibility
- Implement Velero for cluster backup and disaster recovery
- Add OPA Gatekeeper for admission control policies
- Deploy Harbor as private container registry
- Add Cluster API for declarative cluster lifecycle management
- Implement Crossplane for cloud resource management within K8s