Overview

This project is a production-grade, multi-node Kubernetes homelab cluster that simulates a real platform engineering environment. It is deployed on VMs (VirtualBox/Proxmox) using kubeadm, with GitOps-managed workloads via ArgoCD, full observability through kube-prometheus-stack, automated TLS with cert-manager, RBAC policies, network policies, and MetalLB for bare-metal LoadBalancer services.

This cluster serves as the foundation for the entire portfolio — the observability stack monitors it, the logging platform collects from it, and the IaC pipeline provisions its infrastructure.

Key Features

Cluster Infrastructure

Multi-node architecture — 1 control plane + 2 worker nodes via Vagrant/Proxmox
kubeadm bootstrap — Production-grade cluster initialization with custom config
Calico CNI — Network policy support for pod-to-pod traffic control
MetalLB — Layer 2 LoadBalancer for bare-metal environments (IP pool: 192.168.1.200-250)
Containerd runtime — Systemd cgroup driver, optimized configuration
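
The custom kubeadm configuration itself is not reproduced on this page. As a rough sketch of what such a file might contain, the example below sets the systemd cgroup driver for the kubelet and a pod network range. The file name, API version, advertise address, and both subnets are assumptions rather than values taken from this project; the pod subnet is deliberately chosen outside 192.168.0.0/16 so that it does not overlap the 192.168.x node networks.

```yaml
# kubeadm-config.yaml (hypothetical file name)
# Would be passed to: kubeadm init --config kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3       # assumption: newer kubeadm releases also offer v1beta4
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.56.10        # assumption: control plane IP on the private network
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 10.244.0.0/16               # assumption: must not overlap the node networks
  serviceSubnet: 10.96.0.0/12            # kubeadm's default service range
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd                    # must match containerd's SystemdCgroup setting
```

If a pod subnet like this is used, the Calico IP pool would need to be configured with the same range.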

GitOps with ArgoCD

Declarative cluster state — All components defined as ArgoCD Applications
Auto-sync — Applications automatically reconcile with Git state
Multi-project isolation — Platform components and user apps in separate ArgoCD Projects
Health monitoring — ArgoCD tracks deployment health and reports drift
Self-managed — ArgoCD manages its own configuration via Helm
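
One way the project isolation described above could be expressed is with two AppProject resources. This is a minimal sketch, not the project's actual manifests: only the project name platform, the MetalLB repository URL, and the metallb-system namespace appear elsewhere on this page; the apps project, its repository URL, and the argocd namespace are hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: platform
  namespace: argocd                      # assumption: ArgoCD's default install namespace
spec:
  description: Cluster-level platform components
  sourceRepos:
    - https://metallb.github.io/metallb
  destinations:
    - server: https://kubernetes.default.svc
      namespace: metallb-system
  clusterResourceWhitelist:              # platform charts install CRDs and ClusterRoles
    - group: "*"
      kind: "*"
---
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: apps                             # hypothetical project for user workloads
  namespace: argocd
spec:
  description: User applications
  sourceRepos:
    - https://github.com/example/homelab-apps.git   # hypothetical repository
  destinations:
    - server: https://kubernetes.default.svc
      namespace: dev
  # No clusterResourceWhitelist here, so Applications in this project
  # cannot create cluster-scoped resources.
```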

Observability

kube-prometheus-stack — Prometheus, Grafana, Alertmanager, Node Exporter, kube-state-metrics
Custom alert rules — Node down, CrashLoopBackOff, OOMKilled, pending pods, high 5xx rate, cert expiry
Pre-built dashboards — Cluster overview, node metrics, pod resources, NGINX ingress, SLO tracking
Metrics-server — HPA support and `kubectl top` functionality
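
The custom alert rules are listed above by name only. A hedged sketch of how three of them might look as a PrometheusRule follows. The metric names come from kube-state-metrics; the resource name, the release label value, the thresholds, and the severities are assumptions.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: homelab-core-alerts              # hypothetical name
  namespace: monitoring
  labels:
    release: kube-prometheus-stack       # assumption: must match the Helm release so the rule is selected
spec:
  groups:
    - name: nodes
      rules:
        - alert: NodeNotReady
          expr: kube_node_status_condition{condition="Ready",status="true"} == 0
          for: 5m                        # assumed threshold
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.node }} has not been Ready for 5 minutes"
    - name: pods
      rules:
        - alert: PodCrashLooping
          expr: max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}[5m]) >= 1
          for: 10m                       # assumed threshold
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
        - alert: ContainerOOMKilled
          expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.namespace }}/{{ $labels.pod }} was last terminated by the OOM killer"
```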

Security

RBAC policies — Dev team (read + exec in dev namespace), SRE team (broad read + limited write), CI pipeline (deploy-only)
Network policies — Default deny all, allow DNS egress, allow ingress controller, allow monitoring scrape
Pod Security Admission — Restricted security level enforcement via namespace labels
Service accounts — Dedicated SAs for GitHub Actions CI/CD and ArgoCD
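
To make the dev-team policy and the Pod Security Admission labels concrete, here is a minimal sketch. The namespace name dev follows the description above; the role name, the group name, and the exact resource list are hypothetical.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: dev
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject pods that violate the restricted profile
    pod-security.kubernetes.io/warn: restricted      # also surface warnings to the client
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dev-read-exec                    # hypothetical name
  namespace: dev
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "pods/log", "services", "configmaps", "deployments", "replicasets"]
    verbs: ["get", "list", "watch"]      # read-only access
  - apiGroups: [""]
    resources: ["pods/exec"]
    # create is the long-standing verb for exec; some newer WebSocket-based
    # clients may also need get, so both are granted in this sketch.
    verbs: ["create", "get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-read-exec
  namespace: dev
subjects:
  - kind: Group
    name: dev-team                       # hypothetical group name from the authenticator
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: dev-read-exec
  apiGroup: rbac.authorization.k8s.io
```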

Automated TLS

cert-manager — Automatic certificate issuance and renewal via Let's Encrypt
Staging + Production issuers — Safe testing with staging before production certs
Ingress integration — Automatic cert provisioning for Ingress resources
Alert on expiry — PrometheusRule alerts when certs are < 7 days from expiry
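
A staging issuer and an Ingress that requests a certificate from it might look like the sketch below. The issuer name, email address, host name, and secret names are placeholders, and the HTTP-01 solver is an assumption: it only works if the ingress is reachable from the internet, so a homelab on private addresses may need a DNS-01 solver instead.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging              # hypothetical name
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@example.com             # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:                          # assumption: solver type is not stated on this page
          ingress:
            ingressClassName: nginx
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo                             # hypothetical application
  namespace: dev
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-staging
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - demo.example.com               # placeholder host
      secretName: demo-tls               # cert-manager stores the issued certificate here
  rules:
    - host: demo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo
                port:
                  number: 80
```

Switching to production certificates would then mean pointing the annotation at an issuer that uses the production ACME endpoint.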

Architecture


```
┌─────────────────────────────────────────────────────────┐
│                   METALLB (L2 Mode)                      │
│              IP Pool: 192.168.1.200-250                  │
└──────────────────────┬──────────────────────────────────┘
                       │
              ┌────────┼────────┐
              ▼        ▼        ▼
        ┌─────────┐ ┌─────────┐
        │ NGINX   │ │ NGINX   │
        │ Ingress │ │ Ingress │  (2 replicas)
        │ + TLS   │ │ + TLS   │
        └────┬────┘ └────┬────┘
             │           │
  ┌──────────▼───────────▼───────────┐
  │       KUBERNETES CLUSTER         │
  │                                  │
  │  Control Plane    Worker 1       │
  │  (API, etcd,      (apps,         │
  │   scheduler)       monitoring)   │
  │                                  │
  │  ArgoCD           Worker 2       │
  │  cert-manager     (apps, logs)   │
  │  kube-prometheus                 │
  │                                  │
  │  Network Policies (Calico)       │
  │  RBAC Policies                   │
  │  Pod Security Admission          │
  └──────────────────────────────────┘
```
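
The MetalLB layer at the top of the diagram could be configured roughly as follows. The address range and the metallb-system namespace come from this page; the resource names are hypothetical. In L2 mode the pool has to sit on a subnet that the nodes are actually attached to.

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool                     # hypothetical name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.200-192.168.1.250        # addresses handed out to LoadBalancer Services
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2                       # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
    - homelab-pool                       # announce only this pool via ARP/NDP
```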

Technical Implementation

Vagrant VM Provisioning

Three Ubuntu 22.04 VMs are provisioned with specific resource allocations:

Control plane — 2 vCPU, 4GB RAM, 40GB disk
Worker 1 — 2 vCPU, 4GB RAM, 40GB disk
Worker 2 — 2 vCPU, 4GB RAM, 40GB disk
Private network on 192.168.56.0/24 for inter-node communication

Ansible Cluster Bootstrap

The Ansible playbook sequence:

prereqs.yml — Kernel modules (br_netfilter, overlay), sysctl params, disable swap
containerd.yml — Install containerd with systemd cgroup driver
kubeadm-install.yml — Install kubeadm, kubelet, kubectl
control-plane.yml — `kubeadm init` with Calico CNI, generate join token
workers.yml — `kubeadm join` on all worker nodes
post-install.yml — Copy kubeconfig, install metrics-server
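
As an illustration of the first step in the sequence above, a prerequisites playbook could look like this sketch. It is not the project's actual prereqs.yml: the task layout, file paths, and the exact sysctl keys are assumptions, and it relies on the community.general and ansible.posix collections being installed.

```yaml
- name: Kubernetes node prerequisites
  hosts: all
  become: true
  tasks:
    - name: Load required kernel modules now
      community.general.modprobe:
        name: "{{ item }}"
        state: present
      loop:
        - overlay
        - br_netfilter

    - name: Load the same modules on every boot
      ansible.builtin.copy:
        dest: /etc/modules-load.d/k8s.conf    # assumed file name
        content: |
          overlay
          br_netfilter
        mode: "0644"

    - name: Set sysctl parameters needed for pod networking
      ansible.posix.sysctl:
        name: "{{ item }}"
        value: "1"
        sysctl_file: /etc/sysctl.d/99-kubernetes.conf   # assumed file name
        state: present
        reload: true
      loop:
        - net.bridge.bridge-nf-call-iptables
        - net.bridge.bridge-nf-call-ip6tables
        - net.ipv4.ip_forward

    - name: Disable swap for the current boot
      ansible.builtin.command: swapoff -a
      changed_when: false

    - name: Comment out swap entries so swap stays off after reboot
      ansible.builtin.replace:
        path: /etc/fstab
        regexp: '^([^#].*\sswap\s.*)$'
        replace: '# \1'
```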

Network Policy Strategy

Starting from a default-deny-all baseline, policies are layered:

Default deny all ingress and egress
Allow DNS egress (port 53 UDP/TCP) to kube-system
Allow ingress from ingress-nginx namespace
Allow Prometheus scrape from monitoring namespace
Application-specific rules for inter-service communication
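
The first three layers can be sketched as standard NetworkPolicy resources. The dev namespace and the policy names are hypothetical; the selectors rely on the kubernetes.io/metadata.name label that Kubernetes sets on every namespace automatically.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: dev                         # hypothetical application namespace
spec:
  podSelector: {}                        # applies to every pod in the namespace
  policyTypes:
    - Ingress
    - Egress                             # no rules listed, so all traffic is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: dev
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress-nginx
  namespace: dev
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
```

A Prometheus scrape rule would follow the same shape as the last policy, with the monitoring namespace as the source.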

GitOps Application Management

All cluster components are managed as ArgoCD Applications:


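```yaml
# Example: MetalLB via ArgoCD
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: metallb
  namespace: argocd  # assumption: ArgoCD's default install namespace
spec:
  project: platform
  source:
    repoURL: https://metallb.github.io/metallb  # MetalLB Helm repository
    chart: metallb
    # Consider pinning the chart version with targetRevision.
  destination:
    namespace: metallb-system
    server: https://kubernetes.default.svc  # the cluster ArgoCD runs in
  syncPolicy:
    automated:
      prune: true     # delete resources that were removed from Git
      selfHeal: true  # revert manual changes made in the cluster
```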

Deployment

Quick Start (Vagrant)


```bash
cd infrastructure/vagrant
vagrant up
cd ../../
./scripts/bootstrap.sh
```

Production (Ansible)


```bash
cd infrastructure/ansible
ansible-playbook -i inventory/hosts.ini playbooks/prereqs.yml
ansible-playbook -i inventory/hosts.ini playbooks/containerd.yml
ansible-playbook -i inventory/hosts.ini playbooks/kubeadm-install.yml
ansible-playbook -i inventory/hosts.ini playbooks/control-plane.yml
ansible-playbook -i inventory/hosts.ini playbooks/workers.yml
```

Impact

3-node cluster with HA-ready control plane configuration
12+ ArgoCD Applications managing all cluster components declaratively
4 RBAC roles enforcing least-privilege access patterns
5 network policies implementing defense-in-depth
6 custom alert rules covering infrastructure, pods, ingress, and certificates
Automated TLS with Let's Encrypt and expiry alerting
3 operational runbooks for common cluster incidents

Future Plans

Add Cilium CNI as an alternative, with Hubble for network visibility
Implement Velero for cluster backup and disaster recovery
Add OPA Gatekeeper for admission control policies
Deploy Harbor as a private container registry
Add Cluster API for declarative cluster lifecycle management
Implement Crossplane for cloud resource management from within Kubernetes

Kubernetes Homelab Cluster