Overview


My homelab runs entirely on Kubernetes, with distinct layers for networking, security, monitoring, and service delivery. Here's how everything fits together.


Architecture Layers


1. Infrastructure Foundation


Bare Metal Cluster:

  • 3-node Kubernetes cluster (1 control plane, 2 workers)
  • Local storage with dynamic PersistentVolume provisioning
  • Flannel CNI for pod networking
  • MetalLB for LoadBalancer service types

Network Architecture:

  • Segregated VLANs for management, services, and user traffic
  • Firewall rules controlling inter-VLAN communication
  • Internal DNS resolution for service discovery
  • External DNS via Cloudflare

2. Kubernetes Control Plane


The control plane is the brain of the operation: it manages workload orchestration, scheduling, and service discovery across all nodes.


Key Components:

  • etcd as the distributed key-value store for cluster state and configuration
  • kube-apiserver as the central management interface
  • kube-scheduler for intelligent pod placement
  • kube-controller-manager for cluster state reconciliation

3. Ingress and Service Networking


Ingress Layer:

  • NGINX Ingress Controller as the gateway
  • TLS termination with Let's Encrypt certificates
  • Path-based routing to backend services
  • Rate limiting and request filtering
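
To make path-based routing concrete, here is a minimal Python sketch of prefix matching in the style of the Kubernetes Ingress `Prefix` path type: paths are compared element by element, and the longest matching rule wins. The route table and backend names are hypothetical, chosen only for illustration, and the real controller does considerably more than this.

```python
from typing import Optional

# Hypothetical route table: path prefix -> backend Service name.
ROUTES = {
    "/": "poetry-blog",
    "/api": "avidlearner-api",
    "/api/music": "navidrome",
}


def _elements(path: str) -> list[str]:
    """Split a URL path into its non-empty elements."""
    return [part for part in path.split("/") if part]


def match_backend(request_path: str, routes: dict[str, str] = ROUTES) -> Optional[str]:
    """Return the backend of the longest prefix that matches element-wise."""
    request = _elements(request_path)
    best_len, best_backend = -1, None
    for prefix, backend in routes.items():
        rule = _elements(prefix)
        # A rule matches only on whole path elements, so "/api" does not match "/apiary".
        if request[: len(rule)] == rule and len(rule) > best_len:
            best_len, best_backend = len(rule), backend
    return best_backend
```

With this table, `/api/music/rest` goes to the most specific rule, while `/apiary` falls through to the catch-all `/` rule because `api` and `apiary` are different path elements.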

Service Discovery:

  • CoreDNS for internal service resolution
  • Kubernetes Services with ClusterIP/LoadBalancer
  • Headless services for StatefulSets

4. Application Services


Development & Learning:

  • AvidLearner: Go learning platform with React frontend
  • LabMan CLI: Remote homelab session management via SSH
  • Rebalancer Operator: Custom Kubernetes operator for pod optimization

Media & Content:

  • Navidrome: Self-hosted music streaming (Subsonic API)
  • Ebook Reader: Digital library with progress tracking
  • Poetry Blog: Content publishing platform

DevOps & Automation:

  • Jenkins: CI/CD orchestration with Kaniko builds
  • Trivy: Container image security scanning
  • Uptime Kuma: Service health monitoring

Observability Stack:

  • Prometheus: Metrics collection and alerting
  • Grafana: Visualization and dashboards
  • Loki: Log aggregation (planned)

5. CI/CD Pipeline Flow


The deployment pipeline is fully automated:


  1. Code Push: Developer pushes to Git repository
  2. Jenkins Trigger: Webhook triggers Jenkins pipeline
  3. Build Phase: Kaniko builds container image inside Kubernetes (no Docker daemon needed)
  4. Security Scan: Trivy scans image for CVEs and misconfigurations
  5. Registry Push: Images that pass the scan are pushed to Docker Hub
  6. Helm Deploy: Helm charts deploy updated services to Kubernetes
  7. Ingress Update: NGINX Ingress routes traffic to new pods
  8. Health Check: Kubernetes readiness probes verify deployment
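
The key property of this flow is that the security scan acts as a gate: nothing after it runs if the image fails. A rough Python sketch of that control flow follows. The stage names, the `Finding` type, and the choice of HIGH and CRITICAL as blocking severities are all assumptions for illustration, not the actual Jenkins pipeline definition.

```python
from dataclasses import dataclass

# Severities that block a release in this sketch (an assumed policy).
BLOCKING = {"HIGH", "CRITICAL"}


@dataclass
class Finding:
    cve_id: str
    severity: str  # e.g. "LOW", "MEDIUM", "HIGH", "CRITICAL"


def scan_passes(findings: list[Finding]) -> bool:
    """An image may be pushed only if no finding has a blocking severity."""
    return not any(f.severity in BLOCKING for f in findings)


def run_pipeline(findings: list[Finding]) -> list[str]:
    """Return the stages that ran; stop before the push if the scan fails."""
    stages = ["build", "scan"]
    if not scan_passes(findings):
        return stages
    stages += ["push", "deploy", "route", "health-check"]
    return stages
```

A clean scan runs all six stages, while a single blocking finding stops the run after `scan`, so the image never reaches the registry.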

6. External Access Architecture


Cloudflare Integration:

  • Cloudflare Tunnels eliminate port forwarding
  • Zero Trust Access for authentication
  • DDoS protection at the edge
  • Automatic SSL/TLS certificates

Traffic Flow:

Internet → Cloudflare Edge → Cloudflare Tunnel →
NGINX Ingress → Kubernetes Service → Application Pod

7. Storage Architecture


Persistent Storage:

  • Local PersistentVolumes on each node
  • StorageClass for dynamic provisioning
  • StatefulSets for stateful applications
  • Volume snapshots for backup

Data Flow:

  • Databases use local SSDs for low latency
  • Media files on larger HDDs
  • Configuration in ConfigMaps and Secrets
  • Logs aggregated to centralized storage

8. Security Model


Defense in Depth:


Layer 1 - Perimeter:

  • Cloudflare WAF blocks malicious traffic
  • Rate limiting prevents abuse
  • Bot protection

Layer 2 - Network:

  • VLAN segmentation isolates traffic
  • Firewall rules enforce least privilege
  • Network policies in Kubernetes

Layer 3 - Application:

  • NGINX Ingress with authentication
  • TLS encryption for all external traffic
  • Application-level authorization

Layer 4 - Container:

  • Trivy scanning keeps vulnerable images from being deployed
  • Non-root containers
  • Read-only root filesystems
  • Resource limits contain resource-exhaustion DoS
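
The container-layer controls map onto a few well-known fields of a Kubernetes container spec: `securityContext.runAsNonRoot`, `securityContext.readOnlyRootFilesystem`, and `resources.limits`. As a sketch, the Python function below checks a container spec (as a plain dict) for those three settings. The example spec, image name, and limit values are hypothetical, and the check ignores that `runAsNonRoot` can also be set at the pod level.

```python
def hardening_gaps(container: dict) -> list[str]:
    """List which of the three container-level controls are missing."""
    security = container.get("securityContext", {})
    limits = container.get("resources", {}).get("limits", {})
    gaps = []
    if security.get("runAsNonRoot") is not True:
        gaps.append("runAsNonRoot")
    if security.get("readOnlyRootFilesystem") is not True:
        gaps.append("readOnlyRootFilesystem")
    # Require both CPU and memory limits to be present.
    if not {"cpu", "memory"} <= set(limits):
        gaps.append("resources.limits")
    return gaps


# Hypothetical container spec, shaped like one `containers[]` entry of a pod manifest.
example = {
    "name": "avidlearner",
    "image": "example/avidlearner:latest",
    "securityContext": {"runAsNonRoot": True, "readOnlyRootFilesystem": True},
    "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
}
```

Calling `hardening_gaps(example)` returns an empty list, while a bare spec with only a name reports all three controls as missing.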

9. Monitoring and Observability


Metrics Collection:

  • Prometheus scrapes metrics from all services
  • Node exporters on each Kubernetes node
  • Application metrics via client libraries
  • Custom metrics from Kubernetes API
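
For context on what Prometheus actually scrapes: each service exposes plain text in the Prometheus exposition format, usually at a `/metrics` endpoint. In a real service a client library generates this output, so the hand-rolled Python sketch below is only meant to show the shape of the data. The metric name and labels are illustrative assumptions, and label-value escaping is deliberately omitted.

```python
def render_counter(name: str, help_text: str, samples: list[tuple[dict, float]]) -> str:
    """Render one counter metric in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            # Sort labels for stable output; values are not escaped in this sketch.
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric: requests served, split by method and status code.
body = render_counter(
    "http_requests_total",
    "Total HTTP requests.",
    [({"method": "GET", "code": "200"}, 3)],
)
```

The resulting `body` has a `# HELP` line, a `# TYPE` line, and one sample line per label combination.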

Visualization:

  • Grafana dashboards for real-time insights
  • Uptime Kuma for service availability
  • Alertmanager for critical events

Logging:

  • Container logs kept per node by Kubernetes and read with kubectl logs
  • Planned Loki deployment for log queries
  • Audit logs for security events

10. Service Interactions


Typical Request Flow (External Service):


  1. User requests `https://avidlearner.atarnet.org`
  2. DNS resolves to Cloudflare edge server
  3. Cloudflare Tunnel forwards to homelab
  4. NGINX Ingress receives request
  5. Ingress routes to AvidLearner Service
  6. Service load-balances to healthy pod
  7. Pod processes request and returns response
  8. Response flows back through the stack

Internal Service Communication:


Services communicate directly via Kubernetes DNS:

  • `http://service-name.namespace.svc.cluster.local`
  • No external egress required
  • Low latency within cluster
  • Encrypted with service mesh (future)
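
The DNS naming scheme is regular enough to build programmatically. The Python helpers below are a small sketch of it, assuming the default `cluster.local` cluster domain. The service name, namespace, and port in the usage note are hypothetical. The second helper covers the per-pod names that a headless Service gives StatefulSet pods.

```python
def service_url(service: str, namespace: str, port: int = 80,
                cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster HTTP URL for a Kubernetes Service."""
    return f"http://{service}.{namespace}.svc.{cluster_domain}:{port}"


def stateful_pod_host(pod: str, headless_service: str, namespace: str,
                      cluster_domain: str = "cluster.local") -> str:
    """Build the stable DNS name of one StatefulSet pod behind a headless Service."""
    return f"{pod}.{headless_service}.{namespace}.svc.{cluster_domain}"
```

For example, `service_url("navidrome", "media", 4533)` yields `http://navidrome.media.svc.cluster.local:4533`, and `stateful_pod_host("db-0", "db", "data")` yields `db-0.db.data.svc.cluster.local`.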

11. Disaster Recovery


Backup Strategy:

  • Helm charts in Git (Infrastructure as Code)
  • PersistentVolume snapshots
  • Configuration in version control
  • Database backups to external storage

Recovery Process:

  1. Rebuild Kubernetes cluster from scratch
  2. Apply Helm charts to recreate services
  3. Restore PersistentVolume data
  4. Verify service health and connectivity

Key Design Decisions


Why Kubernetes?


  • Declarative Infrastructure: Desired state defined in Helm charts and tracked in Git
  • Self-Healing: Automatic pod restarts and rescheduling
  • Scalability: Easy to add nodes and scale services
  • Industry Standard: Skills transfer to professional environments

Why Kaniko for Builds?


  • Security: No privileged Docker daemon
  • Kubernetes Native: Runs as regular pods
  • Consistency: Same environment every build
  • Caching: Layer caching for faster builds

Why Cloudflare Tunnels?


  • Security: No open ports on home network
  • Simplicity: No dynamic DNS management
  • Performance: Global edge network
  • Protection: Built-in DDoS mitigation

Current Stats


  • Services Running: 10+ production services
  • Container Images: All custom-built via CI/CD
  • Average Deployment Time: ~5 minutes from code to production
  • Uptime: 99.5%+ for critical services
  • Resource Usage: ~60% cluster capacity
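
To put the uptime figure in perspective, an availability percentage can be converted into a downtime budget. The short Python sketch below does that arithmetic. The 30-day and 365-day periods are assumptions chosen for the example.

```python
def allowed_downtime_hours(availability_pct: float, period_days: float) -> float:
    """Hours of downtime permitted by an availability target over a period."""
    return (1 - availability_pct / 100) * period_days * 24


monthly = allowed_downtime_hours(99.5, 30)   # budget over a 30-day month
yearly = allowed_downtime_hours(99.5, 365)   # budget over a 365-day year
```

At 99.5%, the budget works out to about 3.6 hours per 30-day month, or about 43.8 hours per year.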

Future Enhancements


Short Term:

  • Implement service mesh (Istio/Linkerd)
  • Deploy Loki for centralized logging
  • Add distributed tracing (Jaeger)
  • Implement automated backups

Long Term:

  • Multi-cluster federation
  • GitOps with ArgoCD
  • Chaos engineering experiments
  • Cost optimization with spot instances (if moving to cloud)

Lessons Learned


  1. Start Simple: Don't over-engineer early. Add complexity as needed.
  2. Document Everything: Future you will thank present you.
  3. Automate Relentlessly: Manual processes breed errors.
  4. Security First: Easier to build in than bolt on later.
  5. Monitor Everything: Can't fix what you can't see.

Conclusion


This homelab architecture provides a production-like environment for learning, experimentation, and hosting real services. Every component serves a purpose, and the entire stack is reproducible via Infrastructure as Code.


The beauty of this setup is that it mirrors real-world enterprise architectures, so the skills and patterns I use here translate directly to professional cloud-native environments.


Want to see the code? Most of my services and configurations are available on GitHub: https://github.com/tinotenda-alfaneti