Deep Dive: My Homelab Architecture and Service Interactions

Overview

My homelab runs entirely on Kubernetes, with carefully designed layers for networking, security, monitoring, and service delivery. Here's how everything fits together.

Architecture Layers

1. Infrastructure Foundation

Bare Metal Cluster:

3-node Kubernetes cluster (1 control plane, 2 workers)
Local storage with dynamic PersistentVolume provisioning
Flannel CNI for pod networking
MetalLB for LoadBalancer service types

Network Architecture:

Segregated VLANs for management, services, and user traffic
Firewall rules controlling inter-VLAN communication
Internal DNS resolution for service discovery
External DNS via Cloudflare

2. Kubernetes Control Plane

The control plane is the brain of the operation: it manages workload orchestration, scheduling, and service discovery across all nodes.

Key Components:

etcd as the distributed key-value store for cluster state and configuration
kube-apiserver as the central management interface
kube-scheduler for intelligent pod placement
kube-controller-manager for cluster state reconciliation
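
The "state reconciliation" idea can be illustrated with a toy loop: compare the desired state with what is actually running and emit the actions that close the gap. This is only a sketch of the pattern, not how kube-controller-manager is implemented; the `Action` type, the `web` prefix, and the pod names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str  # "create" or "delete"
    pod: str


def reconcile(desired_replicas: int, running: list[str], prefix: str = "web") -> list[Action]:
    """Return the actions needed to move `running` toward `desired_replicas`."""
    actions: list[Action] = []
    if len(running) < desired_replicas:
        # Too few pods: create the missing ones under names not already in use.
        taken = set(running)
        i = 0
        while len(running) + len(actions) < desired_replicas:
            name = f"{prefix}-{i}"
            if name not in taken:
                actions.append(Action("create", name))
            i += 1
    elif len(running) > desired_replicas:
        # Too many pods: delete the surplus, highest-sorting names first.
        surplus = len(running) - desired_replicas
        for name in sorted(running, reverse=True)[:surplus]:
            actions.append(Action("delete", name))
    # When the counts already match, no action is needed.
    return actions
```

Calling `reconcile(3, ["web-0"])` yields two create actions, and calling it again once those pods exist yields an empty list, which is the property that makes the loop safe to run repeatedly.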

3. Ingress and Service Networking

Ingress Layer:

NGINX Ingress Controller as the gateway
TLS termination with Let's Encrypt certificates
Path-based routing to backend services
Rate limiting and request filtering
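
Path-based routing can be sketched as "longest matching prefix wins", compared one path element at a time. The snippet below is loosely modeled on the `Prefix` path type in the Kubernetes Ingress API and is a simplification, not the NGINX Ingress Controller's actual matching code; the rule table and backend names are hypothetical.

```python
from typing import Optional


def _elements(path: str) -> list[str]:
    """Split a URL path into its non-empty elements."""
    return [part for part in path.split("/") if part]


def path_matches(prefix: str, request_path: str) -> bool:
    """Element-wise prefix match: /api matches /api and /api/v1, but not /apiary."""
    prefix_parts = _elements(prefix)
    return _elements(request_path)[: len(prefix_parts)] == prefix_parts


def route(request_path: str, rules: dict[str, str]) -> Optional[str]:
    """Return the backend whose prefix matches with the most path elements."""
    candidates = [prefix for prefix in rules if path_matches(prefix, request_path)]
    if not candidates:
        return None
    best = max(candidates, key=lambda prefix: len(_elements(prefix)))
    return rules[best]


# Hypothetical rule table: path prefix -> backend Service name.
RULES = {"/": "blog", "/api": "api", "/api/admin": "admin"}
```

With these rules, `/api/users` goes to `api`, `/api/admin/x` goes to `admin`, and `/apiary` falls through to `blog` because the match is per element rather than per character.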

Service Discovery:

CoreDNS for internal service resolution
Kubernetes Services with ClusterIP/LoadBalancer
Headless services for StatefulSets

4. Application Services

Development & Learning:

AvidLearner: Go learning platform with React frontend
LabMan CLI: Remote homelab session management via SSH
Rebalancer Operator: Custom Kubernetes operator for pod optimization

Media & Content:

Navidrome: Self-hosted music streaming (Subsonic API)
Ebook Reader: Digital library with progress tracking
Poetry Blog: Content publishing platform

DevOps & Automation:

Jenkins: CI/CD orchestration with Kaniko builds
Trivy: Container image security scanning
Uptime Kuma: Service health monitoring

Observability Stack:

Prometheus: Metrics collection and alerting
Grafana: Visualization and dashboards
Loki: Log aggregation (planned)

5. CI/CD Pipeline Flow

The deployment pipeline is fully automated:

Code Push: Developer pushes to Git repository
Jenkins Trigger: Webhook triggers Jenkins pipeline
Build Phase: Kaniko builds container image inside Kubernetes (no Docker daemon needed)
Security Scan: Trivy scans image for CVEs and misconfigurations
Registry Push: Images that pass the scan are pushed to Docker Hub
Helm Deploy: Helm charts deploy updated services to Kubernetes
Ingress Update: NGINX Ingress routes traffic to new pods
Health Check: Kubernetes readiness probes verify deployment
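
The gate between the scan and the registry push can be sketched as a small check over Trivy's JSON report. This is an illustration under assumptions: the blocking severities are an assumed policy rather than the one used in this pipeline, and the sample report is hand-written. Trivy's own `--severity` and `--exit-code` flags can enforce a similar gate without any extra code.

```python
import json

# Assumed policy: these severities block the push.
BLOCKING = {"HIGH", "CRITICAL"}


def blocking_findings(report_json: str) -> list[str]:
    """Collect the IDs of vulnerabilities whose severity is in BLOCKING."""
    report = json.loads(report_json)
    found = []
    # "Results" and "Vulnerabilities" may be missing or null in a clean report.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING:
                found.append(vuln.get("VulnerabilityID", "unknown"))
    return found


def may_push(report_json: str) -> bool:
    """The image may be pushed only if no blocking finding exists."""
    return not blocking_findings(report_json)


# Hand-written sample shaped like a Trivy JSON report (hypothetical data).
SAMPLE = json.dumps({
    "Results": [
        {
            "Target": "example-image",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "Severity": "LOW"},
                {"VulnerabilityID": "CVE-0000-0002", "Severity": "CRITICAL"},
            ],
        }
    ]
})
```

For the sample above, `may_push(SAMPLE)` is `False` because of the single critical finding; a report with only low-severity findings would pass.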

6. External Access Architecture

Cloudflare Integration:

Cloudflare Tunnels eliminate port forwarding
Zero Trust Access for authentication
DDoS protection at the edge
Automatic SSL/TLS certificates

Traffic Flow:

Internet → Cloudflare Edge → Cloudflare Tunnel → NGINX Ingress → Kubernetes Service → Application Pod

7. Storage Architecture

Persistent Storage:

Local PersistentVolumes on each node
StorageClass for dynamic provisioning
StatefulSets for stateful applications
Volume snapshots for backup

Data Flow:

Databases use local SSDs for low latency
Media files on larger HDDs
Configuration in ConfigMaps and Secrets
Logs aggregated to centralized storage (planned, via Loki)

8. Security Model

Defense in Depth:

Layer 1 - Perimeter:

Cloudflare WAF blocks malicious traffic
Rate limiting prevents abuse
Bot protection
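
To make "rate limiting prevents abuse" concrete, here is a generic token bucket, one common way to implement rate limiting. It is not Cloudflare's or NGINX's actual algorithm, and the rate and capacity values are arbitrary. The clock is passed in explicitly so the behavior is easy to follow and test.

```python
class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.updated = 0.0      # timestamp of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Return True and spend one token if the request at time `now` is allowed."""
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `TokenBucket(rate=1, capacity=2)`, two requests at the same instant pass, a third is rejected, and one more is allowed a second later once a token has been refilled.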

Layer 2 - Network:

VLAN segmentation isolates traffic
Firewall rules enforce least privilege
Network policies in Kubernetes

Layer 3 - Application:

NGINX Ingress with authentication
TLS encryption for all external traffic
Application-level authorization

Layer 4 - Container:

Trivy scanning keeps images with known vulnerabilities out of the cluster
Non-root containers
Read-only root filesystems
Resource limits prevent DoS
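
The container-level rules above map to real fields in a Kubernetes container spec: `securityContext.runAsNonRoot`, `securityContext.readOnlyRootFilesystem`, and `resources.limits`. The check below is a hypothetical lint over a container spec loaded as a dict, not a tool used in this homelab. It only looks at container-level settings, so a `runAsNonRoot` set at the pod level would be reported as a gap.

```python
def hardening_gaps(container: dict) -> list[str]:
    """List the hardening settings a container spec is missing."""
    ctx = container.get("securityContext") or {}
    limits = (container.get("resources") or {}).get("limits") or {}
    gaps = []
    if ctx.get("runAsNonRoot") is not True:
        gaps.append("runAsNonRoot is not set to true")
    if ctx.get("readOnlyRootFilesystem") is not True:
        gaps.append("readOnlyRootFilesystem is not set to true")
    # Without CPU and memory limits, one pod can starve its neighbors.
    for resource in ("cpu", "memory"):
        if resource not in limits:
            gaps.append(f"no {resource} limit")
    return gaps


# Hypothetical container spec with every setting in place.
HARDENED = {
    "name": "app",
    "securityContext": {"runAsNonRoot": True, "readOnlyRootFilesystem": True},
    "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
}
```

`hardening_gaps(HARDENED)` returns an empty list, while a bare `{"name": "app"}` spec reports all four gaps.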

9. Monitoring and Observability

Metrics Collection:

Prometheus scrapes metrics from all services
node_exporter on each Kubernetes node
Application metrics via client libraries
Custom metrics from Kubernetes API
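
What Prometheus scrapes is plain text in its exposition format. The sketch below renders a counter by hand with only the standard library, purely to show the shape of that text; a real service would more likely use an official client library. The metric name and labels are hypothetical.

```python
def escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes, and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one counter metric, with its samples, in the text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            # Sort labels so the output is stable between scrapes.
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric: one sample with two labels.
TEXT = render_counter(
    "http_requests_total",
    "Total HTTP requests.",
    [({"method": "GET", "path": "/"}, 3)],
)
```

`TEXT` contains a `# HELP` line, a `# TYPE` line, and the sample line `http_requests_total{method="GET",path="/"} 3`, which is what a scrape of a `/metrics` endpoint would return for this counter.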

Visualization:

Grafana dashboards for real-time insights
Uptime Kuma for service availability
Alertmanager for critical events

Logging:

Container logs collected per node by Kubernetes
Planned Loki deployment for log queries
Audit logs for security events

10. Service Interactions

Typical Request Flow (External Service):

User requests `https://avidlearner.atarnet.org`
DNS resolves to Cloudflare edge server
Cloudflare Tunnel forwards to homelab
NGINX Ingress receives request
Ingress routes to AvidLearner Service
Service load-balances to healthy pod
Pod processes request and returns response
Response flows back through the stack

Internal Service Communication:

Services communicate directly via Kubernetes DNS:

`http://service-name.namespace.svc.cluster.local`
No external egress required
Low latency within cluster
Encrypted with service mesh (future)
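
The DNS pattern above can be wrapped in a small helper that builds the in-cluster URL for a Service. This is a sketch: the `apps` namespace, the port, and the path in the usage note are hypothetical, `cluster.local` is assumed as the cluster domain, and the name check is a simplified DNS-label rule.

```python
import re

# Simplified DNS-label check: lowercase alphanumerics and hyphens, max 63 chars.
_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def service_url(
    service: str,
    namespace: str = "default",
    port: int = 80,
    path: str = "/",
    cluster_domain: str = "cluster.local",
) -> str:
    """Build the in-cluster HTTP URL for a Kubernetes Service."""
    for label in (service, namespace):
        if len(label) > 63 or not _LABEL.match(label):
            raise ValueError(f"invalid DNS label: {label!r}")
    host = f"{service}.{namespace}.svc.{cluster_domain}"
    # Leave the port out when it is the HTTP default.
    authority = host if port == 80 else f"{host}:{port}"
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{authority}{path}"
```

For example, `service_url("avidlearner", "apps", 8080, "/healthz")` returns `http://avidlearner.apps.svc.cluster.local:8080/healthz`.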

11. Disaster Recovery

Backup Strategy:

Helm charts in Git (Infrastructure as Code)
PersistentVolume snapshots
Configuration in version control
Database backups to external storage

Recovery Process:

Rebuild Kubernetes cluster from scratch
Apply Helm charts to recreate services
Restore PersistentVolume data
Verify service health and connectivity

Key Design Decisions

Why Kubernetes?

Declarative Infrastructure: Desired state defined in Git-tracked Helm charts
Self-Healing: Automatic pod restarts and rescheduling
Scalability: Easy to add nodes and scale services
Industry Standard: Skills transfer to professional environments

Why Kaniko for Builds?

Security: No privileged Docker daemon
Kubernetes Native: Runs as regular pods
Consistency: Same environment every build
Caching: Layer caching for faster builds

Why Cloudflare Tunnels?

Security: No open ports on home network
Simplicity: No dynamic DNS management
Performance: Global edge network
Protection: Built-in DDoS mitigation

Current Stats

Services Running: 10+ production services
Container Images: All custom-built via CI/CD
Average Deployment Time: ~5 minutes from code to production
Uptime: 99.5%+ for critical services
Resource Usage: ~60% cluster capacity
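
To put the uptime figure in perspective, the downtime it allows is simple arithmetic: the fraction of time not covered by the target, multiplied by the length of the period. The 30-day and 365-day periods below are illustrative choices, not measurement windows taken from this setup.

```python
def downtime_budget_hours(uptime_percent: float, period_days: float) -> float:
    """Hours of downtime allowed over `period_days` at a given uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (1 - uptime_percent / 100) * period_days * 24


# 99.5% uptime leaves roughly 3.6 hours per 30 days and 43.8 hours per year.
monthly = round(downtime_budget_hours(99.5, 30), 1)
yearly = round(downtime_budget_hours(99.5, 365), 1)
```

So a 99.5% target still leaves room for an evening of maintenance each month, which is a comfortable margin for a home setup.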

Future Enhancements

Short Term:

Implement service mesh (Istio/Linkerd)
Deploy Loki for centralized logging
Add distributed tracing (Jaeger)
Implement automated backups

Long Term:

Multi-cluster federation
GitOps with ArgoCD
Chaos engineering experiments
Cost optimization with spot instances (if moving to cloud)

Lessons Learned

Start Simple: Don't over-engineer early. Add complexity as needed.
Document Everything: Future you will thank present you.
Automate Relentlessly: Manual processes breed errors.
Security First: Easier to build in than bolt on later.
Monitor Everything: Can't fix what you can't see.

Conclusion

This homelab architecture provides a production-like environment for learning, experimentation, and hosting real services. Every component serves a purpose, and the entire stack is reproducible via Infrastructure as Code.

The beauty of this setup is that it mirrors real-world enterprise architectures: the skills and patterns I use here translate directly to professional cloud-native environments.

Want to see the code? Most of my services and configurations are available on <a href="https://github.com/tinotenda-alfaneti" target="_blank" rel="noopener noreferrer">GitHub</a>.