
Homelab Journey

Welcome to my homelab infrastructure repository! This is the central hub for documenting my homelab setup, configurations, and learnings.

Note that this project is not intended for heavy production workloads. In Cambodia, stable electricity and high-speed internet are not guaranteed, so this homelab is designed for learning, experimentation, and personal projects rather than critical applications. If you use it as a reference for your own homelab, consider the limitations of your environment and adjust accordingly.

YouTube Video

I intend to release YouTube videos documenting this journey.

Architecture Overview

(Infrastructure architecture diagram: see the diagram/ directory.)

💻 Hardware Specifications

Physical Server

Model: GMKTec NucBox M5 Ultra

CPU: AMD Ryzen 7 7730U

  • 8 Cores / 16 Threads
  • Base Clock: 2.0 GHz
  • Boost Clock: up to 4.5 GHz
  • L1 Cache: 512 KB (64KB Γ— 8)
  • L2 Cache: 4 MB (512KB Γ— 8)
  • L3 Cache: 16 MB (shared)
  • TDP: 15-35W
  • Architecture: Zen 3+ (6nm)

Memory: 64 GB DDR4 RAM

  • Type: DDR4 SO-DIMM
  • Speed: 3200 MHz
  • Configuration: Dual Channel
  • Maximum Capacity: 64 GB

Storage: 1 TB NVMe SSD

  • Type: M.2 2280 NVMe PCIe Gen 3.0
  • Read Speed: ~3500 MB/s
  • Write Speed: ~3000 MB/s
  • TBW: High endurance model

Graphics: AMD Radeon Graphics

  • Integrated GPU (Ryzen 7730U)
  • Cores: 8 CUs (Compute Units)
  • Architecture: RDNA 2
  • Max Frequency: 2.0 GHz
  • Display Support: Dual 4K@60Hz or Single 8K@30Hz

Operating System: Proxmox VE

Why This Hardware?

✅ Power Efficiency - 15-35W TDP, ideal for 24/7 operation
✅ Performance - 8c/16t handles multiple VMs simultaneously
✅ Memory - 64GB sufficient for 8+ VMs with headroom
✅ Storage - Fast NVMe for quick VM boot and low latency
✅ Dual NIC - Network segregation and redundancy
✅ Compact - Minimal footprint, quiet operation
✅ Cost Effective - Fraction of enterprise server costs

πŸ—οΈ Infrastructure Overview

This homelab runs on Proxmox VE and uses Infrastructure as Code (Terraform + Ansible) for reproducible deployments.
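Provisioning is driven by Terraform against the Proxmox API; the proxmox_virtual_environment_vm resource referenced later suggests the bpg/proxmox provider. A minimal sketch of the provider wiring (the version pin and variable names are assumptions, so match them against terraform/variables.tf):

```hcl
terraform {
  required_providers {
    proxmox = {
      source  = "bpg/proxmox"   # inferred from proxmox_virtual_environment_vm
      version = "~> 0.60"       # version pin is an assumption
    }
  }
}

provider "proxmox" {
  endpoint = var.proxmox_endpoint
  insecure = true   # typical for homelab self-signed certificates
}
```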

Virtual Machines

VM            IP               CPU  RAM    Disk    Purpose
k8s-master    192.168.100.201  2    4 GB   50 GB   Kubernetes control plane (K3s)
k8s-worker-1  192.168.100.202  4    14 GB  150 GB  Kubernetes worker node
k8s-worker-2  192.168.100.203  4    14 GB  150 GB  Kubernetes worker node
database-vm   192.168.100.205  4    6 GB   200 GB  Centralized database server
app-gateway   192.168.100.210  2    2 GB   20 GB   Nginx reverse proxy
monitoring    192.168.100.220  2    6 GB   80 GB   Prometheus, Grafana, Loki
n8n           192.168.100.230  2    6 GB   50 GB   n8n workflow automation
ci-cd         192.168.100.240  2    5 GB   100 GB  GitHub Actions + Docker registry

Total Resources: 22 vCPUs, 57 GB RAM, 800 GB storage
Available for Host: 7 GB RAM (10.9%) and ~200 GB storage; the 22 vCPUs oversubscribe the 8 physical cores / 16 threads, which is acceptable for bursty homelab workloads.

Resource Allocation Notes:

  • All VMs have adequate resources for their intended workloads
  • CI/CD VM may require additional RAM (8GB+) for heavy Docker builds
  • 7GB RAM reserved for Proxmox host ensures system stability
  • Storage allocation allows for data growth and logs
  • Balanced allocation prevents resource contention

📂 Repository Structure

homelab-journey/
├── terraform/           # Infrastructure provisioning
│   ├── vms.tf          # VM definitions
│   ├── variables.tf    # Configurable variables
│   └── terraform.tfvars # Your configuration
├── ansible/            # Configuration management (organized)
│   ├── inventory.ini   # VM inventory
│   ├── ansible.cfg     # Ansible configuration
│   ├── README.md       # Ansible documentation
│   └── playbooks/      # 🆕 Organized playbooks by category
│       ├── infrastructure/    # Core infrastructure setup
│       │   ├── proxmox-setup.yml
│       │   ├── qemu-agent-setup.yml
│       │   ├── docker-setup.yml
│       │   └── nginx-gateway-setup.yml
│       ├── kubernetes/        # K8s cluster deployment
│       │   └── k3s-cluster-setup.yml
│       ├── services/          # Application services
│       │   ├── database-setup.yml
│       │   ├── monitoring-setup.yml
│       │   ├── monitoring-dashboards-setup.yml
│       │   ├── n8n-setup.yml
│       │   └── cicd-setup.yml
│       └── networking/        # Network & remote access
│           ├── cloudflare-tunnel-setup.yml
│           └── github-runner-setup.yml
├── script/             # Helper scripts
│   ├── setup-k8s-complete.sh
│   ├── setup-cloudflare-tunnel.sh
│   ├── run-monitoring-setup.sh
│   ├── run-monitoring-dashboards-setup.sh
│   ├── run-n8n-setup.sh
│   ├── run-database-setup.sh
│   ├── generate-ssh-keys.sh
│   └── cleanup-known-hosts.sh
└── diagram/            # Infrastructure diagrams

Bootstrapping The Infrastructure

Prerequisites

Before starting, ensure you have:

  • ✅ Proxmox VE installed and configured
  • ✅ SSH keys generated (run ./script/generate-ssh-keys.sh; the keys appear in the ssh-keys/ directory and are used for Ansible access to the VMs)
  • ✅ Ansible installed on your control machine
  • ✅ GitHub account (for CI/CD setup)
  • ✅ Cloudflare account (for tunnel setup)
  • ✅ Domain name configured in Cloudflare DNS

1. Configure the Proxmox Host with Ansible

Replace YOUR_SSH_PUBLIC_KEY_HERE in proxmox-setup.yml with your actual SSH public key to enable passwordless SSH access to the VMs.

cd ansible
ansible-playbook playbooks/infrastructure/proxmox-setup.yml

This playbook installs basic packages (fastfetch, htop, lm-sensors, qemu-guest-agent) and creates the cloud-init configuration that Terraform uses to add an SSH user with your public key.
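The placeholder swap can be scripted; a sketch, assuming the key generated earlier lives at ssh-keys/id_ed25519.pub (adjust the filename to whatever generate-ssh-keys.sh actually produced):

```shell
# Substitute your real public key for the placeholder in the playbook.
# The key path is an assumption about this repo's layout.
PUBKEY=$(cat ssh-keys/id_ed25519.pub)
sed -i "s|YOUR_SSH_PUBLIC_KEY_HERE|${PUBKEY}|" \
  ansible/playbooks/infrastructure/proxmox-setup.yml
```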

2. Provision VMs with Terraform

For now, the CPU type for proxmox_virtual_environment_vm is set to host because MongoDB v8.4 failed to start (AVX2 not detected) with the X86-64-AES CPU type. We will investigate further and update the configuration if needed. This could become an issue if the VMs are migrated to another host with a different CPU architecture in the future.

Make sure to create terraform.tfvars with your specific configuration (IP addresses, credentials, etc.) before running Terraform. See Terraform Configuration for details.
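A terraform.tfvars along these lines is what the next step expects; the variable names here are hypothetical, so match them against terraform/variables.tf:

```hcl
# Hypothetical variable names -- check terraform/variables.tf for the real ones.
proxmox_endpoint  = "https://192.168.100.2:8006/"
proxmox_api_token = "terraform@pve!homelab=xxxxxxxx"   # keep out of Git
vm_gateway        = "192.168.100.1"
ssh_public_key    = "ssh-ed25519 AAAA... user@homelab"
```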

cd terraform
terraform init
terraform plan
terraform apply -auto-approve

It may take a while for the qemu-guest-agent to be installed and for the VMs to be fully provisioned and report back to Terraform. You can monitor VM creation in the Proxmox Web UI.

3. SSH Access to VMs

ssh -i path/to/ssh-keys ubuntu@192.168.100.201 # Master node
ssh -i path/to/ssh-keys ubuntu@192.168.100.202 # Worker 1
ssh -i path/to/ssh-keys ubuntu@192.168.100.203 # Worker 2

Configuring Infrastructure Components with Ansible

This section provides detailed step-by-step instructions for manually setting up the entire infrastructure.

Before setting up each component, create ansible/inventory.ini from the provided inventory.ini.example and update the IP addresses and credentials as needed. Replace /path/to/your/key with the path to the private SSH key that matches the public key you added in proxmox-setup.yml for Ansible access.
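For reference, an inventory.ini might look like this; the group names are assumptions, so mirror whatever inventory.ini.example defines:

```ini
; Sketch of ansible/inventory.ini -- group names are hypothetical.
[k8s_master]
192.168.100.201 ansible_user=ubuntu ansible_ssh_private_key_file=/path/to/your/key

[k8s_workers]
192.168.100.202 ansible_user=ubuntu ansible_ssh_private_key_file=/path/to/your/key
192.168.100.203 ansible_user=ubuntu ansible_ssh_private_key_file=/path/to/your/key
```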

1. Setup CI/CD VM

This will install GitHub Actions runner and Docker registry on the CI/CD VM.

cd ansible
ansible-playbook playbooks/services/cicd-setup.yml

Note that you will need to manually configure the GitHub runner token after installation. See the playbook documentation for instructions.

2. Setup Kubernetes Cluster

Run the K3s cluster setup playbook to configure the master and worker nodes:

cd ansible
ansible-playbook playbooks/kubernetes/k3s-cluster-setup.yml

This playbook will:

  • Install K3s on the master node
  • Configure worker nodes to join the cluster
  • Setup kubectl configuration
  • Verify cluster health

Verify:

ssh -i path/to/ssh-keys ubuntu@192.168.100.201 # SSH into master node
kubectl get nodes
# Should show: k8s-master, k8s-worker-1, k8s-worker-2 all Ready

Each K8s worker node is configured to pull images from the registry on the CI/CD VM via the registry.homelab.local hostname.

Verify registry access:

# From master node
curl -v http://registry.homelab.local/v2/_catalog
# Should return JSON with list of repositories (even if empty)
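On the K3s side, pointing nodes at a plain-HTTP private registry is typically done via /etc/rancher/k3s/registries.yaml; a sketch (the playbook may configure this differently):

```yaml
# /etc/rancher/k3s/registries.yaml on each node -- how K3s is usually
# pointed at an insecure (HTTP) private registry; restart k3s after editing.
mirrors:
  "registry.homelab.local":
    endpoint:
      - "http://registry.homelab.local"
```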

3. Setup Gateway VM

This will install Nginx Proxy Manager on the gateway VM and configure it as a reverse proxy for your Kubernetes services.

cd ansible
ansible-playbook playbooks/infrastructure/nginx-gateway-setup.yml

The playbook will:

  1. Install Nginx Proxy Manager on the gateway VM
  2. Setup health checks and monitoring

Access the admin UI at http://192.168.100.210:81. You will be asked to create an admin account on first login. Use this UI to configure reverse proxy rules for your Kubernetes services.
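A proxy host rule created in the UI corresponds roughly to an nginx server block like this; the hostname and NodePort are hypothetical:

```nginx
# Roughly what a Nginx Proxy Manager "proxy host" rule produces.
server {
    listen 80;
    server_name app.example.com;          # hypothetical hostname

    location / {
        proxy_pass http://192.168.100.201:30080;  # assumed K8s NodePort
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```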

4. Setup Database Server VM

This will install PostgreSQL, MySQL, and MongoDB on the database server VM. Create .env from the provided .env.example in the ansible directory and set secure passwords for each database, then run the prepared script to set up the databases.

chmod +x ./script/run-database-setup.sh
./script/run-database-setup.sh

Note that this loads the passwords from the .env file. If you run the Ansible playbook directly, set the environment variables yourself or edit the playbook to include the passwords.
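The .env consumed by the script might look like this; the variable names are assumptions, so check .env.example for the real ones:

```shell
# Hypothetical variable names -- check .env.example for the real ones.
POSTGRES_PASSWORD='change-me'
MYSQL_ROOT_PASSWORD='change-me'
MONGO_ROOT_PASSWORD='change-me'
```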

5. Setup n8n Workflow Automation

This will install n8n on the n8n VM and configure it to use the PostgreSQL database on the database server VM.

cd ansible
ansible-playbook playbooks/services/n8n-setup.yml

This playbook will:

  • Install n8n on the n8n VM
  • Point it at the PostgreSQL database on database-vm (192.168.100.205)

6. Setup Monitoring Stack

This will install Prometheus, Grafana, and Loki on the monitoring VM and configure them to monitor your Kubernetes cluster and other services.

cd ansible
ansible-playbook playbooks/services/monitoring-setup.yml
ansible-playbook playbooks/services/monitoring-dashboards-setup.yml
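Under the hood, Prometheus discovers targets through prometheus.yml; a sketch of a node-level scrape job (the job name and node_exporter port 9100 are assumptions about this setup):

```yaml
# Fragment of prometheus.yml -- job name and ports are assumptions.
scrape_configs:
  - job_name: "k8s-nodes"
    static_configs:
      - targets:
          - "192.168.100.201:9100"
          - "192.168.100.202:9100"
          - "192.168.100.203:9100"
```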

Operations & Quick Reference

Daily Operations

Cluster Status

# Check node health
kubectl get nodes
kubectl describe node k8s-worker-1

# View all resources
kubectl get all --all-namespaces

# Check cluster component status (componentstatuses is deprecated since
# Kubernetes 1.19; kubectl get --raw='/readyz?verbose' is the modern check)
kubectl get componentstatuses

# View cluster events
kubectl get events --all-namespaces --sort-by='.lastTimestamp'

Application Management

# List all pods
kubectl get pods -n your-namespace

# View deployment status
kubectl get deployments -n your-namespace

# Check services
kubectl get svc -n your-namespace

# View ingress routes
kubectl get ingress --all-namespaces

# Port forward for local access
kubectl port-forward svc/myapp 8080:80 -n your-namespace

Logs and Debugging

# View pod logs
kubectl logs pod-name -n namespace

# Follow logs in real-time
kubectl logs -f deployment/myapp -n namespace

# View logs from previous container instance
kubectl logs pod-name --previous -n namespace

# Get last 100 lines
kubectl logs --tail=100 pod-name -n namespace

# View logs from specific container in multi-container pod
kubectl logs pod-name -c container-name -n namespace

Troubleshooting

Database Connection Issues

Issue: Applications cannot connect to databases on database-vm

Diagnostic Steps:

# Test database connectivity from K8s pod
kubectl run -it --rm psql-test --image=postgres:16 --restart=Never -- \
  psql -h 192.168.100.205 -U postgres -d postgres

# Test from local machine
psql -h 192.168.100.205 -U postgres -d postgres
mysql -h 192.168.100.205 -u root -p
mongosh mongodb://192.168.100.205:27017

# Check database services are running
ssh user@192.168.100.205
sudo systemctl status postgresql
sudo systemctl status mysql
sudo systemctl status mongod

# Check firewall rules
sudo ufw status
# Should show ports 5432, 3306, 27017 allowed

# Verify databases are listening on all interfaces
sudo netstat -tlnp | grep -E '5432|3306|27017'
# Should show 0.0.0.0:PORT, not 127.0.0.1:PORT

Solutions:

  1. PostgreSQL Not Allowing Remote Connections:

    # Edit pg_hba.conf
    sudo nano /etc/postgresql/16/main/pg_hba.conf
    # Add: host all all 192.168.100.0/24 scram-sha-256
    
    # Edit postgresql.conf
    sudo nano /etc/postgresql/16/main/postgresql.conf
    # Set: listen_addresses = '*'
    
    # Restart PostgreSQL
    sudo systemctl restart postgresql
  2. MySQL Binding to Localhost Only:

    # Edit MySQL config
    sudo nano /etc/mysql/mysql.conf.d/mysqld.cnf
    # Set: bind-address = 0.0.0.0
    
    # Restart MySQL
    sudo systemctl restart mysql
    
    # Grant remote access (MySQL 8 dropped GRANT ... IDENTIFIED BY;
    # create the user first, then grant)
    mysql -u root -p
    CREATE USER 'root'@'%' IDENTIFIED BY 'password';
    GRANT ALL PRIVILEGES ON *.* TO 'root'@'%';
    FLUSH PRIVILEGES;
  3. MongoDB Not Allowing Remote Connections:

    # Edit mongod.conf
    sudo nano /etc/mongod.conf
    # Set: bindIp: 0.0.0.0
    
    # Restart MongoDB
    sudo systemctl restart mongod

High Resource Usage

Issue: Node or pod consuming excessive CPU/memory

Diagnostic Steps:

# Check node resource usage
kubectl top nodes

# Check pod resource usage
kubectl top pods --all-namespaces --sort-by=memory
kubectl top pods --all-namespaces --sort-by=cpu

# View detailed node metrics
kubectl describe node node-name

# Check for resource limits
kubectl describe deployment deployment-name -n namespace | grep -A 5 Limits

Solutions:

  1. Adjust Resource Limits:

    resources:
      requests:
        memory: "256Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "500m"
  2. Scale Down Non-Critical Pods:

    kubectl scale deployment/myapp --replicas=1 -n namespace
  3. Restart Problematic Pods:

    kubectl rollout restart deployment/myapp -n namespace
  4. Check for Memory Leaks:

    # Monitor pod over time
    watch kubectl top pod pod-name -n namespace
    
    # View Grafana dashboards for trends
    # http://192.168.100.220:3000

Network Connectivity Issues

Issue: Pods cannot communicate with each other or external services

Diagnostic Steps:

# Test pod-to-pod communication
kubectl exec -it pod1 -n namespace -- ping pod2-ip

# Test external connectivity
kubectl exec -it pod-name -n namespace -- ping 8.8.8.8
kubectl exec -it pod-name -n namespace -- curl https://google.com

# Check the CNI (K3s embeds Flannel in the k3s binary, so there are
# usually no separate Flannel pods; check the flannel interface instead)
ip addr show flannel.1

# Verify network policies (if any)
kubectl get networkpolicies --all-namespaces

# Check iptables rules (on nodes)
ssh user@node-ip
sudo iptables -L -n -v

Solutions:

  • Restart affected pods
  • Restart CNI plugin pods
  • Check K3s service on nodes: sudo systemctl status k3s or sudo systemctl status k3s-agent
  • Verify routing: ip route show

General Debugging Tips

  1. Always check events first:

    kubectl get events -n namespace --sort-by='.lastTimestamp'
  2. Use describe for detailed info:

    kubectl describe <resource-type> <resource-name> -n namespace
  3. Check logs systematically:

    • Application logs: kubectl logs
    • System logs: journalctl
    • Service logs: systemctl status
  4. Isolate the issue:

    • Test from multiple points (local, same node, different node)
    • Simplify (reduce to minimal reproduction case)
    • Compare with working examples
  5. Use debug containers:

    # Run temporary debugging pod
    kubectl run debug -it --rm --image=nicolaka/netshoot --restart=Never -- bash
  6. Document and version:

    • Keep notes of issues and solutions
    • Track configuration changes in Git
    • Use Git tags for stable states

Ansible Playbook Categories

All playbooks are organized in ansible/playbooks/ by functional category:

  • πŸ—οΈ Infrastructure - Proxmox, Docker, Nginx, QEMU agent (4 playbooks)
  • ☸️ Kubernetes - K3s cluster deployment (1 playbook)
  • πŸš€ Services - Databases, monitoring, n8n, CI/CD (5 playbooks)
  • 🌐 Networking - Cloudflare tunnel, GitHub runner (2 playbooks)

πŸ“ Notes

This repository follows Infrastructure as Code principles:

  • All infrastructure defined in code (Terraform + Ansible + Kubernetes)
  • Version controlled and reproducible
  • Automated CI/CD with GitHub Actions
  • Secure remote access via Cloudflare Tunnel
  • Documented for learning and future reference
  • Designed for homelab experimentation and learning
