IaC for my private cloud.
- Infrastructure as Code: Provisioning and configuration with Ansible, OpenTofu, and Packer
- Compute: Proxmox VE cluster (5 nodes, 400GB RAM, 100 vCPUs) running VMs and LXC containers
- Application Containers: Rootless Podman Quadlets (systemd-native), Docker only where Podman is not feasible
- Networking: VyOS router with zone-based firewall, 3Gbps symmetric WAN, 10Gbps backbone, VLAN segmentation, AdGuard DNS, Tailscale mesh, rotating VPN SOCKS5 proxy
- Public Gateway: Selective internet exposure via Cloudflare DNS/CDN, AWS Lightsail DMZ + CrowdSec WAF, proxied home via Tailscale mesh
- Storage: ZFS + Ceph with NFS/SMB/CephFS, 3-2-1 backups via Restic to Glacier
- Security: Private PKI (step-ca), Keycloak SSO, mTLS, OpenBao secrets, SOPS with multi-key encryption
- DevOps: GitLab (VCS, CI/CD, registries, Terraform state), Dokku/Dokploy/Smallweb PaaS, Coder workspaces
- AI & ML: Private LLM inference (Ollama, LiteLLM, OpenWebUI), Stable Diffusion (ComfyUI, SwarmUI) on RTX Pro 6000 Max-Q
- Observability: Prometheus/Grafana, Loki, Tempo, OTEL Collector, ntopng, NUT
- Communications: Matrix homeserver with bridges, ntfy/Apprise notifications, Postfix relay via SES
- Smart Home: Home Assistant with Z-Wave and Thread/Matter
- Entertainment: Cloud gaming via Bazzite + Sunshine (GTX 1660 Super passthrough), Jellyfin media
| Doc | Description |
|---|---|
| Hosts | Proxmox cluster nodes and host roles |
| Virtual Machines | VM/LXC allocation and resource usage |
| Networking | VLANs, firewall, DNS, reverse proxy |
| Storage | Ceph, ZFS, NFS, CephFS |
| Backups | 3-2-1 strategy, PBS, Restic to S3 |
| UPS | Uninterruptible power supply |
| NixOS | Declarative OS configuration |
| Doc | Description |
|---|---|
| Trust Model | Identity planes and auth architecture |
| Secrets | OpenBao, SOPS, encryption key hierarchy |
| Doc | Description |
|---|---|
| AI/ML | GPU workstation, Ollama, LiteLLM, RAG |
| PaaS | Dokku, Dokploy, Smallweb |
| Monitoring | OTEL, Prometheus, Grafana, Loki, Tempo |
| Communication | Matrix, ntfy, Postfix, bridges |
| Doc | Description |
|---|---|
| AWS | Public gateway, backups, KMS, IAM |
| Cloudflare | DNS and CDN |
| Tailscale | Secure remote access via mesh VPN |
| Doc | Description |
|---|---|
| TODOs | Roadmap and planned work |
- Nix (provides all tools via nix develop)
All CLI tools are provided by the Nix flake. Enter the dev shell to get:
- Ansible
- AWS CLI
- OpenTofu
- SOPS + Age
- OpenBao CLI
- step-cli
- jq, yq
- pre-commit + gitleaks
# Enter dev shell (or use direnv)
nix develop
# Tools are now available directly
tofu --version
ansible --version
aws --version
sops --version

Secrets are encrypted with SOPS using a three-tier key hierarchy:
- Primary: OpenBao Transit (server-side encryption, key never leaves OpenBao)
- Fallback: AWS KMS (when OpenBao unavailable)
- Emergency: Age key (offline, works when everything else is down)
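
The priority order above can be sketched as a small diagnostic helper. This is only an illustration under stated assumptions: `pick_tier` is a hypothetical function that is not part of this repo, and SOPS itself decides which key it uses when decrypting. The flags are passed in as arguments so the logic can be exercised without network access.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which key tier would be preferred.
# Arguments are 1/0 flags:
#   $1 = OpenBao reachable, $2 = AWS credentials valid, $3 = Age key file present
pick_tier() {
  local bao_ok="$1" kms_ok="$2" age_ok="$3"
  if [ "$bao_ok" = "1" ]; then
    echo "openbao-transit"   # primary: key never leaves OpenBao
  elif [ "$kms_ok" = "1" ]; then
    echo "aws-kms"           # fallback: OpenBao unavailable
  elif [ "$age_ok" = "1" ]; then
    echo "age"               # emergency: offline key
  else
    echo "none"
    return 1
  fi
}
```

In practice the flags might be derived from checks such as `bao status`, `aws sts get-caller-identity`, and `[ -f config/age-key.txt ]`, though how this repo actually probes each tier is not shown here.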
Single SSO login via Keycloak device auth:
# Login once - opens browser to auth.shdr.ch, gets: SSH cert + OpenBao token + AWS creds
task login
# Check auth status
task login:status
# View/edit secrets (shortcuts: sv, se, sg, sl)
task sv # view secrets/secrets.yml
task se # edit secrets/secrets.yml
task sg -- '.path' # get single value
task sl # list all keys

# Separate service logins
step ssh login --provisioner=keycloak # SSH certificate (step-ca)
# OpenBao: login at https://bao.home.shdr.ch, token auto-cached
# Age key (bootstrap or emergency)
# Write Age key to config/age-key.txt, use SOPS, then remove
task sv
rm config/age-key.txt

Re-encrypt all secrets with current keys from .sops.yaml:
task login
task sops:rotate

The Age key is the master key that can decrypt everything:
Age Key → decrypts → Recovery Keys → unseals → OpenBao → unlocks → Everything
The Age key is not stored on disk normally. For bootstrap or emergencies:
- Write key to config/age-key.txt
- Perform recovery operations
- Remove config/age-key.txt when done
Keep the Age key backed up offline (printed, USB in safe, etc.)
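
One way to make the write, use, remove sequence harder to forget is to wrap it so that cleanup happens automatically. The sketch below is an assumption-laden illustration, not a task defined in this repo: `with_age_key` is a hypothetical function, and it assumes SOPS picks up the key through the `SOPS_AGE_KEY_FILE` environment variable.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: keep the Age key on disk only while one command runs.
# Usage: with_age_key <key-source-file> <key-dest-file> <command...>
with_age_key() {
  local src="$1" dest="$2"
  shift 2
  (
    # Subshell so the EXIT trap fires when this block ends, even on failure.
    trap 'rm -f "$dest"' EXIT
    umask 077                       # key file readable by the current user only
    cp "$src" "$dest" || exit 1
    SOPS_AGE_KEY_FILE="$dest" "$@"  # run the command with the key available
  )
}
```

For example, `with_age_key /path/to/offline/key config/age-key.txt task sv` would view the secrets and then delete the key file, where the source path stands in for wherever the offline copy is mounted.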
This repo uses pre-commit hooks to prevent accidental secret leaks:
# Install hooks (first time only)
pre-commit install
# Run manually
pre-commit run --all-files

- Full admin access to AWS Account
- Access to Home Network: 2 network interfaces required to connect to both the Bell Gigahub and VyOS virtual router
- Access to Age Private Key
- Bell PPPoE credentials
These steps set up the base infrastructure necessary for provisioning the cloud. The goal is to:
- Deploy the OpenTofu backend stack (S3 bucket, KMS key, DynamoDB table for state)
- Create SOPS KMS key for fallback encryption
- Write the OpenTofu state config to config/tofu-state.config
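
After bootstrap, a quick sanity check on the generated state config can catch a partial run. The following is a minimal sketch under assumptions: `check_backend_config` is a hypothetical helper, and the key names (`bucket`, `region`, `dynamodb_table`) are guesses at what an S3 backend config written by the bootstrap would contain, in `key = "value"` form.

```shell
#!/usr/bin/env bash
# Hypothetical check: confirm a backend config file defines the expected keys.
# The key names below are assumptions about what the bootstrap writes.
check_backend_config() {
  local file="$1" key missing=0
  [ -f "$file" ] || { echo "missing file: $file" >&2; return 1; }
  for key in bucket region dynamodb_table; do
    # Match lines such as:  bucket = "my-state-bucket"
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
      echo "missing key: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_backend_config config/tofu-state.config` returns a non-zero status and names each key it could not find.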
- Login (opens browser, gets AWS + OpenBao + SSH creds)
  task login
- Run the bootstrap task
  task bootstrap
- Verify SOPS KMS key was created
  aws kms describe-key --key-id alias/aether-sops
- Manually apply rack switch configuration (README)
- Provision router (README)
  task provision:home:router
- Manually apply office switch configuration (README)
- Provision NFS (README)
  task provision:home:nfs
Provision Certificate Authority (README)
task provision:home:step-ca

Provision OpenBao (README)
task provision:home:openbao

After first-time init, save recovery keys to secrets/openbao-recovery-keys.yml and encrypt:
sops -e -i secrets/openbao-recovery-keys.yml

SOPS Transit + OIDC auth is configured during task tofu:apply (bootstrap with root token, then revoke):
# First time only: use root token to bootstrap OIDC
export VAULT_ADDR=https://bao.home.shdr.ch
export VAULT_TOKEN=<root-token>
task tofu:apply
bao token revoke -self # revoke root token after bootstrap

Provision Keycloak (README)
task provision:home:keycloak

task tofu:plan
task tofu:apply
task configure