A modern, declarative, and highly scalable data platform built on top of Kubernetes and Talos Linux. This repository contains the complete Infrastructure-as-Code (IaC) required to bootstrap the platform from scratch, whether running locally in Docker for development or on bare-metal cloud providers like Hetzner.
This platform is built with a few strict architectural principles:
- GitOps & IaC First: Everything from the operating system up to the data pipelines is defined in code using Pulumi (Python) and Kubernetes manifests.
- Command-Driven Lifecycle: We use the Pulumi Command provider to orchestrate low-level CLI tools (talosctl, docker) as native Pulumi resources.
- Immutable OS: We use Talos Linux to eliminate SSH access and configuration drift.
- Modern Networking: eBPF-based networking (Cilium) and the standard Kubernetes Gateway API replace the legacy kube-proxy and Ingress controllers.
- Talos Linux: A secure, immutable OS managed entirely via API.
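- Local Compute (Docker): Our local.py engine uses pulumi_command to spin up multi-node Talos clusters. It features an "Early Exit" strategy that bypasses the CNI-wait deadlock. Nodes only report Ready once a CNI is running, so waiting for Ready nodes before installing Cilium would never finish. The early exit allows Cilium to be installed immediately.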
- Cilium: Used in strict eBPF mode for maximum performance and observability (Hubble).
- Kubernetes Gateway API: The unified entry point for all platform traffic.
- Native Bridging: We bypass Docker Desktop networking limitations by running a native kubectl port-forward directly to the Cilium Gateway.
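The "Early Exit" idea can be illustrated with a small polling helper. This is a minimal sketch, not the code in local.py. wait_for_api and its api_reachable probe are hypothetical stand-ins. The sketch assumes the bootstrap only needs a reachable Kubernetes API before handing over to the Cilium install, rather than Ready nodes.

```python
import time
from typing import Callable


def wait_for_api(
    api_reachable: Callable[[], bool],
    max_attempts: int = 60,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return as soon as the Kubernetes API answers (hypothetical helper).

    Node readiness is deliberately not checked: nodes stay NotReady until a
    CNI is installed, so waiting for them before installing Cilium would
    never finish. Returns the attempt number on which the probe succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        if api_reachable():
            return attempt  # early exit: hand over to the CNI install step
        if attempt < max_attempts:
            sleep(interval_s)
    raise TimeoutError(f"Kubernetes API not reachable after {max_attempts} attempts")


# Simulated probe: the API server answers on the third attempt.
responses = iter([False, False, True])
print(wait_for_api(lambda: next(responses), sleep=lambda _: None))  # 3
```

Injecting the probe and the sleep function keeps the wait logic independent of how the API is actually contacted. It also makes the loop testable without a cluster.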
Install the Python dependencies:
uv sync
Docker on macOS runs inside a virtual machine, which prevents your host from routing traffic to the internal 10.5.x.x subnets. We solve this with two lightweight local tools:
The Native Bridge (just bridge)
Instead of background Docker containers, we use a native Kubernetes tunnel. Running just bridge maps ports 80 and 443 on your Mac directly to the Cilium Global Gateway inside the cluster.
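By default, Cilium backs each Gateway with a Service named cilium-gateway-<gateway name> in the Gateway's namespace. The bridge therefore amounts to a kubectl port-forward against that Service. The sketch below only builds the command. The Gateway name "global-gateway" and the namespace "network" are hypothetical placeholders, because the values used by just bridge are not shown here.

```python
import shlex


def bridge_command(gateway: str, namespace: str, address: str = "127.0.0.1") -> list[str]:
    """Build a kubectl port-forward command for a Cilium-managed Gateway.

    Assumes Cilium's default Service naming (cilium-gateway-<gateway>) and
    forwards local ports 80 and 443 to the Service's ports 80 and 443.
    """
    return [
        "kubectl", "port-forward",
        "--namespace", namespace,
        "--address", address,
        f"svc/cilium-gateway-{gateway}",
        "80:80",
        "443:443",
    ]


# Hypothetical names: a Gateway called "global-gateway" in the "network" namespace.
print(shlex.join(bridge_command("global-gateway", "network")))
```

Binding to 127.0.0.1 keeps the tunnel reachable only from your own machine. This matches the /etc/hosts entries that point the platform hostnames at 127.0.0.1.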
Dynamic DNS (just sync-dns)
Your browser needs to know that s3.k8.local resolves to 127.0.0.1. The sync-dns script queries the cluster for all active HTTPRoutes and automatically manages a dedicated block of entries in your /etc/hosts file.
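The sync-dns idea is sketched below, assuming the script works from the output of kubectl get httproute -A -o json. It collects spec.hostnames from every HTTPRoute, then replaces everything between two marker comments in the hosts file. The marker strings and function names are hypothetical, and the real script may differ. Writing /etc/hosts needs root, so the sketch only transforms text and leaves the file I/O to the caller.

```python
import json

# Hypothetical markers delimiting the managed block in /etc/hosts.
BEGIN = "# BEGIN k8-local (managed by sync-dns)"
END = "# END k8-local"


def route_hostnames(httproute_list_json: str) -> list[str]:
    """Sorted, de-duplicated hostnames from `kubectl get httproute -A -o json`."""
    names = set()
    for item in json.loads(httproute_list_json).get("items", []):
        names.update(item.get("spec", {}).get("hostnames", []))
    return sorted(names)


def update_hosts(hosts_text: str, hostnames: list[str], ip: str = "127.0.0.1") -> str:
    """Drop any existing managed block, then append a fresh one.

    Lines outside the markers are kept as-is. An empty hostname list
    removes the block entirely.
    """
    kept, inside = [], False
    for line in hosts_text.splitlines():
        if line == BEGIN:
            inside = True
        elif line == END:
            inside = False
        elif not inside:
            kept.append(line)
    if hostnames:
        kept += [BEGIN, *(f"{ip} {name}" for name in hostnames), END]
    return "\n".join(kept) + "\n"


routes = json.dumps({"items": [
    {"spec": {"hostnames": ["s3.k8.local", "s3-console.k8.local"]}},
    {"spec": {"hostnames": ["s3.k8.local"]}},
]})
print(update_hosts("127.0.0.1 localhost\n", route_hostnames(routes)), end="")
```

Rewriting the whole block on every run makes the operation idempotent. Routes that have been deleted disappear from /etc/hosts on the next sync.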
# Step A: Build the cluster and services
just up
# Step B: Open the bridge (Run this in a NEW terminal tab)
just bridge
Ensure Cilium has recognized the Gateway API resources:
# Check if the GatewayClass 'cilium' exists
just k get gatewayclass
# Check if the Gateway is programmed (Should be True)
just k get gateway -A
If the status is 'Unknown', run just fix-gateway to restart the Cilium Operator.
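The same check can be scripted by reading each Gateway's status conditions from kubectl get gateway -A -o json. The sketch assumes the standard Gateway API Programmed condition, whose status is "True", "False" or "Unknown". The function name and the sample Gateway are hypothetical and not part of the justfile.

```python
import json


def gateway_programmed(gateway_list_json: str) -> dict[str, str]:
    """Map "namespace/name" to the status of each Gateway's Programmed condition.

    Gateways that have not reported the condition yet count as "Unknown".
    """
    result = {}
    for gw in json.loads(gateway_list_json).get("items", []):
        meta = gw["metadata"]
        key = f"{meta.get('namespace', 'default')}/{meta['name']}"
        conditions = gw.get("status", {}).get("conditions", [])
        result[key] = next(
            (c["status"] for c in conditions if c.get("type") == "Programmed"),
            "Unknown",
        )
    return result


# Hypothetical sample: a Gateway the Cilium Operator has not programmed yet.
sample = json.dumps({"items": [{
    "metadata": {"name": "global-gateway", "namespace": "network"},
    "status": {"conditions": [{"type": "Programmed", "status": "Unknown"}]},
}]})
for key, status in gateway_programmed(sample).items():
    if status != "True":
        print(f"{key}: Programmed={status}; try `just fix-gateway`")
```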
Confirm cert-manager has issued the wildcard TLS certificate.
just k get cert -n network
Test the S3 console route via the bridge:
curl -kI https://s3-console.k8.local
Troubleshooting:
- "Unknown Flag" Errors: Ensure your talosctl version is v1.12 or later.
- Pending Operations: If a deployment is interrupted, run uv run pulumi refresh to reconcile the stack state before running just up again.
- Gateway Stuck in Unknown: Run just fix-gateway. This triggers a rollout restart of the Cilium Operator, forcing it to pick up the Gateway CRDs.
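A pre-flight check for the talosctl requirement could parse the output of talosctl version --client. The sketch assumes that this output contains a line such as "Tag: v1.12.0". The function names, the regex and the sample SHA are illustrative, so adjust the pattern if your talosctl prints its version differently.

```python
import re

MIN_VERSION = (1, 12)


def talosctl_version(output: str) -> tuple[int, int]:
    """Extract (major, minor) from `talosctl version --client` output.

    Assumes a line such as "Tag:  v1.12.0"; raises ValueError otherwise.
    """
    match = re.search(r"Tag:\s*v(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("could not find a 'Tag: vX.Y' line in talosctl output")
    return int(match.group(1)), int(match.group(2))


def is_supported(output: str) -> bool:
    # Tuple comparison orders (1, 9) before (1, 12); comparing strings would not.
    return talosctl_version(output) >= MIN_VERSION


sample = "Client:\n\tTag:         v1.11.5\n\tSHA:         abc1234\n"
print(is_supported(sample))  # False
```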
To destroy the cluster and clean your /etc/hosts file:
just nuke