melchiorhering/open-data-platform

Open Data Platform

A modern, declarative, and highly scalable data platform built on top of Kubernetes and Talos Linux. This repository contains the complete Infrastructure-as-Code (IaC) required to bootstrap the platform from scratch, whether running locally in Docker for development or on bare-metal cloud providers like Hetzner.

🏗️ Architecture Philosophy

This platform is built with a few strict architectural principles:

  • GitOps & IaC First: Everything from the operating system up to the data pipelines is defined in code using Pulumi (Python) and Kubernetes manifests.
  • Command-Driven Lifecycle: We use the Pulumi Command provider to orchestrate low-level CLI tools (talosctl, docker) as native Pulumi resources.
  • Immutable OS: We use Talos Linux to eliminate SSH access and configuration drift.
  • Modern Networking: eBPF-based networking (Cilium) and the Kubernetes Gateway API replace kube-proxy and legacy Ingress controllers.

🧩 Platform Services

1. Compute & OS Layer

  • Talos Linux: A secure, immutable OS managed entirely via API.
  • Local Compute (Docker): Our local.py engine uses pulumi_command to spin up multi-node Talos clusters. It features an "Early Exit" strategy that bypasses the CNI-wait deadlock, allowing Cilium to be installed immediately.
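The deadlock mentioned above arises because a node without a CNI never reports Ready, so a bootstrap step that waits for node readiness blocks the very step that would install Cilium. One way to picture an early exit is to wait only until the Kubernetes API answers and then hand control back. The sketch below is an illustration under that assumption, not the code in local.py; `wait_until` and `api_is_reachable` are hypothetical names.

```python
import subprocess
import time
from typing import Callable


def wait_until(probe: Callable[[], bool], timeout: float = 300.0,
               interval: float = 2.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def api_is_reachable() -> bool:
    """Hypothetical probe: succeeds once the API server answers /readyz.

    It deliberately does NOT wait for nodes to become Ready, since that
    cannot happen before a CNI is installed.
    """
    try:
        result = subprocess.run(
            ["kubectl", "get", "--raw", "/readyz"],
            capture_output=True, check=False,
        )
    except OSError:  # kubectl is not installed or not on PATH
        return False
    return result.returncode == 0
```

Calling `wait_until(api_is_reachable)` would return as soon as the control plane responds, at which point the CNI can be installed and the nodes can become Ready afterwards.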

2. Networking Services

  • Cilium: Used in strict eBPF mode for maximum performance and observability (Hubble).
  • Kubernetes Gateway API: The unified entry point for all platform traffic.
  • Native Bridging: We bypass Docker Desktop networking limitations using native kubectl port-forward directly to the Cilium Gateway.

🚀 Quick Start (Local Development)

1. Initialize the Environment

uv sync

2. Local Networking & Dynamic DNS

Docker on macOS runs inside a Virtual Machine, preventing your host from routing traffic to the internal 10.5.x.x subnets. We solve this using two lightweight local tools:

The Native Bridge (just bridge): Instead of background Docker containers, we use a native Kubernetes tunnel. Running just bridge maps ports 80 and 443 on your Mac directly to the Cilium Global Gateway inside the cluster.
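As a rough picture of what such a tunnel involves, the sketch below builds a `kubectl port-forward` command for a gateway Service. It is not the actual just bridge recipe: the function name is hypothetical, and the namespace and Service name in the example are placeholders that must be replaced with the values used by your cluster.

```python
import shlex


def port_forward_argv(namespace: str, service: str,
                      ports: dict[int, int]) -> list[str]:
    """Build a `kubectl port-forward` command.

    `ports` maps a local port to the Service port it should reach.
    """
    if not ports:
        raise ValueError("at least one port mapping is required")
    mappings = [f"{local}:{remote}" for local, remote in sorted(ports.items())]
    return ["kubectl", "port-forward", "--namespace", namespace,
            f"svc/{service}", *mappings]


# Placeholder namespace and Service name, for illustration only.
argv = port_forward_argv("example-namespace", "example-gateway-service",
                         {80: 80, 443: 443})
print(shlex.join(argv))
```

Keeping the command as an argument list avoids shell quoting problems if it is later passed to `subprocess.run`.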

Dynamic DNS (just sync-dns): Your browser needs to know that s3.k8.local lives at 127.0.0.1. The sync-dns script queries the cluster for all active HTTPRoutes and automatically manages a block in your /etc/hosts file.
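The managed-block idea can be sketched in a few lines. The example below is not the real sync-dns script: the marker strings and function names are assumptions made for illustration. It collects hostnames from the JSON form of an HTTPRoute list (the `spec.hostnames` field of the Gateway API) and rewrites only the lines between two markers, leaving the rest of the hosts file untouched.

```python
from typing import Iterable

# Hypothetical markers delimiting the block this tool owns.
BEGIN = "# BEGIN open-data-platform"
END = "# END open-data-platform"


def route_hostnames(httproute_list: dict) -> list[str]:
    """Collect unique hostnames from an HTTPRoute list in JSON form."""
    names = set()
    for item in httproute_list.get("items", []):
        names.update(item.get("spec", {}).get("hostnames", []))
    return sorted(names)


def render_hosts(current: str, hostnames: Iterable[str],
                 ip: str = "127.0.0.1") -> str:
    """Return hosts-file text with the managed block replaced or appended."""
    kept, inside = [], False
    for line in current.splitlines():
        if line.strip() == BEGIN:
            inside = True          # start skipping the old managed block
        elif line.strip() == END:
            inside = False         # resume keeping lines after the block
        elif not inside:
            kept.append(line)
    block = [BEGIN, *(f"{ip} {host}" for host in hostnames), END]
    return "\n".join(kept + block) + "\n"
```

Because the old block is dropped before the new one is appended, running the function twice with the same hostnames yields the same file, and passing an empty list removes every managed entry.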

3. Deploy the Platform

# Step A: Build the cluster and services
just up

# Step B: Open the bridge (Run this in a NEW terminal tab)
just bridge

🔍 Platform Verification Checklist

1. Gateway Status

Ensure Cilium has recognized the Gateway API resources.

# Check if the GatewayClass 'cilium' exists
just k get gatewayclass

# Check if the Gateway is programmed (Should be True)
just k get gateway -A

If status is 'Unknown', run just fix-gateway to restart the Cilium Operator.
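For a scripted version of this check, the Gateway API reports readiness through a `Programmed` entry in `status.conditions`. The sketch below assumes the Gateway list has been exported as JSON (for example with `-o json`) and lists every Gateway that is not yet programmed; the function name is hypothetical.

```python
import json


def unprogrammed_gateways(gateway_list: dict) -> list[str]:
    """Return "namespace/name" for Gateways whose Programmed condition is not True."""
    pending = []
    for item in gateway_list.get("items", []):
        meta = item.get("metadata", {})
        conditions = item.get("status", {}).get("conditions", [])
        programmed = any(
            c.get("type") == "Programmed" and c.get("status") == "True"
            for c in conditions
        )
        if not programmed:
            pending.append(f'{meta.get("namespace")}/{meta.get("name")}')
    return pending


# Made-up sample data shaped like a Gateway list, for illustration only.
sample = json.loads("""
{"items": [
  {"metadata": {"namespace": "network", "name": "ready-gw"},
   "status": {"conditions": [{"type": "Programmed", "status": "True"}]}},
  {"metadata": {"namespace": "network", "name": "stuck-gw"},
   "status": {"conditions": [{"type": "Programmed", "status": "Unknown"}]}}
]}
""")
print(unprogrammed_gateways(sample))
```

An empty result means every Gateway is programmed; any entry in the list is a candidate for just fix-gateway.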

2. Security & Certificates

Confirm cert-manager has issued the wildcard TLS certificate.

just k get cert -n network

3. Connectivity Test

Test the S3 console route via the bridge.

curl -kI https://s3-console.k8.local

🛠️ Troubleshooting

  • "Unknown Flag" Errors: Ensure your talosctl version is v1.12 or newer.
  • Pending Operations: If a deployment is interrupted, run uv run pulumi refresh to reconcile the stack state with the real resources before running just up again.
  • Gateway Stuck in Unknown: Run just fix-gateway. This triggers a rollout restart of the Cilium Operator to force it to pick up the Gateway CRDs.

🧹 Teardown

To destroy the cluster and clean your /etc/hosts file:

just nuke
