System Overview
This page provides detailed technical documentation of the k8s-ephemeral-environments platform architecture, including infrastructure components, namespace organization, and the PR environment lifecycle.
The platform runs on a single VPS with k3s, hosting both permanent infrastructure components and ephemeral PR environments.
+---------------------------------------------------------------------+
| VPS (4 vCPU, 24GB RAM, 100GB NVMe) |
| +---------------------------------------------------------------+ |
| | k3s Cluster | |
| | | |
| | +-----------------+ +-----------------+ +-----------------+ | |
| | | observability | | arc-runners | | app-pr-123 | | |
| | | | | | | (ephemeral) | | |
| | | - Prometheus | | - Runner x2 | | | | |
| | | - Loki | | | | - App Pod | | |
| | | - Grafana | | | | - DB Pod | | |
| | +-----------------+ +-----------------+ +-----------------+ | |
| | | |
| | +-----------------+ +-----------------+ +-----------------+ | |
| | | app-pr-456 | | app-pr-789 | | platform | | |
| | | (ephemeral) | | (ephemeral) | | (system) | | |
| | +-----------------+ +-----------------+ +-----------------+ | |
| +---------------------------------------------------------------+ |
+---------------------------------------------------------------------+
| Attribute | Value |
|---|---|
| Provider | Oracle Cloud Infrastructure (OCI) |
| Public IP | 168.138.151.63 |
| Hostname | genilda |
| OS | Ubuntu 24.04.3 LTS (Noble Numbat) |
| Architecture | ARM64 (aarch64) |
| vCPUs | 4 |
| RAM | 24 GB |
| Disk | 96 GB NVMe |
Important: All container images must support the linux/arm64 architecture.
The cluster organizes workloads into permanent system namespaces and ephemeral PR namespaces.
| Namespace | Purpose | Lifecycle |
|---|---|---|
| kube-system | k3s core components, Traefik ingress | Permanent |
| observability | Prometheus, Loki, Grafana | Permanent |
| arc-systems | ARC controller (manages runner lifecycle) | Permanent |
| arc-runners | GitHub Actions self-hosted runner pods | Permanent |
| platform | Shared base components, CronJobs | Permanent |
| {project-id}-pr-{number} | Ephemeral environment per PR | Ephemeral (PR lifecycle) |
Ephemeral namespaces follow the pattern: {project-id}-pr-{number}
Examples:
- k8s-ee-pr-28 - PR #28 in the k8s-ee project
- my-app-pr-156 - PR #156 in the my-app project
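
The naming pattern can be sketched in Python as below. This is an illustrative helper, not the platform's own code: the function names namespace_for and parse_namespace are hypothetical, and the validation reflects the general Kubernetes rule that namespace names are DNS labels (lowercase alphanumerics and hyphens, at most 63 characters) rather than anything stated on this page.

```python
import re
from typing import Optional, Tuple

# Kubernetes namespace names must be RFC 1123 DNS labels.
_DNS_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")
# {project-id}-pr-{number}; the project id may itself contain hyphens.
_PR_NAMESPACE = re.compile(
    r"^(?P<project>[a-z0-9]([a-z0-9-]*[a-z0-9])?)-pr-(?P<number>[0-9]+)$"
)


def namespace_for(project_id: str, pr_number: int) -> str:
    """Build the ephemeral namespace name for a PR (hypothetical helper)."""
    if pr_number <= 0:
        raise ValueError("PR number must be positive")
    name = f"{project_id}-pr-{pr_number}"
    if len(name) > 63 or not _DNS_LABEL.match(name):
        raise ValueError(f"not a valid namespace name: {name!r}")
    return name


def parse_namespace(name: str) -> Optional[Tuple[str, int]]:
    """Return (project_id, pr_number), or None for non-PR namespaces."""
    match = _PR_NAMESPACE.match(name)
    if match is None:
        return None
    return match.group("project"), int(match.group("number"))
```

Parsing in the reverse direction is what lets a cleanup job tell ephemeral namespaces apart from permanent ones such as kube-system or platform.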
| Component | Technology | Justification |
|---|---|---|
| Kubernetes | k3s | Lightweight, production-ready, ideal for single-node |
| Ingress | Traefik | Included in k3s, native Let's Encrypt support |
| CI/CD | GitHub Actions | Native integration, familiar to developers |
| Logs | Loki + Promtail | Lightweight, native Grafana integration |
| Metrics | Prometheus | Industry standard, broad ecosystem |
| Dashboards | Grafana | Unified interface for logs and metrics |
| Runners | actions-runner-controller (ARC) | Ephemeral and scalable runners in cluster |
| PostgreSQL | CloudNativePG | Manages PostgreSQL lifecycle automatically |
| MariaDB | mariadb:11 | Simple deployment for MySQL-compatible needs |
| MongoDB | MongoDB Community Operator | Replica set management for NoSQL needs |
| Redis | redis:7-alpine | High-performance caching |
| Object Storage | MinIO | S3-compatible file storage |
| Secrets | Sealed Secrets | Encrypted secrets in git |
| Storage | Local Path Provisioner | Simple, adequate for MVP |
| DNS | Wildcard | *.k8s-ee.genesluna.dev resolves to VPS IP |
| Network Isolation | NetworkPolicies (kube-router) | Isolation between PR namespaces |
| Priority Classes | platform-critical, default-app | Workload prioritization |
The complete flow from PR creation to environment destruction:
+------------+ +------------+ +------------+ +------------+
| PR Open |---->| GitHub |---->| Create |---->| Deploy |
| | | Action | | Namespace | | App + DB |
+------------+ +------------+ +------------+ +------------+
|
v
+------------+ +------------+ +------------+ +------------+
| PR Close |---->| GitHub |---->| Delete |<----| Preview |
| or Merge | | Action | | Namespace | | URL |
+------------+ +------------+ +------------+ +------------+
- PR Opened - Developer opens a pull request
- Organization Validated - PR author's org checked against allowlist
- Namespace Created - {project-id}-pr-{number} namespace provisioned
- Resource Quotas Applied - Dynamic quotas based on enabled databases
- NetworkPolicies Applied - Isolation rules for the namespace
- Application Deployed - App + configured databases deployed via Helm
- Ingress Created - Public URL becomes available
- Bot Comments - PR receives comment with preview URL
- Push to PR - New commits trigger automatic re-deployment
- Optional Preserve - /preserve command extends environment life
- PR Closed/Merged - Namespace destroyed (unless preserved)
- Preserve Expiry - Hourly CronJob removes expired preserve labels
- Orphan Cleanup - 6-hour CronJob catches any missed namespaces
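
The decision behind the orphan cleanup step can be expressed as a small pure function. Everything in this sketch is an assumption made for illustration: the function name find_orphans, the shape of its inputs, and the idea that preserved namespaces are passed in as a set of names. How the real CronJob discovers open PRs and preserve labels is not described on this page.

```python
import re
from typing import Iterable, List, Set, Tuple

# Matches {project-id}-pr-{number}; anything else is treated as permanent.
PR_NAMESPACE = re.compile(r"^(?P<project>.+)-pr-(?P<number>[0-9]+)$")


def find_orphans(
    namespaces: Iterable[str],
    open_prs: Set[Tuple[str, int]],
    preserved: Set[str],
) -> List[str]:
    """Return PR namespaces whose PR is no longer open and that are not preserved."""
    orphans = []
    for name in namespaces:
        match = PR_NAMESPACE.match(name)
        if match is None:
            continue  # permanent namespace, never a cleanup candidate
        key = (match.group("project"), int(match.group("number")))
        if key in open_prs or name in preserved:
            continue  # still in use
        orphans.append(name)
    return orphans
```

Keeping the decision separate from the Kubernetes and GitHub calls makes it easy to test without a cluster.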
Developers can keep environments alive after PR close using the /preserve command:
| Constraint | Value |
|---|---|
| Default duration | 48 hours |
| Maximum duration | 48 hours |
| Max preserved per user | 3 environments |
| Expiry check | Hourly CronJob |
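
The constraints in the table can be checked with a few lines of Python. This is a sketch under stated assumptions: the names preserve_until and is_expired are hypothetical, and the requested_hours parameter exists only to show how a cap would apply. Whether /preserve actually accepts a duration argument is not stated here.

```python
from datetime import datetime, timedelta, timezone

MAX_PRESERVE = timedelta(hours=48)   # maximum (and default) duration from the table
MAX_PRESERVED_PER_USER = 3           # per-user limit from the table


def preserve_until(
    now: datetime,
    requested_hours: int = 48,          # hypothetical parameter
    already_preserved_by_user: int = 0,
) -> datetime:
    """Return the expiry timestamp for a new preserve request."""
    if already_preserved_by_user >= MAX_PRESERVED_PER_USER:
        raise ValueError("user already has the maximum number of preserved environments")
    if requested_hours <= 0:
        raise ValueError("duration must be positive")
    # Requests longer than the maximum are clamped rather than rejected (an assumption).
    duration = min(timedelta(hours=requested_hours), MAX_PRESERVE)
    return now + duration


def is_expired(expires_at: datetime, now: datetime) -> bool:
    """The check an hourly expiry job would run for each preserved environment."""
    return now >= expires_at
```

Because the expiry check runs hourly, an environment could outlive its nominal expiry by up to an hour before the job notices it.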
ResourceQuota is automatically calculated based on enabled databases. No manual configuration required.
Base (app only): 300m CPU, 512Mi memory, 1Gi storage
| Database | CPU | Memory | Storage |
|---|---|---|---|
| PostgreSQL | +500m | +512Mi | +2Gi |
| MongoDB | +1000m (init) | +640Mi | +3Gi |
| Redis | +200m | +128Mi | - |
| MinIO | +1000m (sidecar) | +1024Mi | +2Gi |
| MariaDB | +300m | +256Mi | +2Gi |
The platform adds a buffer for rolling updates, during which old and new pods run simultaneously:
+ Rolling update buffer: +100m CPU requests, +256Mi memory requests
+ Limits buffer: +15% on CPU limits, +15% on memory limits
| Configuration | CPU Limit | Memory Limit | Storage |
|---|---|---|---|
| App only | 300m | 512Mi | 1Gi |
| App + PostgreSQL | 800m | 1Gi | 3Gi |
| App + PostgreSQL + Redis | 1000m | 1.1Gi | 3Gi |
| All databases enabled | 2100m | 2.4Gi | 9Gi |
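
A plain additive model of the base values and per-database increments is sketched below. It is an illustration, not the platform's calculation: the name quota_for and the dictionary layout are hypothetical, and the rolling update and limits buffers are left out. A straight sum reproduces the first three rows of the example table (1152Mi is roughly 1.1Gi) but not the "All databases enabled" row, so the real calculation evidently treats some increments differently, possibly those marked init and sidecar.

```python
from typing import Dict, Iterable

# Base and increments copied from the tables above (CPU in millicores,
# memory in Mi, storage in Gi).
BASE = {"cpu_m": 300, "memory_mi": 512, "storage_gi": 1}
INCREMENTS = {
    "postgresql": {"cpu_m": 500, "memory_mi": 512, "storage_gi": 2},
    "mongodb": {"cpu_m": 1000, "memory_mi": 640, "storage_gi": 3},
    "redis": {"cpu_m": 200, "memory_mi": 128, "storage_gi": 0},
    "minio": {"cpu_m": 1000, "memory_mi": 1024, "storage_gi": 2},
    "mariadb": {"cpu_m": 300, "memory_mi": 256, "storage_gi": 2},
}


def quota_for(enabled: Iterable[str]) -> Dict[str, int]:
    """Sum the base quota and the increment of each enabled database."""
    total = dict(BASE)
    for name in enabled:
        if name not in INCREMENTS:
            raise ValueError(f"unknown database: {name!r}")
        for key, value in INCREMENTS[name].items():
            total[key] += value
    return total
```

For example, quota_for(["postgresql", "redis"]) gives 1000m CPU, 1152Mi memory, and 3Gi storage.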
- Wildcard DNS: *.k8s-ee.genesluna.dev resolves to VPS IP
- Preview URLs: {project-id}-pr-{number}.k8s-ee.genesluna.dev
- Service URLs:
  - Grafana: grafana.k8s-ee.genesluna.dev
  - Prometheus: prometheus.k8s-ee.genesluna.dev
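
Because the preview hostname is the namespace name under the wildcard domain, it can be derived mechanically. The helper names below are hypothetical, and the https scheme is an assumption based on Traefik's Let's Encrypt support mentioned earlier; this page does not state the scheme explicitly.

```python
BASE_DOMAIN = "k8s-ee.genesluna.dev"


def preview_host(project_id: str, pr_number: int) -> str:
    """Hostname of a PR environment: {project-id}-pr-{number}.<base domain>."""
    return f"{project_id}-pr-{pr_number}.{BASE_DOMAIN}"


def preview_url(project_id: str, pr_number: int) -> str:
    """Full preview URL; the https scheme is assumed, not documented here."""
    return f"https://{preview_host(project_id, pr_number)}"
```

For PR #28 in the k8s-ee project this yields the host k8s-ee-pr-28.k8s-ee.genesluna.dev.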
Each PR namespace has NetworkPolicies that:
- Deny all ingress by default - No cross-namespace traffic
- Allow ingress from Traefik - Only through the ingress controller
- Allow egress to DNS - Required for service discovery
- Allow egress to Kubernetes API - Required for operators
- Allow internal namespace traffic - App can reach its databases
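
The rules above map onto standard networking.k8s.io/v1 NetworkPolicy objects. The sketch below builds four of them as Python dictionaries and prints them as a JSON list that kubectl apply -f can read. It is an approximation, not the platform's manifests: the policy names are invented, the Traefik pod label app.kubernetes.io/name=traefik is an assumption about the k3s-bundled deployment, and the Kubernetes API egress rule is omitted because its destination address is cluster-specific.

```python
import json
from typing import Any, Dict, List


def network_policies(namespace: str) -> List[Dict[str, Any]]:
    """Return illustrative NetworkPolicy manifests for one PR namespace."""

    def policy(name: str, spec: Dict[str, Any]) -> Dict[str, Any]:
        return {
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            "metadata": {"name": name, "namespace": namespace},
            "spec": spec,
        }

    return [
        # Selecting all pods with no ingress rules denies all ingress.
        policy("default-deny-ingress", {"podSelector": {}, "policyTypes": ["Ingress"]}),
        # Traefik pods in kube-system may reach pods in this namespace.
        policy("allow-from-traefik", {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{
                "namespaceSelector": {
                    "matchLabels": {"kubernetes.io/metadata.name": "kube-system"}
                },
                # Assumed label; verify against the actual Traefik pods.
                "podSelector": {"matchLabels": {"app.kubernetes.io/name": "traefik"}},
            }]}],
        }),
        # Pods in the namespace may talk to each other (app to database).
        policy("allow-same-namespace", {
            "podSelector": {},
            "policyTypes": ["Ingress", "Egress"],
            "ingress": [{"from": [{"podSelector": {}}]}],
            "egress": [{"to": [{"podSelector": {}}]}],
        }),
        # DNS lookups on port 53 over UDP and TCP.
        policy("allow-dns-egress", {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [{"ports": [
                {"protocol": "UDP", "port": 53},
                {"protocol": "TCP", "port": 53},
            ]}],
        }),
    ]


if __name__ == "__main__":
    manifest = {"apiVersion": "v1", "kind": "List", "items": network_policies("k8s-ee-pr-28")}
    print(json.dumps(manifest, indent=2))
```

NetworkPolicies are additive, so the default-deny policy and the allow policies can coexist: traffic is permitted if any policy selecting the pod allows it.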
| SLI | Target |
|---|---|
| PRs with URL delivered in < 10 min | >= 95% |
| Namespaces removed in < 5 min after close | >= 98% |
| Pods with metrics/logs collected | >= 95% |
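
Each SLI in the table is a ratio of "good" events to all events, compared against a target. A minimal sketch of that arithmetic follows; the function names are hypothetical, and how the platform actually collects delivery or teardown durations is not described here.

```python
from typing import Sequence


def sli_ratio(durations_s: Sequence[float], threshold_s: float) -> float:
    """Fraction of events that completed in under threshold_s seconds."""
    if not durations_s:
        raise ValueError("no events to evaluate")
    good = sum(1 for d in durations_s if d < threshold_s)
    return good / len(durations_s)


def meets_target(ratio: float, target: float) -> bool:
    """True when the measured ratio reaches the target (for example 0.95)."""
    return ratio >= target
```

For the first row, the threshold would be 600 seconds (10 minutes) and the target 0.95.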
| Metric | Target |
|---|---|
| Environment creation time | <= 10 min (p95) |
| Namespace destruction time | < 5 min |
| Observability stack overhead | < 6 GB RAM |
| Metric | Target |
|---|---|
| Simultaneous PRs supported | >= 5 |
| Log retention | 7 days |
| Metric retention | 7 days |
| Metric | Target |
|---|---|
| Cluster uptime (business hours) | >= 95% |
| Automatic recovery after VPS reboot | Yes |
- Architecture - Architecture landing page
- EKS-Migration-Guide - Phase 2 migration planning
- Configuration-Reference - k8s-ee.yaml schema reference
- Security-and-Access-Control - Security architecture details
- Service-Development - Database and storage patterns