System Overview

This page provides detailed technical documentation of the k8s-ephemeral-environments platform architecture, including infrastructure components, namespace organization, and the PR environment lifecycle.

High-Level Architecture

The platform runs on a single VPS with k3s, hosting both permanent infrastructure components and ephemeral PR environments.

+-----------------------------------------------------------------------+
|                  VPS (4 vCPU, 24GB RAM, 100GB NVMe)                   |
|  +-----------------------------------------------------------------+  |
|  |                           k3s Cluster                           |  |
|  |                                                                 |  |
|  |  +-----------------+  +-----------------+  +-----------------+  |  |
|  |  |  observability  |  |   arc-runners   |  |   app-pr-123    |  |  |
|  |  |                 |  |                 |  |   (ephemeral)   |  |  |
|  |  | - Prometheus    |  | - Runner x2     |  |                 |  |  |
|  |  | - Loki          |  |                 |  | - App Pod       |  |  |
|  |  | - Grafana       |  |                 |  | - DB Pod        |  |  |
|  |  +-----------------+  +-----------------+  +-----------------+  |  |
|  |                                                                 |  |
|  |  +-----------------+  +-----------------+  +-----------------+  |  |
|  |  |   app-pr-456    |  |   app-pr-789    |  |    platform     |  |  |
|  |  |   (ephemeral)   |  |   (ephemeral)   |  |    (system)     |  |  |
|  |  +-----------------+  +-----------------+  +-----------------+  |  |
|  +-----------------------------------------------------------------+  |
+-----------------------------------------------------------------------+

Infrastructure Details

Attribute	Value
Provider	Oracle Cloud Infrastructure (OCI)
Public IP	`168.138.151.63`
Hostname	`genilda`
OS	Ubuntu 24.04.3 LTS (Noble Numbat)
Architecture	ARM64 (aarch64)
vCPUs	4
RAM	24 GB
Disk	96 GB NVMe

Important: All container images must support linux/arm64 architecture.
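You can catch amd64-only images before they reach the cluster by inspecting the image's multi-platform index. The Python sketch below assumes you already have that JSON, for example the output of `docker manifest inspect <image>`. The function name is hypothetical and is not part of the platform.

```python
import json


def supports_linux_arm64(manifest_json: str) -> bool:
    """Return True if a multi-platform image index lists a linux/arm64 image.

    Expects the JSON of an OCI image index or Docker manifest list, which
    carries a "manifests" array whose entries have a "platform" object.
    """
    doc = json.loads(manifest_json)
    manifests = doc.get("manifests")
    if manifests is None:
        # Single-platform manifest: the architecture lives in the image
        # config, not here, so conservatively treat "unknown" as False.
        return False
    return any(
        m.get("platform", {}).get("os") == "linux"
        and m.get("platform", {}).get("architecture") == "arm64"
        for m in manifests
    )


# Hypothetical index listing amd64 and arm64 variants.
sample = json.dumps({"manifests": [
    {"platform": {"os": "linux", "architecture": "amd64"}},
    {"platform": {"os": "linux", "architecture": "arm64", "variant": "v8"}},
]})
print(supports_linux_arm64(sample))  # True
```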

Namespace Structure

The cluster organizes workloads into permanent system namespaces and ephemeral PR namespaces.

Namespace	Purpose	Lifecycle
`kube-system`	k3s core components, Traefik ingress	Permanent
`observability`	Prometheus, Loki, Grafana	Permanent
`arc-systems`	ARC controller (manages runner lifecycle)	Permanent
`arc-runners`	GitHub Actions self-hosted runner pods	Permanent
`platform`	Shared base components, CronJobs	Permanent
`{project-id}-pr-{number}`	Ephemeral environment per PR	Ephemeral (PR lifecycle)

Namespace Naming Convention

Ephemeral namespaces follow the pattern: {project-id}-pr-{number}

Examples:

- k8s-ee-pr-28: PR #28 in the k8s-ee project
- my-app-pr-156: PR #156 in the my-app project
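The convention can be expressed as a small helper. This is a sketch that assumes the project ID is already lowercase. It rejects any name that breaks the RFC 1123 label rules Kubernetes enforces for namespaces: lowercase alphanumerics and `-` only, starting and ending with an alphanumeric, at most 63 characters. `pr_namespace` is a hypothetical name, not a platform function.

```python
import re

# RFC 1123 label: lowercase alphanumerics or '-', starting and ending alphanumeric.
_DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_NAMESPACE_LEN = 63


def pr_namespace(project_id: str, pr_number: int) -> str:
    """Build the ephemeral namespace name {project-id}-pr-{number}."""
    if pr_number <= 0:
        raise ValueError("PR number must be positive")
    name = f"{project_id}-pr-{pr_number}"
    if len(name) > MAX_NAMESPACE_LEN or not _DNS1123_LABEL.fullmatch(name):
        raise ValueError(f"not a valid namespace name: {name!r}")
    return name


print(pr_namespace("k8s-ee", 28))    # k8s-ee-pr-28
print(pr_namespace("my-app", 156))   # my-app-pr-156
```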

Technology Stack

Component	Technology	Justification
Kubernetes	k3s	Lightweight, production-ready, well suited to single-node clusters
Ingress	Traefik	Included in k3s, native Let's Encrypt support
CI/CD	GitHub Actions	Native integration, familiar to developers
Logs	Loki + Promtail	Lightweight, native Grafana integration
Metrics	Prometheus	Industry standard, broad ecosystem
Dashboards	Grafana	Unified interface for logs and metrics
Runners	actions-runner-controller (ARC)	Ephemeral, scalable runners inside the cluster
PostgreSQL	CloudNativePG	Manages PostgreSQL lifecycle automatically
MariaDB	mariadb:11	Simple deployment for MySQL-compatible needs
MongoDB	MongoDB Community Operator	Replica set management for NoSQL needs
Redis	redis:7-alpine	High-performance caching
Object Storage	MinIO	S3-compatible file storage
Secrets	Sealed Secrets	Encrypted secrets in git
Storage	Local Path Provisioner	Simple, adequate for MVP
DNS	Wildcard	`*.k8s-ee.genesluna.dev` resolves to VPS IP
Network Isolation	NetworkPolicies (kube-router)	Isolation between PR namespaces
Priority Classes	`platform-critical`, `default-app`	Workload prioritization

PR Environment Lifecycle

The complete flow from PR creation to environment destruction:

+------------+     +------------+     +------------+     +------------+
|  PR Open   |---->|  GitHub    |---->|  Create    |---->|  Deploy    |
|            |     |  Action    |     | Namespace  |     | App + DB   |
+------------+     +------------+     +------------+     +------------+
                                                               |
                                                               v
+------------+     +------------+     +------------+     +------------+
|  PR Close  |---->|  GitHub    |---->|  Delete    |<----|  Preview   |
|  or Merge  |     |  Action    |     | Namespace  |     |    URL     |
+------------+     +------------+     +------------+     +------------+

Detailed Lifecycle Steps

1. PR Opened - Developer opens a pull request
2. Organization Validated - PR author's organization is checked against the allowlist
3. Namespace Created - {project-id}-pr-{number} namespace is provisioned
4. Resource Quotas Applied - Dynamic quotas based on the enabled databases
5. NetworkPolicies Applied - Isolation rules for the namespace
6. Application Deployed - App and configured databases are deployed via Helm
7. Ingress Created - Public URL becomes available
8. Bot Comments - PR receives a comment with the preview URL
9. Push to PR - New commits trigger automatic redeployment
10. Optional Preserve - The /preserve command keeps the environment alive after the PR closes
11. PR Closed/Merged - Namespace is destroyed (unless preserved)
12. Preserve Expiry - An hourly CronJob removes expired preserve labels
13. Orphan Cleanup - A CronJob running every 6 hours catches any missed namespaces
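The close, preserve-expiry and orphan-cleanup steps reduce to one decision per namespace: delete it once its PR is no longer open and no unexpired preserve applies. The sketch below assumes the PR state and preserve deadline have already been gathered. `PrNamespace`, `namespaces_to_delete` and the example namespaces are hypothetical. The real CronJobs may read labels and GitHub state differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class PrNamespace:
    name: str
    pr_open: bool                       # hypothetical: PR state fetched from GitHub
    preserve_until: Optional[datetime]  # hypothetical: parsed from a preserve label


def namespaces_to_delete(namespaces: list[PrNamespace], now: datetime) -> list[str]:
    """Names of ephemeral namespaces a cleanup pass would remove."""
    doomed = []
    for ns in namespaces:
        if ns.pr_open:
            continue  # PR still open: keep the environment
        preserved = ns.preserve_until is not None and ns.preserve_until > now
        if not preserved:
            doomed.append(ns.name)  # closed PR with no (or an expired) preserve
    return doomed


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
candidates = [
    PrNamespace("k8s-ee-pr-28", pr_open=True, preserve_until=None),
    PrNamespace("my-app-pr-156", pr_open=False, preserve_until=None),
    PrNamespace("my-app-pr-157", pr_open=False, preserve_until=now + timedelta(hours=5)),
    PrNamespace("my-app-pr-158", pr_open=False, preserve_until=now - timedelta(hours=1)),
]
print(namespaces_to_delete(candidates, now))  # ['my-app-pr-156', 'my-app-pr-158']
```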

Preserve Environment Feature

Developers can keep an environment alive after its PR is closed by using the /preserve command:

Constraint	Value
Default duration	48 hours
Maximum duration	48 hours
Max preserved per user	3 environments
Expiry check	Hourly CronJob
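The constraints in the table can be enforced with a small validation step. The sketch below is an assumption about how such a check might look. `preserve_expiry` and its optional `hours` argument are hypothetical: the page does not say whether `/preserve` accepts a duration. Counting the user's currently preserved environments is left to the caller.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_HOURS = 48           # "Default duration" from the table
MAX_HOURS = 48               # "Maximum duration"
MAX_PRESERVED_PER_USER = 3   # "Max preserved per user"


def preserve_expiry(now: datetime, active_for_user: int,
                    hours: Optional[int] = None) -> datetime:
    """Validate a /preserve request and return its expiry time."""
    if active_for_user >= MAX_PRESERVED_PER_USER:
        raise ValueError("user already has the maximum number of preserved environments")
    requested = DEFAULT_HOURS if hours is None else hours
    if not 0 < requested <= MAX_HOURS:
        raise ValueError(f"duration must be between 1 and {MAX_HOURS} hours")
    return now + timedelta(hours=requested)


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(preserve_expiry(now, active_for_user=0))  # 2025-01-03 12:00:00+00:00
```

The hourly expiry CronJob then only needs to compare each stored expiry time with the current time.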

Dynamic Resource Quotas

The ResourceQuota for each PR namespace is calculated automatically from the enabled databases; no manual configuration is required.

Base Resources

Base (app only):    300m CPU,  512Mi memory,  1Gi storage

Per-Database Additions

Database	CPU	Memory	Storage
PostgreSQL	+500m	+512Mi	+2Gi
MongoDB	+1000m (init)	+640Mi	+3Gi
Redis	+200m	+128Mi	-
MinIO	+1000m (sidecar)	+1024Mi	+2Gi
MariaDB	+300m	+256Mi	+2Gi

Rolling Update Headroom

The platform adds a buffer for rolling updates, during which old and new pods run simultaneously:

+ Rolling update buffer: +100m CPU requests, +256Mi memory requests
+ Limits buffer: +15% on CPU limits, +15% on memory limits

Example Quota Calculations

Configuration	CPU Limit	Memory Limit	Storage
App only	300m	512Mi	1Gi
App + PostgreSQL	800m	1Gi	3Gi
App + PostgreSQL + Redis	1000m	1.1Gi	3Gi
All databases enabled	2100m	2.4Gi	9Gi
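The first rows of the example table can be reproduced by adding each enabled database's row to the base. The sketch below does exactly that and emits `hard` entries under the standard ResourceQuota keys. The function names and the `pr-quota` object name are hypothetical. Like the example table, it leaves out the rolling-update headroom.

It reproduces the first three rows (1024Mi is 1Gi; 1152Mi is roughly 1.1Gi). Summing all five additions the same way gives 3300m, 3072Mi and 10Gi, which differs from the "All databases enabled" row. The platform's own calculation therefore treats some additions differently, so treat this sketch as illustrative only.

```python
# (CPU millicores, memory MiB, storage GiB), copied from the tables above.
BASE = (300, 512, 1)
ADDITIONS = {
    "postgresql": (500, 512, 2),
    "mongodb": (1000, 640, 3),
    "redis": (200, 128, 0),
    "minio": (1000, 1024, 2),
    "mariadb": (300, 256, 2),
}


def quota_hard(databases: list[str]) -> dict[str, str]:
    """Sum base + per-database additions into ResourceQuota 'hard' entries."""
    cpu, mem, storage = BASE
    for db in databases:
        d_cpu, d_mem, d_storage = ADDITIONS[db]
        cpu, mem, storage = cpu + d_cpu, mem + d_mem, storage + d_storage
    return {
        "limits.cpu": f"{cpu}m",
        "limits.memory": f"{mem}Mi",
        "requests.storage": f"{storage}Gi",
    }


def resource_quota(namespace: str, databases: list[str]) -> dict:
    """Wrap the totals in a v1 ResourceQuota manifest (object name is hypothetical)."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "pr-quota", "namespace": namespace},
        "spec": {"hard": quota_hard(databases)},
    }


print(quota_hard(["postgresql", "redis"]))
# {'limits.cpu': '1000m', 'limits.memory': '1152Mi', 'requests.storage': '3Gi'}
```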

Network Architecture

DNS Configuration

Wildcard DNS: *.k8s-ee.genesluna.dev resolves to VPS IP
Preview URLs: {project-id}-pr-{number}.k8s-ee.genesluna.dev
Service URLs:
- Grafana: grafana.k8s-ee.genesluna.dev
- Prometheus: prometheus.k8s-ee.genesluna.dev
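Namespace names are single DNS labels, so every preview hostname sits exactly one label below the base domain that the wildcard record serves. The helpers below sketch that mapping and a sanity check on the hostname's shape. Both helper names are hypothetical.

```python
BASE_DOMAIN = "k8s-ee.genesluna.dev"


def preview_host(namespace: str) -> str:
    """Preview hostname for an ephemeral namespace."""
    return f"{namespace}.{BASE_DOMAIN}"


def directly_under_wildcard(host: str, base: str = BASE_DOMAIN) -> bool:
    """True if host is exactly one label below base, e.g. <label>.k8s-ee.genesluna.dev."""
    suffix = "." + base
    if not host.endswith(suffix):
        return False
    label = host[: -len(suffix)]
    return bool(label) and "." not in label


print(preview_host("k8s-ee-pr-28"))  # k8s-ee-pr-28.k8s-ee.genesluna.dev
```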

Network Isolation

Each PR namespace has NetworkPolicies that:

- Deny all ingress by default: no cross-namespace traffic
- Allow ingress from Traefik: traffic enters only through the ingress controller
- Allow egress to DNS: required for service discovery
- Allow egress to the Kubernetes API: required for operators
- Allow traffic within the namespace: the app can reach its databases
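These rules map naturally onto a handful of `networking.k8s.io/v1` NetworkPolicy objects. The Python sketch below builds them as plain dictionaries, which you would serialize to YAML or JSON before applying. The following details are assumptions, not taken from the platform:

- the policy names;
- the Traefik pod label `app.kubernetes.io/name: traefik`;
- the CoreDNS label `k8s-app: kube-dns`;
- the hypothetical `api_server_cidr` parameter (for example, the node IP as a /32), used with the k3s default API port 6443.

```python
def _policy(name: str, namespace: str, spec: dict) -> dict:
    """Wrap a spec in a NetworkPolicy manifest that selects every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"podSelector": {}, **spec},  # empty selector = all pods
    }


def _in_kube_system(pod_labels: dict) -> dict:
    """Peer selector for pods with the given labels in the kube-system namespace."""
    return {
        "namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "kube-system"}},
        "podSelector": {"matchLabels": pod_labels},
    }


def pr_network_policies(namespace: str, api_server_cidr: str) -> list[dict]:
    same_namespace = {"podSelector": {}}  # no namespaceSelector: this namespace only
    dns_ports = [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}]
    return [
        # No ingress rules: inbound traffic is denied unless another policy allows it.
        _policy("default-deny-ingress", namespace, {"policyTypes": ["Ingress"]}),
        # Inbound traffic from the Traefik ingress controller.
        _policy("allow-from-traefik", namespace, {
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [_in_kube_system({"app.kubernetes.io/name": "traefik"})]}],
        }),
        # Pods in this namespace may reach each other (app -> databases).
        _policy("allow-same-namespace", namespace, {
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [same_namespace]}],
        }),
        # Egress limited to DNS, the namespace itself, and the API server.
        _policy("restrict-egress", namespace, {
            "policyTypes": ["Egress"],
            "egress": [
                {"to": [_in_kube_system({"k8s-app": "kube-dns"})], "ports": dns_ports},
                {"to": [same_namespace]},
                {"to": [{"ipBlock": {"cidr": api_server_cidr}}],
                 "ports": [{"protocol": "TCP", "port": 6443}]},
            ],
        }),
    ]


policies = pr_network_policies("k8s-ee-pr-28", "10.0.0.5/32")
print([p["metadata"]["name"] for p in policies])
```

Splitting the rules into separate objects works because Kubernetes NetworkPolicies are additive: a pod's allowed traffic is the union of every policy that selects it.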

Service Level Objectives

SLI	Target
PRs with URL delivered in < 10 min	>= 95%
Namespaces removed in < 5 min after close	>= 98%
Pods with metrics/logs collected	>= 95%
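Each duration-based SLI is the fraction of events that beat its threshold, compared against the target. This applies to the first two rows above. The sketch below assumes you already have per-PR durations, for example derived from workflow timestamps. The function names and sample data are hypothetical. The strict `<` matches the "< 10 min" and "< 5 min" wording.

```python
from datetime import timedelta


def sli_fraction(durations: list[timedelta], threshold: timedelta) -> float:
    """Fraction of events that finished strictly under the threshold."""
    if not durations:
        raise ValueError("need at least one sample")
    good = sum(1 for d in durations if d < threshold)
    return good / len(durations)


def meets_slo(durations: list[timedelta], threshold: timedelta, target: float) -> bool:
    """Compare the SLI against its target, e.g. >= 0.95."""
    return sli_fraction(durations, threshold) >= target


# Hypothetical window: 19 PRs got a URL in 6 minutes, one took 12.
samples = [timedelta(minutes=6)] * 19 + [timedelta(minutes=12)]
print(sli_fraction(samples, timedelta(minutes=10)))      # 0.95
print(meets_slo(samples, timedelta(minutes=10), 0.95))  # True
```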

Non-Functional Requirements

Performance

Metric	Target
Environment creation time	<= 10 min (p95)
Namespace destruction time	< 5 min
Observability stack overhead	< 6 GB RAM

Capacity

Metric	Target
Simultaneous PRs supported	>= 5
Log retention	7 days
Metric retention	7 days

Availability

Metric	Target
Cluster uptime (business hours)	>= 95%
Automatic recovery after VPS reboot	Yes

Related Pages

Architecture - Architecture landing page
EKS-Migration-Guide - Phase 2 migration planning
Configuration-Reference - k8s-ee.yaml schema reference
Security-and-Access-Control - Security architecture details
Service-Development - Database and storage patterns
