Merged
178 changes: 178 additions & 0 deletions deploy/helm/codex-lb/README.md
Original file line number Diff line number Diff line change
@@ -245,6 +245,184 @@ total_connections = (databasePoolSize + databaseMaxOverflow) × replicas

Keep this within your PostgreSQL `max_connections` budget or place PgBouncer in front of the database.

## Production Deployment

Multi-replica production deployments require careful coordination of database connectivity, session routing, and graceful shutdown. This section covers the key patterns and tuning parameters.

### Prerequisites for Multi-Replica

Single-replica deployments can use SQLite, but **multi-replica requires PostgreSQL**:

- **Database**: PostgreSQL is mandatory for multi-replica because:
- SQLite does not support concurrent writes from multiple pods
- Leader election requires a shared database backend
- Session bridge ring membership is stored in the database

- **Leader Election**: Enabled by default (`config.leaderElectionEnabled=true`)
- Ensures only one pod performs background tasks (e.g., session cleanup, metrics aggregation)
- Uses database-backed locking with a TTL (`config.leaderElectionTtlSeconds=30`)
- If the leader crashes, another pod acquires the lock within 30 seconds

- **Circuit Breaker**: Enabled by default (`config.circuitBreakerEnabled=true`)
- Protects upstream API endpoints from cascading failures
- Opens after `config.circuitBreakerFailureThreshold=5` consecutive failures
- Enters half-open state after `config.circuitBreakerRecoveryTimeoutSeconds=60` seconds
- Prevents thundering herd when upstream is degraded
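Taken together, the defaults above correspond to the following values; a minimal sketch using the keys named in this section (values shown are the documented defaults — adjust only if your failure patterns differ):

```yaml
config:
  leaderElectionEnabled: true
  leaderElectionTtlSeconds: 30              # leader lock TTL; failover happens within this window
  circuitBreakerEnabled: true
  circuitBreakerFailureThreshold: 5         # consecutive failures before the breaker opens
  circuitBreakerRecoveryTimeoutSeconds: 60  # wait before probing upstream again (half-open)
```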

### Session Bridge Ring

The session bridge is an in-memory cache of upstream WebSocket connections, shared across the pod ring.

**Automatic Ring Membership (PostgreSQL)**

When using PostgreSQL, ring membership is **automatic and database-backed**:

- Each pod registers itself in the database on startup
- The `sessionBridgeInstanceRing` field is **optional** and only needed for manual pod list override
- Pods discover each other via database queries; no manual configuration required
- Ring membership is cleaned up automatically when pods terminate

**Manual Ring Override (Advanced)**

If you need to manually specify the pod ring (e.g., for testing or debugging):

```yaml
config:
sessionBridgeInstanceRing: "codex-lb-0.codex-lb.default.svc.cluster.local,codex-lb-1.codex-lb.default.svc.cluster.local"
```

This is rarely needed in production; the database-backed discovery is preferred.

### Connection Pool Budget

Each pod maintains its own SQLAlchemy connection pool. The total connections across all replicas must fit within PostgreSQL's `max_connections`:

```
(databasePoolSize + databaseMaxOverflow) × maxReplicas ≤ PostgreSQL max_connections
```

**Example for `values-prod.yaml`:**

```yaml
config:
databasePoolSize: 3
databaseMaxOverflow: 2
autoscaling:
maxReplicas: 20
```

Calculation: `(3 + 2) × 20 = 100` connections, which exactly fills PostgreSQL's default `max_connections=100`. Leave headroom if other clients (e.g. the migration job or monitoring tools) share the same database.
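The budget check is easy to script; a small sketch, with pool sizes mirroring the example above:

```shell
#!/bin/sh
# Sanity-check the connection budget before raising maxReplicas.
POOL_SIZE=3          # config.databasePoolSize
MAX_OVERFLOW=2       # config.databaseMaxOverflow
MAX_REPLICAS=20      # autoscaling.maxReplicas
PG_MAX_CONNECTIONS=100

TOTAL=$(( (POOL_SIZE + MAX_OVERFLOW) * MAX_REPLICAS ))
echo "total_connections=$TOTAL"
if [ "$TOTAL" -le "$PG_MAX_CONNECTIONS" ]; then
  echo "budget OK"
else
  echo "budget EXCEEDED" >&2
  exit 1
fi
```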

**Tuning:**

- Increase `databasePoolSize` if pods frequently wait for connections
- Increase `databaseMaxOverflow` for temporary spikes, but keep it small (overflow is slower)
- Reduce `maxReplicas` if you cannot increase PostgreSQL's `max_connections`
- Use PgBouncer or pgcat as a connection pooler in front of PostgreSQL if needed

### values-prod.yaml Reference

The `values-prod.yaml` overlay is pre-configured for production multi-replica deployments:

```yaml
replicaCount: 3 # Start with 3 replicas
postgresql:
enabled: false # Use external PostgreSQL
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 20
behavior:
scaleDown:
stabilizationWindowSeconds: 600 # 10 min cooldown (see below)
affinity:
podAntiAffinity: hard # Spread pods across nodes
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone # Spread across zones
networkPolicy:
enabled: true # Restrict ingress/egress
metrics:
serviceMonitor:
enabled: true # Prometheus scraping
prometheusRule:
enabled: true # Alerting rules
grafanaDashboard:
enabled: true # Pre-built dashboards
externalSecrets:
enabled: true # Use External Secrets Operator
```

Install with:

```bash
helm install codex-lb oci://ghcr.io/soju06/charts/codex-lb \
-f deploy/helm/codex-lb/values-prod.yaml \
--set externalDatabase.url='postgresql+asyncpg://user:pass@db.example.com:5432/codexlb'
```

### Graceful Shutdown Tuning

Graceful shutdown coordinates three timeout parameters to drain in-flight requests and session bridge connections:

```
preStopSleepSeconds (15s) → shutdownDrainTimeoutSeconds (30s) → terminationGracePeriodSeconds (60s)
```

**Timeline:**

1. **preStopSleepSeconds (15s)**: preStop hook delays shutdown
   - Kubernetes runs the preStop hook (a short sleep) before sending SIGTERM
   - Gives the load balancer time to remove the pod from rotation, so no new requests arrive during shutdown

2. **shutdownDrainTimeoutSeconds (30s)**: Drain in-flight requests
- HTTP server stops accepting new connections
- Existing requests are allowed to complete (up to 30 seconds)
- Session bridge connections are gracefully closed

3. **terminationGracePeriodSeconds (60s)**: Hard deadline
- Total time from SIGTERM to SIGKILL
- Must be ≥ `preStopSleepSeconds + shutdownDrainTimeoutSeconds`
- Default 60s allows 15s + 30s + 15s buffer
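The invariant in step 3 can be checked mechanically; a sketch using the chart's default values:

```shell
#!/bin/sh
# terminationGracePeriodSeconds must cover preStop sleep + drain, plus a buffer.
PRE_STOP=15    # preStopSleepSeconds
DRAIN=30       # shutdownDrainTimeoutSeconds
GRACE=60       # terminationGracePeriodSeconds

BUFFER=$(( GRACE - PRE_STOP - DRAIN ))
echo "buffer=${BUFFER}s"
[ "$BUFFER" -ge 0 ] || { echo "grace period too short" >&2; exit 1; }
```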

**Tuning:**

- Increase `preStopSleepSeconds` if your load balancer takes longer to deregister
- Increase `shutdownDrainTimeoutSeconds` if requests typically take >30s to complete
- Increase `terminationGracePeriodSeconds` proportionally (must be larger than the sum)
- Keep the buffer small; long shutdown times delay pod replacement

Example for long-running requests:

```yaml
preStopSleepSeconds: 20
shutdownDrainTimeoutSeconds: 60
terminationGracePeriodSeconds: 90
```

### Scale-Down Caution

The `stabilizationWindowSeconds: 600` (10 minutes) in `values-prod.yaml` is intentionally high.

**Why?**

- Session bridge connections have idle TTLs (`sessionBridgeIdleTtlSeconds=120` for API, `sessionBridgeCodexIdleTtlSeconds=900` for Codex)
- When a pod scales down, its in-memory sessions are lost
- Clients reconnecting to a different pod must re-establish upstream connections
- A 10-minute cooldown prevents rapid scale-down/up cycles that would thrash session state

**Behavior:**

- HPA will scale down at most 1 pod every 2 minutes (when cooldown is active)
- If load drops suddenly, scale-down is delayed by up to 10 minutes
- This trades faster scale-down for session stability

**Tuning:**

- Reduce `stabilizationWindowSeconds` if you prioritize cost over session stability
- Increase it if you see frequent session reconnections during scale events
- Monitor `sessionBridgeInstanceRing` size changes in logs to detect scale-down impact
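For example, to prioritize cost and allow faster scale-down, the production default could be overridden like this (a hypothetical value; pick a window that matches your session idle TTLs):

```yaml
autoscaling:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # 5 min instead of the 10 min production default
```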

## Security

The chart targets the Kubernetes Restricted Pod Security Standard.
19 changes: 19 additions & 0 deletions deploy/helm/codex-lb/templates/_helpers.tpl
@@ -139,3 +139,22 @@ Image string — resolves registry/repository:tag with optional digest override
{{- printf "%s/%s:%s" $registry $repository $tag }}
{{- end }}
{{- end }}

{{/*
Merged nodeSelector: global.nodeSelector + local nodeSelector (local wins).
*/}}
{{- define "codex-lb.nodeSelector" -}}
{{- $merged := mustMergeOverwrite (deepCopy (.Values.global.nodeSelector | default dict)) (.Values.nodeSelector | default dict) -}}
{{- if $merged }}
{{- toYaml $merged }}
{{- end }}
{{- end -}}

{{/*
Global-only nodeSelector for hooks/tests so app-specific placement does not block installs.
*/}}
{{- define "codex-lb.globalNodeSelector" -}}
{{- with (.Values.global.nodeSelector | default dict) }}
{{- toYaml . }}
{{- end }}
{{- end -}}
4 changes: 2 additions & 2 deletions deploy/helm/codex-lb/templates/deployment.yaml
@@ -89,9 +89,9 @@ spec:
topologySpreadConstraints:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.nodeSelector }}
{{- with (include "codex-lb.nodeSelector" .) }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- . | nindent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
91 changes: 91 additions & 0 deletions deploy/helm/codex-lb/templates/hooks/db-init-job.yaml
@@ -0,0 +1,91 @@
{{- if and .Values.dbInit.enabled (not .Values.postgresql.enabled) }}
apiVersion: batch/v1
kind: Job
metadata:
name: {{ printf "%s-db-init" (include "codex-lb.fullname" . | trunc 52 | trimSuffix "-") }}
namespace: {{ .Release.Namespace | quote }}
labels:
{{- include "codex-lb.labels" . | nindent 4 }}
annotations:
"helm.sh/hook": pre-install
"helm.sh/hook-weight": "-10"
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
template:
spec:
restartPolicy: OnFailure
automountServiceAccountToken: false
{{- with .Values.podSecurityContext }}
securityContext:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- $pullSecrets := concat (.Values.global.imagePullSecrets | default list) (.Values.image.pullSecrets | default list) }}
{{- if $pullSecrets }}
imagePullSecrets:
{{- range $pullSecrets }}
- name: {{ . }}
{{- end }}
{{- end }}
{{- with (include "codex-lb.globalNodeSelector" .) }}
nodeSelector:
{{- . | nindent 8 }}
{{- end }}
containers:
- name: db-init
image: {{ printf "%s/bitnami/postgresql:16" (.Values.global.imageRegistry | default "docker.io") }}
imagePullPolicy: IfNotPresent
{{- with .Values.containerSecurityContext }}
securityContext:
{{- toYaml . | nindent 12 }}
{{- end }}
command: ["sh", "-ec"]
args:
- |
PGPASSWORD="$ADMIN_PASSWORD" psql \
-h "$DB_HOST" -p "$DB_PORT" -U "$ADMIN_USER" -d postgres <<'SQL'
{{- range .Values.dbInit.databases }}
{{- $dbTag := printf "db_%s" ((printf "%s" .name) | sha256sum | trunc 12) }}
{{- $userTag := printf "user_%s" ((printf "%s" .user) | sha256sum | trunc 12) }}
{{- $passTag := printf "pass_%s" ((printf "%s" .password) | sha256sum | trunc 12) }}
DO $$ BEGIN
IF NOT EXISTS (SELECT FROM pg_roles WHERE rolname = ${{ $userTag }}${{ .user }}${{ $userTag }}$) THEN
EXECUTE format(
'CREATE ROLE %I WITH LOGIN PASSWORD %L',
${{ $userTag }}${{ .user }}${{ $userTag }}$,
${{ $passTag }}${{ .password }}${{ $passTag }}$
);
END IF;
END $$;
SELECT format(
'CREATE DATABASE %I OWNER %I',
${{ $dbTag }}${{ .name }}${{ $dbTag }}$,
${{ $userTag }}${{ .user }}${{ $userTag }}$
)
WHERE NOT EXISTS (
SELECT FROM pg_database WHERE datname = ${{ $dbTag }}${{ .name }}${{ $dbTag }}$
)\gexec
SELECT format(
'GRANT ALL PRIVILEGES ON DATABASE %I TO %I',
${{ $dbTag }}${{ .name }}${{ $dbTag }}$,
${{ $userTag }}${{ .user }}${{ $userTag }}$
)\gexec
{{- end }}
SQL
env:
- name: DB_HOST
value: {{ .Values.dbInit.host | quote }}
- name: DB_PORT
value: {{ .Values.dbInit.port | default "5432" | quote }}
- name: ADMIN_USER
value: {{ .Values.dbInit.adminUser | quote }}
- name: ADMIN_PASSWORD
{{- if .Values.dbInit.adminPasswordSecret }}
valueFrom:
secretKeyRef:
name: {{ .Values.dbInit.adminPasswordSecret.name }}
key: {{ .Values.dbInit.adminPasswordSecret.key }}
{{- else }}
value: {{ .Values.dbInit.adminPassword | quote }}
{{- end }}
backoffLimit: 3
{{- end }}
4 changes: 4 additions & 0 deletions deploy/helm/codex-lb/templates/hooks/migration-job.yaml
@@ -36,6 +36,10 @@ spec:
- name: {{ . }}
{{- end }}
{{- end }}
{{- with (include "codex-lb.globalNodeSelector" .) }}
nodeSelector:
{{- . | nindent 8 }}
{{- end }}
{{- if .Values.postgresql.enabled }}
initContainers:
- name: wait-for-db
2 changes: 1 addition & 1 deletion deploy/helm/codex-lb/templates/httproute.yaml
@@ -20,5 +20,5 @@ spec:
rules:
- backendRefs:
- name: {{ include "codex-lb.fullname" . }}
port: 2455
port: {{ .Values.service.port }}
{{- end }}
16 changes: 13 additions & 3 deletions deploy/helm/codex-lb/templates/service.yaml
@@ -5,16 +5,26 @@ metadata:
namespace: {{ .Release.Namespace | quote }}
labels:
{{- include "codex-lb.labels" . | nindent 4 }}
{{- with .Values.commonAnnotations }}
{{- with mustMerge (.Values.service.annotations | default dict) (.Values.commonAnnotations | default dict) }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
type: ClusterIP
type: {{ .Values.service.type }}
{{- if and (eq .Values.service.type "LoadBalancer") .Values.service.loadBalancerIP }}
loadBalancerIP: {{ .Values.service.loadBalancerIP }}
{{- end }}
{{- if and (eq .Values.service.type "LoadBalancer") .Values.service.loadBalancerSourceRanges }}
loadBalancerSourceRanges:
{{- toYaml .Values.service.loadBalancerSourceRanges | nindent 4 }}
{{- end }}
selector:
{{- include "codex-lb.selectorLabels" . | nindent 4 }}
ports:
- name: http
port: 2455
port: {{ .Values.service.port }}
targetPort: http
protocol: TCP
{{- if and (eq .Values.service.type "NodePort") .Values.service.nodePort }}
nodePort: {{ .Values.service.nodePort }}
{{- end }}
15 changes: 12 additions & 3 deletions deploy/helm/codex-lb/templates/tests/test-connection.yaml
@@ -15,8 +15,12 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
{{- with (include "codex-lb.globalNodeSelector" .) }}
nodeSelector:
{{- . | nindent 4 }}
{{- end }}
containers:
- name: test-connection
- name: test-health
image: busybox:1.37
imagePullPolicy: IfNotPresent
securityContext:
@@ -29,5 +33,10 @@
- sh
- -c
- |
wget --spider --timeout=10 http://{{ include "codex-lb.fullname" . }}:2455/health || exit 1
echo "Connection test passed!"
echo "=== Health endpoint ==="
wget --spider --timeout=10 http://{{ include "codex-lb.fullname" . }}:{{ .Values.service.port | default 2455 }}/health || exit 1
echo "Health check passed!"

echo "=== Startup probe ==="
wget -qO- --timeout=10 http://{{ include "codex-lb.fullname" . }}:{{ .Values.service.port | default 2455 }}/health/ready || exit 1
echo "Readiness check passed!"