178 changes: 178 additions & 0 deletions deploy/helm/codex-lb/README.md
@@ -245,6 +245,184 @@ total_connections = (databasePoolSize + databaseMaxOverflow) × replicas

Keep this within your PostgreSQL `max_connections` budget or place PgBouncer in front of the database.

## Production Deployment

Multi-replica production deployments require careful coordination of database connectivity, session routing, and graceful shutdown. This section covers the key patterns and tuning parameters.

### Prerequisites for Multi-Replica

Single-replica deployments can use SQLite, but **multi-replica requires PostgreSQL**:

- **Database**: PostgreSQL is mandatory for multi-replica because:
- SQLite does not support concurrent writes from multiple pods
- Leader election requires a shared database backend
- Session bridge ring membership is stored in the database

- **Leader Election**: Enabled by default (`config.leaderElectionEnabled=true`)
- Ensures only one pod performs background tasks (e.g., session cleanup, metrics aggregation)
- Uses database-backed locking with a TTL (`config.leaderElectionTtlSeconds=30`)
- If the leader crashes, another pod acquires the lock within 30 seconds
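The TTL-lease pattern described above can be modeled in a few lines. This is an illustrative sketch, not codex-lb's actual implementation: in the real deployment a PostgreSQL row updated with compare-and-set semantics plays the role of the dict below, and the `TtlLease` name and fields are invented for this example.

```python
import time


class TtlLease:
    """Minimal model of database-backed leader election with a TTL lease."""

    def __init__(self, ttl_seconds=30, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.row = {"holder": None, "expires_at": 0.0}  # stands in for the DB lock row

    def try_acquire(self, pod):
        now = self.clock()
        # Succeed if the lease is free, expired, or already ours (renewal).
        if self.row["holder"] in (None, pod) or now >= self.row["expires_at"]:
            self.row = {"holder": pod, "expires_at": now + self.ttl}
            return True
        return False


# Simulated timeline with an injectable clock:
t = [0.0]
lease = TtlLease(ttl_seconds=30, clock=lambda: t[0])
assert lease.try_acquire("pod-a")      # pod-a becomes leader
assert not lease.try_acquire("pod-b")  # lease is held; pod-b must wait
t[0] = 30.0                            # leader crashed; TTL elapsed
assert lease.try_acquire("pod-b")      # another pod takes over
```

Each pod periodically calls `try_acquire` to renew; a crashed leader simply stops renewing, so takeover happens within one TTL.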

- **Circuit Breaker**: Enabled by default (`config.circuitBreakerEnabled=true`)
- Protects upstream API endpoints from cascading failures
- Opens after `config.circuitBreakerFailureThreshold=5` consecutive failures
- Enters half-open state after `config.circuitBreakerRecoveryTimeoutSeconds=60` seconds
- Prevents thundering herd when upstream is degraded
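The closed/open/half-open lifecycle behind those settings can be sketched as follows. The parameter names mirror the chart's config values, but the class itself is a generic illustration, not codex-lb's code.

```python
import time


class CircuitBreaker:
    """Illustrative closed/open/half-open breaker with a recovery timeout."""

    def __init__(self, failure_threshold=5, recovery_timeout=60,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None while the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.recovery_timeout:
            return "half-open"  # let one trial request through
        return "open"

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # trip the breaker

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # trial succeeded: close again
```

While open, callers fail fast instead of piling onto a degraded upstream; the single half-open trial request is what prevents the thundering herd on recovery.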

### Session Bridge Ring

The session bridge is an in-memory cache of upstream WebSocket connections, shared across the pod ring.

**Automatic Ring Membership (PostgreSQL)**

When using PostgreSQL, ring membership is **automatic and database-backed**:

- Each pod registers itself in the database on startup
- The `sessionBridgeInstanceRing` field is **optional** and only needed for manual pod list override
- Pods discover each other via database queries; no manual configuration required
- Ring membership is cleaned up automatically when pods terminate

**Manual Ring Override (Advanced)**

If you need to manually specify the pod ring (e.g., for testing or debugging):

```yaml
config:
sessionBridgeInstanceRing: "codex-lb-0.codex-lb.default.svc.cluster.local,codex-lb-1.codex-lb.default.svc.cluster.local"
```

This is rarely needed in production; the database-backed discovery is preferred.
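The heartbeat-and-expiry mechanics of automatic ring membership can be modeled like this. The class and field names are invented for illustration and do not reflect codex-lb's actual schema.

```python
import time


class InstanceRing:
    """Toy model of database-backed ring membership: pods heartbeat a row,
    and entries that stop heartbeating age out after a TTL."""

    def __init__(self, ttl_seconds=30, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.last_seen = {}  # pod name -> last heartbeat timestamp

    def heartbeat(self, pod):
        self.last_seen[pod] = self.clock()

    def deregister(self, pod):
        self.last_seen.pop(pod, None)  # clean removal on pod termination

    def members(self):
        now = self.clock()
        # Crashed pods stop heartbeating and drop out after the TTL.
        return sorted(p for p, ts in self.last_seen.items()
                      if now - ts < self.ttl)
```

Graceful termination calls `deregister` immediately; a crashed pod is removed implicitly once its heartbeat goes stale, so the ring self-heals without manual configuration.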

### Connection Pool Budget

Each pod maintains its own SQLAlchemy connection pool. The total connections across all replicas must fit within PostgreSQL's `max_connections`:

```
(databasePoolSize + databaseMaxOverflow) × maxReplicas ≤ PostgreSQL max_connections
```

**Example for `values-prod.yaml`:**

```yaml
config:
databasePoolSize: 3
databaseMaxOverflow: 2
autoscaling:
maxReplicas: 20
```

Calculation: `(3 + 2) × 20 = 100` connections, which fits within PostgreSQL's default `max_connections=100`.

**Tuning:**

- Increase `databasePoolSize` if pods frequently wait for connections
- Increase `databaseMaxOverflow` for temporary spikes, but keep it small (overflow connections are opened on demand and discarded after use, which adds latency)
- Reduce `maxReplicas` if you cannot increase PostgreSQL's `max_connections`
- Use PgBouncer or pgcat as a connection pooler in front of PostgreSQL if needed
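The budget formula can be checked with a tiny helper before you touch the values files; `connection_budget` is a hypothetical name for this example, not part of the chart.

```python
def connection_budget(pool_size, overflow, max_replicas, max_connections=100):
    """Return (peak connections, fits within PostgreSQL's max_connections)."""
    peak = (pool_size + overflow) * max_replicas
    return peak, peak <= max_connections


# values-prod.yaml example: (3 + 2) × 20 = 100, exactly the default budget.
assert connection_budget(3, 2, 20) == (100, True)
# Raising maxReplicas to 25 would overshoot without a pooler in front.
assert connection_budget(3, 2, 25) == (125, False)
```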

### values-prod.yaml Reference

The `values-prod.yaml` overlay is pre-configured for production multi-replica deployments:

```yaml
replicaCount: 3 # Start with 3 replicas
postgresql:
enabled: false # Use external PostgreSQL
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 20
behavior:
scaleDown:
stabilizationWindowSeconds: 600 # 10 min cooldown (see below)
affinity:
podAntiAffinity: hard # Spread pods across nodes
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone # Spread across zones
networkPolicy:
enabled: true # Restrict ingress/egress
metrics:
serviceMonitor:
enabled: true # Prometheus scraping
prometheusRule:
enabled: true # Alerting rules
grafanaDashboard:
enabled: true # Pre-built dashboards
externalSecrets:
enabled: true # Use External Secrets Operator
```

Install with:

```bash
helm install codex-lb oci://ghcr.io/soju06/charts/codex-lb \
-f deploy/helm/codex-lb/values-prod.yaml \
--set externalDatabase.url='postgresql+asyncpg://user:pass@db.example.com:5432/codexlb'
```

### Graceful Shutdown Tuning

Graceful shutdown coordinates three timeout parameters to drain in-flight requests and session bridge connections:

```
preStopSleepSeconds (15s) → shutdownDrainTimeoutSeconds (30s) → terminationGracePeriodSeconds (60s)
```

**Timeline:**

1. **preStopSleepSeconds (15s)**: preStop hook runs first
- Kubernetes executes the preStop sleep before delivering SIGTERM, giving the load balancer time to remove the pod from rotation
- Prevents new requests from arriving once shutdown begins

2. **shutdownDrainTimeoutSeconds (30s)**: Drain in-flight requests
- HTTP server stops accepting new connections
- Existing requests are allowed to complete (up to 30 seconds)
- Session bridge connections are gracefully closed

3. **terminationGracePeriodSeconds (60s)**: Hard deadline
- Total time from SIGTERM to SIGKILL
- Must be ≥ `preStopSleepSeconds + shutdownDrainTimeoutSeconds`
- Default 60s allows 15s + 30s + 15s buffer

**Tuning:**

- Increase `preStopSleepSeconds` if your load balancer takes longer to deregister
- Increase `shutdownDrainTimeoutSeconds` if requests typically take >30s to complete
- Increase `terminationGracePeriodSeconds` proportionally (must be larger than the sum)
- Keep the buffer small; long shutdown times delay pod replacement

Example for long-running requests:

```yaml
preStopSleepSeconds: 20
shutdownDrainTimeoutSeconds: 60
terminationGracePeriodSeconds: 90
```
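The invariant from the timeline above (`terminationGracePeriodSeconds ≥ preStopSleepSeconds + shutdownDrainTimeoutSeconds`) is easy to validate before deploying; the helper name below is illustrative, not part of the chart.

```python
def shutdown_buffer(pre_stop, drain, grace):
    """Return the buffer left after preStop sleep and request draining.

    The grace period must cover both phases, or Kubernetes sends SIGKILL
    while requests are still draining.
    """
    if grace < pre_stop + drain:
        raise ValueError(
            f"terminationGracePeriodSeconds={grace} is less than "
            f"preStop + drain = {pre_stop + drain}"
        )
    return grace - (pre_stop + drain)


assert shutdown_buffer(15, 30, 60) == 15  # chart defaults
assert shutdown_buffer(20, 60, 90) == 10  # long-running-request example
```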

### Scale-Down Caution

The `stabilizationWindowSeconds: 600` (10 minutes) in `values-prod.yaml` is intentionally high.

**Why?**

- Session bridge connections have idle TTLs (`sessionBridgeIdleTtlSeconds=120` for API, `sessionBridgeCodexIdleTtlSeconds=900` for Codex)
- When a pod scales down, its in-memory sessions are lost
- Clients reconnecting to a different pod must re-establish upstream connections
- A 10-minute cooldown prevents rapid scale-down/up cycles that would thrash session state

**Behavior:**

- HPA will scale down at most 1 pod every 2 minutes (when cooldown is active)
- If load drops suddenly, scale-down is delayed by up to 10 minutes
- This trades off faster scale-down for session stability
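The stabilization window works because, during scale-down, the HPA applies the highest replica recommendation seen within the window rather than the latest one. A minimal sketch of that selection rule, assuming the standard HPA semantics:

```python
def scale_down_target(window_recommendations):
    """During the scale-down stabilization window the HPA picks the HIGHEST
    replica recommendation observed, so a brief load dip cannot shrink the
    ring and discard in-memory session state."""
    return max(window_recommendations)


# Load dipped to 4 replicas briefly, but an earlier sample wanted 12,
# so the ring stays at 12 until the high sample ages out of the window.
assert scale_down_target([12, 4, 9]) == 12
```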

**Tuning:**

- Reduce `stabilizationWindowSeconds` if you prioritize cost over session stability
- Increase it if you see frequent session reconnections during scale events
- Monitor `sessionBridgeInstanceRing` size changes in logs to detect scale-down impact

## Security

The chart targets the Kubernetes Restricted Pod Security Standard.
10 changes: 10 additions & 0 deletions deploy/helm/codex-lb/templates/_helpers.tpl
@@ -139,3 +139,13 @@ Image string — resolves registry/repository:tag with optional digest override
{{- printf "%s/%s:%s" $registry $repository $tag }}
{{- end }}
{{- end }}

{{/*
Merged nodeSelector: global.nodeSelector + local nodeSelector (local wins).
*/}}
{{- define "codex-lb.nodeSelector" -}}
{{- $merged := mustMerge (.Values.nodeSelector | default dict) (.Values.global.nodeSelector | default dict) -}}
{{- if $merged }}
{{- toYaml $merged }}
{{- end }}
{{- end -}}
8 changes: 4 additions & 4 deletions deploy/helm/codex-lb/templates/deployment.yaml
@@ -89,10 +89,10 @@ spec:
topologySpreadConstraints:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with (include "codex-lb.nodeSelector" .) }}
nodeSelector:
> **P1**: Fix pod-spec indentation for deployment `nodeSelector`
>
> The `nodeSelector` key in `deployment.yaml` is indented one space deeper than the surrounding pod-spec fields. When `global.nodeSelector` or `nodeSelector` is set, this block is emitted and no longer aligns with sibling keys like `tolerations`, producing invalid manifest structure and causing Helm render/apply failures for node-pinned deployments.

{{- . | nindent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
57 changes: 57 additions & 0 deletions deploy/helm/codex-lb/templates/hooks/db-init-job.yaml
@@ -0,0 +1,57 @@
{{- if and .Values.dbInit.enabled (not .Values.postgresql.enabled) }}
apiVersion: batch/v1
kind: Job
metadata:
name: {{ printf "%s-db-init" (include "codex-lb.fullname" . | trunc 52 | trimSuffix "-") }}
namespace: {{ .Release.Namespace | quote }}
labels:
{{- include "codex-lb.labels" . | nindent 4 }}
annotations:
"helm.sh/hook": pre-install
"helm.sh/hook-weight": "-10"
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
template:
spec:
restartPolicy: OnFailure
{{- with .Values.nodeSelector }}
> **P2**: Apply merged global selector to db-init hook pod
>
> The db-init Job still reads only `.Values.nodeSelector`, so deployments that rely on the new `global.nodeSelector` alone will not propagate selectors to this hook pod. In environments where scheduling requires explicit node labels (for example single-arch pools), the pre-install db-init hook can remain unschedulable and block installation.

nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
containers:
- name: db-init
image: {{ printf "%s/bitnami/postgresql:16" (.Values.global.imageRegistry | default "docker.io") }}
command: ["sh", "-ec"]
args:
- |
PGPASSWORD="$ADMIN_PASSWORD" psql \
-h "$DB_HOST" -p "$DB_PORT" -U "$ADMIN_USER" -d postgres <<'SQL'
{{- range .Values.dbInit.databases }}
DO $$ BEGIN
IF NOT EXISTS (SELECT FROM pg_roles WHERE rolname = '{{ .user }}') THEN
CREATE ROLE {{ .user }} WITH LOGIN PASSWORD '{{ .password }}';
END IF;
END $$;
SELECT format('CREATE DATABASE %I OWNER %I', '{{ .name }}', '{{ .user }}')
WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = '{{ .name }}')\gexec
GRANT ALL PRIVILEGES ON DATABASE {{ .name }} TO {{ .user }};
{{- end }}
SQL
env:
- name: DB_HOST
value: {{ .Values.dbInit.host | quote }}
- name: DB_PORT
value: {{ .Values.dbInit.port | default "5432" | quote }}
- name: ADMIN_USER
value: {{ .Values.dbInit.adminUser | quote }}
- name: ADMIN_PASSWORD
{{- if .Values.dbInit.adminPasswordSecret }}
valueFrom:
secretKeyRef:
name: {{ .Values.dbInit.adminPasswordSecret.name }}
key: {{ .Values.dbInit.adminPasswordSecret.key }}
{{- else }}
value: {{ .Values.dbInit.adminPassword | quote }}
{{- end }}
backoffLimit: 3
{{- end }}
8 changes: 6 additions & 2 deletions deploy/helm/codex-lb/templates/hooks/migration-job.yaml
@@ -35,8 +35,12 @@ spec:
{{- range $pullSecrets }}
- name: {{ . }}
{{- end }}
{{- end }}
{{- if .Values.postgresql.enabled }}
{{- end }}
{{- with (include "codex-lb.nodeSelector" .) }}
nodeSelector:
> **P1**: Correct migration hook `nodeSelector` indentation
>
> The new `nodeSelector` block in the migration Job pod spec is also indented one space too far. If a selector is configured, this emits misaligned YAML relative to other keys in `spec.template.spec`, which breaks manifest parsing and prevents the migration hook from rendering/installing.

{{- . | nindent 8 }}
{{- end }}
{{- if .Values.postgresql.enabled }}
initContainers:
- name: wait-for-db
image: postgres:16-alpine
4 changes: 4 additions & 0 deletions deploy/helm/codex-lb/templates/tests/test-connection.yaml
@@ -15,6 +15,10 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
{{- with (include "codex-lb.nodeSelector" .) }}
nodeSelector:
{{- . | nindent 4 }}
{{- end }}
containers:
- name: test-connection
image: busybox:1.37
33 changes: 33 additions & 0 deletions deploy/helm/codex-lb/values.yaml
@@ -6,6 +6,8 @@ global:
imagePullSecrets: []
# @param global.storageClass Global storage class for PVCs
storageClass: ""
# @param global.nodeSelector Node selector labels applied to ALL pods (deployment, jobs, tests)
nodeSelector: {}

# @section Common parameters
# @param nameOverride Override the chart name
@@ -295,6 +297,13 @@ metrics:
port: 9090
serviceMonitor:
# @param metrics.serviceMonitor.enabled Create a ServiceMonitor for Prometheus Operator
# Defaults to false because it requires the Prometheus Operator CRDs to be installed.
# Enable this when running kube-prometheus-stack or standalone Prometheus Operator.
# Without ServiceMonitor, configure prometheus.io annotations for scraping:
# commonAnnotations:
# prometheus.io/scrape: "true"
# prometheus.io/port: "9090"
# prometheus.io/path: "/metrics"
enabled: false
# @param metrics.serviceMonitor.interval Metrics scrape interval
interval: 30s
@@ -306,6 +315,8 @@ metrics:
relabelings: []
prometheusRule:
# @param metrics.prometheusRule.enabled Create PrometheusRule for alerting
# Requires Prometheus Operator. Includes alerts for: high error rate, circuit breaker open,
# all accounts exhausted, high request latency. See templates/prometheusrule.yaml for details.
enabled: false
# @param metrics.prometheusRule.additionalLabels Additional labels for PrometheusRule
additionalLabels: {}
@@ -355,6 +366,28 @@ externalDatabase:
# @param externalDatabase.existingSecret Secret containing external DB credentials
existingSecret: ""

# @section Database initialization parameters
dbInit:
# @param dbInit.enabled Enable pre-install Job to create databases/users on external PostgreSQL
enabled: false
# @param dbInit.host External PostgreSQL host
host: ""
# @param dbInit.port External PostgreSQL port
port: "5432"
# @param dbInit.adminUser Admin username for creating databases/users
adminUser: "adminuser"
# @param dbInit.adminPassword Admin password (use adminPasswordSecret for production)
adminPassword: ""
# @param dbInit.adminPasswordSecret Reference to existing Secret for admin password
adminPasswordSecret: {}
# name: pg-admin-secret
# key: password
# @param dbInit.databases List of databases and users to create
databases:
- name: codexlb
user: codexlb
password: changeme

# @section Migration parameters
migration:
# @param migration.enabled Run database migration Job on install/upgrade