Koldun orchestrates distributed-llama inference topologies on Kubernetes. The single cmd/operator binary exposes controller, ingress backend, dispatcher, and LLM worker modes that coordinate model download, topology creation, NATS-based messaging, and conversation lifecycle management. Wrangler provides the reconciliation engine, and JetStream acts as the persistent queue for chat sessions, assignments, and registry metadata.
- Manages Session → Dllama → Worker hierarchies, wiring Deployments, StatefulSets, and Jobs for distributed-llama clusters.
- Streams Hugging Face models into S3/MinIO, converts artifacts (GGUF), and publishes sizing metadata for scheduling.
- Ingests chat traffic through an OpenAI-compatible backend that hashes API tokens, supervises dispatcher pools, and mirrors readiness back to clients.
- Automatically synchronises NATS connection URLs from `Ingress` resources to dependent `Dllama` objects to avoid configuration drift.
- `cmd/operator/` — entrypoint wiring CLI flags to the desired mode.
- `pkg/apis/koldun.gorizond.io/v1/` — CRD type definitions and DeepCopy implementations.
- `pkg/controllers/` — Wrangler reconcilers plus helpers in `common.go`; tests (e.g. `memory_test.go`) sit beside implementations.
- `pkg/servers/{ingress,dispatcher,llm,operator}/` — HTTP servers and workers for runtime modes.
- `pkg/conversation`, `pkg/registry`, `pkg/tokens` — shared JetStream contracts, KV helpers, and token handling.
- `charts/` Helm chart, `k8s/` raw manifests, root `Dockerfile` and `skaffold.yaml` for image builds.
| Kind | Purpose | Notes |
|---|---|---|
| `Session` | Supervises dispatcher Deployments and pools of generated Dllama resources per conversation hash. | Scales via `spec.sessionScaling` thresholds and stores queue prefixes/assignment buckets. |
| `Dllama` | Expands a distributed-llama topology (root + workers) for a specific model. | Ensures referenced Model exposes `status.outputPVCName`; inherits NATS URL from Ingress automatically. |
| `Model` | Downloads, converts, and sizes model artifacts. | Creates downloader/convert Jobs, S3 CSI PV/PVC, and publishes size metadata. |
| `Root` | Renders the distributed-llama root coordinator Deployment/Service. | Watches pods to update readiness. |
| `Worker` | Manages single-slot worker StatefulSets. | Tracks per-slot readiness for dispatcher accounting. |
| `Ingress` | Declares the public ingress/backend bundle. | Generates backend Deployment, Service, Kubernetes Ingress, and publishes NATS/registry configuration. |
| Mode | What It Runs |
|---|---|
| `operator` (default) | Registers controllers, reconciles CRDs, publishes model/token registries, and synchronises JetStream conversation TTLs into Session/Dllama resources. |
| `ingress` (alias `backend`) | OpenAI-compatible HTTP edge; authenticates API tokens, maintains conversation KV records, and bridges requests to NATS. |
| `dispatcher` | Consumes backlog subjects, writes assignment KV entries, and fans work out to ready Dllama workers while tracking heartbeats. |
| `llm` | Sidecar-facing worker that streams completions between the dllama-api process and NATS `out.<hash>` subjects. |
- Backend validates an API token (mirrored from Secrets), computes `hash_koldun`, and stores a JSON manifest in the JetStream KV bucket (`backend-conversation-bucket`).
- Operator watches the bucket and ensures matching `Session`/`Dllama` resources exist, labelling them with `koldun.gorizond.io/hash`.
- Dispatcher reads backlog messages (`sessions.<hash>.requests`), assigns them to workers, and records progress in the assignments bucket.
- LLM workers call the dllama-api sidecar, stream completion chunks to `out.<hash>`, and ping state subjects so the dispatcher can recycle slots (a subject-naming sketch follows this list).
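To make the messaging contract concrete, here is a minimal sketch that publishes one request to the backlog subject and reads completion chunks from `out.<hash>`. It assumes plain core-NATS publishes and a trivial JSON payload; the real backend, dispatcher, and workers exchange richer manifests and use the JetStream KV buckets described above.

```go
package main

import (
	"fmt"
	"log"

	"github.com/nats-io/nats.go"
)

// Illustrative only: enqueue one request on a conversation backlog and print
// completion chunks streamed back on out.<hash>. The payload shape is an
// assumption for the example.
func main() {
	hash := "example-hash" // hash_koldun computed by the backend
	nc, err := nats.Connect("nats://koldun:k0ldun@nats.default:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Subscribe to completion chunks for this conversation.
	sub, err := nc.SubscribeSync(fmt.Sprintf("out.%s", hash))
	if err != nil {
		log.Fatal(err)
	}

	// Enqueue a request on the backlog subject consumed by the dispatcher.
	if err := nc.Publish(fmt.Sprintf("sessions.%s.requests", hash), []byte(`{"prompt":"Hello"}`)); err != nil {
		log.Fatal(err)
	}

	for {
		msg, err := sub.NextMsg(nats.DefaultTimeout)
		if err != nil {
			break // timed out waiting: no more chunks
		}
		fmt.Printf("chunk: %s\n", msg.Data)
	}
}
```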
- Go 1.24+, Docker/Skaffold (for container builds), and access to a Kubernetes cluster (Kubernetes 1.30+) with JetStream-enabled NATS.
- Helm 3 if you plan to install via the included chart.
go fmt ./... && gofmt -w .
go build ./cmd/operator
# Run controllers against your kubeconfig
go run ./cmd/operator --mode=operator --kubeconfig ~/.kube/config

# Start the OpenAI-compatible ingress backend
KOLDUN_API_TOKEN=... \
go run ./cmd/operator --mode=ingress \
--backend-namespace default \
--backend-nats-url nats://user:pass@nats.default:4222
# Launch a standalone worker connected to an existing dispatcher
HASH_KOLDUN=... \
go run ./cmd/operator --mode=llm \
--llm-request-subject "sessions.${HASH_KOLDUN}.dllama.0.in" \
--llm-state-subject "sessions.${HASH_KOLDUN}.dllama.0.state"
# Launch a dispatcher pinned to one conversation backlog
HASH_KOLDUN=... \
go run ./cmd/operator --mode=dispatcher \
--dispatcher-hash "${HASH_KOLDUN}" \
--dispatcher-nats-url nats://koldun:k0ldun@nats.default:4222 \
--dispatcher-backlog-subject "sessions.${HASH_KOLDUN}.requests" \
--dispatcher-assignments-bucket koldun_assignments \
--dispatcher-dllama-prefix "sessions.${HASH_KOLDUN}.dllama." \
--dispatcher-state-prefix "sessions.${HASH_KOLDUN}.dllama." \
--dispatcher-queue-group "dispatcher-${HASH_KOLDUN}" \
  --dispatcher-ack-wait 2m

- `--dispatcher-state-prefix` must match the subject prefix used by worker heartbeats and always end with `.`. When Sessions are generated via `Ingress` or Helm templates you can override this by setting `spec.backend.queue.stateStream`; if the stream already contains dots, the session controller copies it directly into the dispatcher args so manual Deployments stay in sync with CRD-driven ones.
- Set `--dispatcher-metrics-listen=:9090` (or another host:port) to expose `/metrics` and `/healthz` from dispatcher pods. The sample manifest in `k8s/dispatcher-deploy.yaml` enables the listener, opens port `9090`, and appends a ClusterIP Service plus PodMonitor example so Prometheus or readiness probes can scrape the endpoints immediately.
- Session CRDs expose `spec.dispatcherMetricsListen`, and `Ingress.spec.backend.dispatcherMetricsListen` along with `--backend-session-dispatcher-metrics-listen` keep the generated dispatcher Deployments in lock-step with manual pods so `/metrics` is always wired consistently. Helm users can also set `ingressDefaults.dispatcherMetricsListen` (or per-ingress overrides) to inject the flag without hand-editing every spec snippet.
Prometheus Operator setups can now reuse the PodMonitor example bundled with k8s/dispatcher-deploy.yaml (update the release label/namespace to match your stack). If you prefer Service-based scraping, the same manifest also ships a ClusterIP ready for ServiceMonitors; copy it per session dispatcher because the controller does not generate Services automatically to avoid per-session resource churn.
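For reference, the kind of endpoint pair `--dispatcher-metrics-listen` exposes can be reproduced in a few lines of Go. This is a hedged sketch using the Prometheus `promhttp` handler, not the actual wiring in `pkg/servers/dispatcher`:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Minimal sketch of a metrics/health listener: /metrics for scraping and
// /healthz for probes. The promhttp handler here is an assumption; the real
// dispatcher implementation lives in pkg/servers/dispatcher.
func main() {
	mux := http.NewServeMux()
	mux.Handle("/metrics", promhttp.Handler())
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
		w.Write([]byte("ok"))
	})
	// Same listen-address format as the flag, e.g. ":9090".
	log.Fatal(http.ListenAndServe(":9090", mux))
}
```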
- Helm: Edit `charts/koldun/values.yaml` (images, NATS, session scaling) then `helm install koldun charts/koldun`.
- Raw manifests: Apply the CRDs, controllers, and sample resources from the `k8s/` directory. Use `k8s/dispatcher-deploy.yaml` when you need a dedicated dispatcher Deployment; it demonstrates the required `--dispatcher-state-prefix` flag and the optional `--dispatcher-metrics-listen` endpoint, which now also propagates from `spec.dispatcherMetricsListen` on Sessions/Ingresses.
- Use `skaffold build` to publish the container image `ghcr.io/gorizond/koldun` before updating chart values.
Example Model that streams a Hugging Face repository into S3/MinIO, runs GGUF conversion, and exposes sizing metadata:
apiVersion: koldun.gorizond.io/v1
kind: Model
metadata:
name: mistral-convert
namespace: default
spec:
sourceUrl: https://huggingface.co/mistralai/Mistral-7B-v0.3
localPath: s3://models/mistral-7b-v0-3
objectStorage:
endpoint: http://minio.default:32090
bucketForSource: models
bucketForConvert: models-converted
secretRef:
name: minio-creds
download:
image: python:3.10
memory: 2Gi
chunkMaxMiB: 256
concurrency: 6
huggingFaceTokenSecretRef:
name: hf-token
conversion:
converterVersion: v0.16.2
image: python:3.10
memory: 8Gi
convertWeights: q40
outputPath: s3://models-converted/mistral-7b-v0-3
toolsImage: alpine:3.18
    pipProxy: http://dragonfly.default:4001

- Downloader and converter Jobs mount the S3 buckets via CSI and publish size data to `status.conversionSizeBytes`.
- Set `download.huggingFaceTokenSecretRef` only for private repositories; the secret must provide `token`.
- Add the annotation `koldun.gorizond.io/force-size-rerun` to trigger a sizing rerun when artifacts change (see the sketch below).
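For illustration, a reconciler can detect that annotation with a small helper. The function below is a sketch, not the operator's actual code, and clearing the annotation after rerunning the sizing Job is an assumed behaviour:

```go
package controllers

// forceSizeRerunAnnotation matches the annotation documented above; the
// helper is an illustrative sketch, not the operator's implementation.
const forceSizeRerunAnnotation = "koldun.gorizond.io/force-size-rerun"

// needsSizingRerun reports whether a Model's annotations request a sizing
// rerun. A reconciler could call this and, when true, delete and recreate
// the sizing Job before clearing the annotation (assumed behaviour).
func needsSizingRerun(annotations map[string]string) bool {
	if annotations == nil {
		return false
	}
	_, ok := annotations[forceSizeRerunAnnotation]
	return ok
}
```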
Ingress resources let the operator render the ingress/backend bundle and publish NATS details for dependent Dllama objects:
apiVersion: koldun.gorizond.io/v1
kind: Ingress
metadata:
name: public-backend
namespace: default
spec:
backend:
image: ghcr.io/gorizond/koldun:latest
rootImage: ghcr.io/gorizond/koldun:latest
workerImage: ghcr.io/gorizond/koldun:latest
dispatcherImage: ghcr.io/gorizond/koldun:latest
replicaPower: 2
nats:
url: nats://koldun:k0ldun@nats.default:4222
kvBucket: koldun_ttl
modelsBucket: koldun_models
tokensBucket: koldun_tokens
modelPrefix: model/
tokenPrefix: token/
sessionScaling:
minDllamas: 1
maxDllamas: 4
scaleUpBacklog: 2
scaleDownIdleSeconds: 120
conversationTTL: 10m
responseTimeout: 2m
service:
port: 8082
route:
host: koldun.localtest.me
path: /
    ingressClassName: traefik

- The controller produces the backend Deployment (`--mode=ingress`), Service, and Kubernetes Ingress automatically.
- Set `spec.backend.extraArgs` for advanced flags, or `spec.backend.hashSecret` to enable HMAC hashing (a hashing sketch follows this list).
- Use `spec.backend.queue.stateStream` to override dispatcher heartbeat subjects; when it contains dots the controller reuses it directly for `--dispatcher-state-prefix` so Deployments and Helm/k8s manifests stay in sync.
- Choosing `spec.service.type: LoadBalancer` or providing TLS annotations maps directly to the rendered Kubernetes Ingress.
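The operational notes later in this document say the backend hashes conversations with plain SHA-256 by default and switches to HMAC-SHA256 when a hash secret is configured. The sketch below shows both modes; hashing the raw API token is an assumption made for the example, not the backend's documented input.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// conversationHash sketches the two hashing modes: plain SHA-256 when no
// secret is set, HMAC-SHA-256 otherwise. Hashing the raw API token is an
// assumption made for this example.
func conversationHash(apiToken, hashSecret string) string {
	if hashSecret == "" {
		sum := sha256.Sum256([]byte(apiToken))
		return hex.EncodeToString(sum[:])
	}
	mac := hmac.New(sha256.New, []byte(hashSecret))
	mac.Write([]byte(apiToken))
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	fmt.Println(conversationHash("example-token", ""))       // plain SHA-256
	fmt.Println(conversationHash("example-token", "secret")) // HMAC-SHA-256
}
```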
- Follow `AGENTS.md` for contributor expectations, code layout, testing, and security conventions.
- Run `make help` to discover the canonical shortcuts (`test`, `controllers-smoke`, `compose-test`, `compose-update-baseline`) and reminders about compose coverage maintenance. Whenever the merged compose coverage exceeds the tracked value in `analytics/compose_coverage_baseline.json`, rerun `make compose-update-baseline`; the helper refreshes the JSON with the new percentage, timestamp, and commit hash so CI enforces the higher bar automatically.
For full end-to-end testing with real CRDs, use the Helm chart in Rancher Desktop. This setup provides a production-like environment with NATS JetStream, MinIO, and the complete operator stack.
# 1. Switch to Rancher Desktop context
kubectl config use-context rancher-desktop
# 2. Create namespace
kubectl create namespace koldun
# 3. Build operator image
docker build -t koldun:dev .
# 4. Update Helm dependencies
make helm-deps
# 5. Install with local values
helm install koldun charts/koldun/ -n koldun -f values-dev.yaml --wait
# 6. Verify all components
kubectl get pods -n koldun
# Expected: operator, nats, minio all Running

The following flow has been validated with Qwen3 0.6B on ARM64:
HTTP Request → Ingress Backend → NATS → Dispatcher → Dllama (Root+Workers) → LLM Response
1. Create MinIO credentials secret:
apiVersion: v1
kind: Secret
metadata:
name: minio-creds
namespace: koldun
type: Opaque
stringData:
accessKey: minioadmin
  secretKey: minioadmin

2. Create Model CR (TinyLlama example):
apiVersion: koldun.gorizond.io/v1
kind: Model
metadata:
name: tinyllama
namespace: koldun
spec:
sourceUrl: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
localPath: /models/tinyllama
objectStorage:
endpoint: http://koldun-minio:9000
bucketForSource: koldun-models
bucketForConvert: koldun-models
secretRef:
name: minio-creds
launchOptions:
- "--buffer-float-type"
- "q80"3. Create Ingress CR:
apiVersion: koldun.gorizond.io/v1
kind: Ingress
metadata:
name: tinyllama-ingress
namespace: koldun
spec:
backend:
image: koldun:dev
rootImage: koldun:dev
workerImage: koldun:dev
dispatcherImage: koldun:dev
replicaPower: 0 # 2^0 = 1 worker (stable on ARM64)
nats:
url: nats://koldun-nats:4222
kvBucket: koldun_ttl
modelsBucket: koldun_models
tokensBucket: koldun_tokens
modelPrefix: model/
tokenPrefix: token/
conversationTTL: 10m
responseTimeout: 2m
service:
port: 8082
route:
    host: tinyllama.local

4. Test the API:
# Port-forward the backend
kubectl port-forward svc/tinyllama-ingress-backend 8082:8082 -n koldun
# List models
curl http://localhost:8082/v1/models
# Chat completion
curl -X POST http://localhost:8082/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "koldun/tinyllama",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 50
}'

# Rebuild after code changes
docker build -t koldun:dev .
# Upgrade Helm release
helm upgrade koldun charts/koldun/ -n koldun -f values-dev.yaml --wait
# Watch operator logs (may timeout due to Rancher Desktop TLS issue)
kubectl logs -f deployment/koldun -n koldun
# Check health
kubectl port-forward deployment/koldun 8080:8080 -n koldun &
curl http://localhost:8080/healthz # "ok"
curl http://localhost:8080/readyz # "ready"

- kubectl logs timeout: Rancher Desktop VM has TLS handshake issues; use `kubectl port-forward` or `kubectl exec` instead
- Multi-worker stability: 1 worker is stable; 2+ workers may have race conditions on ARM64 Lima VM
- No persistence: MinIO and NATS use memory storage (data lost on restart)
- Resource constraints: Lima VM may need memory increase for large models
For stable multi-worker distributed inference on Apple Silicon (M1/M2/M3), Rosetta and VZ (Virtualization.framework) are mandatory:
Rancher Desktop Settings:
- Open Rancher Desktop Preferences
- Go to Virtual Machine section
- Enable VZ (Virtualization.framework) instead of QEMU
- Enable Rosetta support for x86_64 emulation
- Restart Rancher Desktop
Why this matters:
- Lima VM with VZ + Rosetta provides better CPU instruction compatibility
- dllama uses advanced SIMD instructions (NEON, DOTPROD) that require proper emulation
- Without Rosetta + VZ, you may see Exit Code 139 (SIGSEGV) or Exit Code 133 (SIGILL)
- 3-worker distributed inference is stable with Rosetta + VZ enabled
Validation:
# Check Rosetta support
kubectl exec <any-pod> -- uname -m
# Should return: aarch64 or x86_64 (with Rosetta)
# Check CSI S3 (benefits from VZ)
kubectl get pods -n koldun | grep csi-s3
# Should show Running (not CrashLoopBackOff)

CRITICAL: CPU-based LLM inference is EXTREMELY slow!
- Expect 2-5 minutes per token for models like Qwen3 0.6B on ARM64 Lima VM
- Never send multiple concurrent requests to the same Dllama instance
- Use NATS queues for proper request management (they handle backpressure)
- Monitor system load before sending requests
Pre-request checks:
# 1. Check NATS backlog (should be empty or low)
kubectl exec koldun-nats-0 -c nats -n koldun -- nats stream info
# 2. Check LLM sidecar status
kubectl logs <root-pod> -c llm -n koldun --tail=20
# 3. Verify no active requests (dispatcher logs)
kubectl logs <dispatcher-pod> -n koldun --tail=10

Best practices:
- Send one request at a time and wait for completion (see the subscriber sketch after this list)
- Set realistic `max_tokens` (10-50 for testing, not 1000+)
- Use smaller models (TinyLlama 1.1B vs Qwen3 0.6B) for faster iteration
- Consider x86_64 production cluster with GPU for real workloads
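To honour the one-request-at-a-time guidance, a consumer can drain the backlog sequentially from a single goroutine. This is a simplified core-NATS sketch reusing the subject and queue-group names from the dispatcher flags above; the real dispatcher uses JetStream consumers, acks, and the assignments bucket.

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

// Sequential consumer sketch: one goroutine drains the backlog subject so
// only a single request reaches the slow CPU worker at a time. Processing is
// a stub; forwarding to the worker is left as a comment.
func main() {
	nc, err := nats.Connect("nats://koldun:k0ldun@nats.default:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	hash := "example-hash"
	sub, err := nc.QueueSubscribeSync("sessions."+hash+".requests", "dispatcher-"+hash)
	if err != nil {
		log.Fatal(err)
	}

	for {
		msg, err := sub.NextMsg(nats.DefaultTimeout)
		if err != nil {
			continue // no pending work; keep polling
		}
		log.Printf("processing request: %s", msg.Data)
		// Forward to the worker and wait for the completion here before
		// pulling the next message, so requests never overlap.
	}
}
```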
Health Check Tolerance (since v0.1.0):
- LLM sidecar health checks now tolerate slow CPU inference:
  - Check interval: `60s` (previously 15s)
  - Failure threshold: `10` (previously 4)
  - Grace period: ~10 minutes before pod restart
- This prevents premature evictions when dllama-api blocks during inference
- See `pkg/servers/llm/server.go` for configuration details
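As a rough illustration of those tolerances (not the code in `pkg/servers/llm/server.go`), a probe loop with a 60 s interval and a threshold of 10 consecutive failures only gives up after roughly 10 minutes of unresponsiveness; the endpoint URL below is a placeholder.

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// Illustrative probe loop matching the numbers above: 60s interval, 10
// consecutive failures (~10 minutes) before declaring the sidecar unhealthy.
func main() {
	const (
		interval         = 60 * time.Second
		failureThreshold = 10
	)
	failures := 0
	client := &http.Client{Timeout: 10 * time.Second}

	for range time.Tick(interval) {
		resp, err := client.Get("http://127.0.0.1:9999/health") // placeholder endpoint
		healthy := err == nil && resp.StatusCode == http.StatusOK
		if resp != nil {
			resp.Body.Close()
		}
		if healthy {
			failures = 0 // a healthy response resets the counter
			continue
		}
		failures++
		log.Printf("health check failed (%d/%d)", failures, failureThreshold)
		if failures >= failureThreshold {
			log.Fatal("sidecar unhealthy: failure threshold exceeded")
		}
	}
}
```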
image:
repository: koldun
tag: dev
pullPolicy: Never
nats:
enabled: true
config:
jetstream:
enabled: true
fileStore:
pvc:
size: 1Gi
minio:
enabled: true
mode: standalone
persistence:
enabled: false
operator:
conversation:
natsUrl: "nats://koldun-nats:4222"
kvBucket: "koldun_ttl"
registry:
modelsBucket: "koldun_models"
tokensBucket: "koldun_tokens"
csi-s3:
enabled: true
storageClass:
endpoint: "http://koldun-minio:9000"
accessKey: "minioadmin"
secretKey: "minioadmin"Koldun includes an automated E2E test workflow that validates the complete stack on every push/PR. This workflow tests the critical path: CRD installation → NATS/MinIO setup → Model creation → Ingress backend → resource reconciliation.
The E2E test (.github/workflows/e2e-test.yaml) runs on Ubuntu with k3d and validates:
- CRD installation with split Helm install (avoids rate limiting)
- NATS JetStream enablement
- MinIO object storage setup
- Model CR with pre-converted artifacts
- Ingress CR creation and backend deployment
- Session/Dllama resource auto-creation
- Operator NATS registry synchronization
Key Features:
- Fast execution (~7 minutes)
- Split CRD installation (separate from Helm chart)
- Pre-converted model support (skips download/conversion)
- Optional CPU inference test (`SKIP_INFERENCE=true` by default)
- Cluster Setup: Creates k3d cluster with k3s v1.28.5
- Infrastructure: Installs NATS with JetStream + MinIO manually (not via Helm deps)
- CRD Installation: Applies CRDs separately with Helm annotations
- Operator Install: Helm install with `--skip-crds` flag (faster, fewer API calls)
- Resource Creation: Creates test Model CR with `preConverted: true` and Ingress CR
- Validation: Runs `hack/test-e2e.sh` to verify resource reconciliation
You can run the same E2E workflow locally with k3d:
# 1. Create k3d cluster
k3d cluster create koldun-e2e \
--image rancher/k3s:v1.28.5-k3s1 \
--wait \
--timeout 3m
# 2. Build and import operator image
docker build -t koldun:test .
k3d image import koldun:test -c koldun-e2e
# 3. Install NATS with JetStream
helm repo add nats https://nats-io.github.io/k8s/helm/charts/
kubectl create namespace koldun
helm install koldun-nats nats/nats \
--namespace koldun \
--set config.jetstream.enabled=true \
--set config.jetstream.memoryStore.enabled=true \
--set config.jetstream.memoryStore.maxSize=1Gi \
--wait
# 4. Install MinIO
kubectl apply -n koldun -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: koldun-minio
spec:
selector:
matchLabels:
app: minio
template:
metadata:
labels:
app: minio
spec:
containers:
- name: minio
image: minio/minio:RELEASE.2024-01-01T16-36-33Z
args: [server, /data]
env:
- name: MINIO_ROOT_USER
value: minioadmin
- name: MINIO_ROOT_PASSWORD
value: minioadmin
ports:
- containerPort: 9000
---
apiVersion: v1
kind: Service
metadata:
name: koldun-minio
spec:
selector:
app: minio
ports:
- port: 9000
EOF
kubectl wait --for=condition=available deployment/koldun-minio -n koldun --timeout=2m
# 5. Create MinIO bucket
kubectl run minio-mc -n koldun --rm -i --restart=Never --image=minio/mc -- \
/bin/sh -c "mc alias set local http://koldun-minio:9000 minioadmin minioadmin && mc mb --ignore-existing local/koldun-models"
# 6. Install Koldun operator (split CRD install)
cd charts/koldun/
helm dependency build
# Apply CRDs first
for crd in templates/crd/bases/*.yaml; do
kubectl apply -f "$crd"
kubectl annotate --overwrite -f "$crd" \
meta.helm.sh/release-name=koldun \
meta.helm.sh/release-namespace=koldun
kubectl label --overwrite -f "$crd" app.kubernetes.io/managed-by=Helm
done
# Create values for E2E
cat > /tmp/values-e2e.yaml <<EOF
image:
repository: koldun
tag: test
pullPolicy: Never
operator:
conversation:
natsUrl: "nats://koldun-nats:4222"
kvBucket: koldun_ttl
registry:
modelsBucket: koldun_models
tokensBucket: koldun_tokens
nats:
enabled: false
minio:
enabled: false
csi-s3:
enabled: false
EOF
# Install operator without CRDs
helm upgrade --install koldun . \
-n koldun \
-f /tmp/values-e2e.yaml \
--skip-crds \
--wait
# 7. Create test resources
kubectl apply -n koldun -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: test-model-pvc
spec:
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 1Gi
---
apiVersion: koldun.gorizond.io/v1
kind: Model
metadata:
name: test-model
spec:
sourceUrl: https://example.com/test-model.bin
localPath: test-model
preConverted: true
preConvertedPVCName: test-model-pvc
preConvertedSizeBytes: 1000000
objectStorage:
endpoint: http://koldun-minio:9000
bucketForSource: koldun-models
bucketForConvert: koldun-models
secretRef:
name: minio-credentials
---
apiVersion: koldun.gorizond.io/v1
kind: Ingress
metadata:
name: test-ingress
spec:
backend:
image: koldun:test
imagePullPolicy: Never
allowAnonymous: true
conversationTTL: 5m
replicaPower: 1
nats:
url: "nats://koldun-nats:4222"
kvBucket: koldun_ttl
modelsBucket: koldun_models
tokensBucket: koldun_tokens
service:
port: 8082
EOF
# 8. Run E2E validation script
NAMESPACE=koldun \
MODEL_NAME=test-model \
INGRESS_NAME=test-ingress \
SKIP_INFERENCE=true \
./hack/test-e2e.sh
# 9. Cleanup
k3d cluster delete koldun-e2e

The E2E workflow uses `preConverted: true` to skip model download/conversion. This requires:
- PVC with model artifacts: Create a PVC containing the converted model files
- Model CR configuration:
  spec:
    preConverted: true
    preConvertedPVCName: my-model-pvc
    preConvertedSizeBytes: 1000000  # Actual model size
    preConvertedSizeHuman: "1 MB"   # Optional human-readable size
- Launch options: Specify model and tokenizer paths:
  spec:
    launchOptions:
      - "--model"
      - "/model/model.m"
      - "--tokenizer"
      - "/model/tokenizer.t"
When to use pre-converted models:
- CI/CD pipelines (faster tests)
- Models already converted outside Koldun
- Testing without Hugging Face access
- Airgapped environments
The workflow passed successfully (run 19503997082) with these validations:
- ✅ All CRDs installed without errors
- ✅ NATS JetStream enabled and healthy
- ✅ MinIO bucket created and accessible
- ✅ Model CR reaches `Ready` condition (pre-converted)
- ✅ Ingress CR creates backend Deployment
- ✅ Backend pod starts and reports healthy
- ✅ Operator synchronizes NATS configuration
- ✅ Session/Dllama resources auto-created
- ✅ No crash loops or reconciliation errors
CRD installation fails with "too many requests":
- Use split install: apply CRDs first, then `helm install --skip-crds`
- The workflow already implements this fix
Backend pod crashes with "nats: no responders available":
- Ensure JetStream is enabled in NATS Helm values:
  config:
    jetstream:
      enabled: true
Operator pod logs "registry sync disabled":
- Check `operator.conversation.natsUrl` and `operator.registry.*` in Helm values
- Must match the chart schema (not the old `backend.*` path)
Model CR stuck in Pending:
- For E2E tests, use `preConverted: true` with a stub PVC
- For real models, ensure MinIO credentials and network connectivity
E2E script timeout:
- Increase `TIMEOUT` env var (default: 120 seconds)
- Check `kubectl get events -n koldun` for reconciliation errors
The E2E workflow runs on:
- Every push to any branch (if paths match)
- Pull requests targeting `main`
Workflow dispatch options:
- `skip_inference`: Skip slow CPU inference test (default: `true`)
Example manual run:
gh workflow run e2e-test.yaml \
--ref optimize \
  -f skip_inference=true

Once E2E tests pass, you can:
- Add real model examples to `k8s/examples/`
- Enable CPU inference test (`SKIP_INFERENCE=false`)
- Add integration tests for dispatcher state management
- Document troubleshooting steps in this README
- Create production-ready Helm values examples
Run the NATS + MinIO + single-node k3s stack whenever you need a reproducible environment for ingress/dispatcher tests without depending on host networking quirks.
# from repo root
export COMPOSE_FILE=docker-compose.test.yml
docker compose up -d
# kubeconfig appears under hack/localstack/kubeconfig/kubeconfig
export KUBECONFIG=$PWD/hack/localstack/kubeconfig/kubeconfig
# sample env vars for go test ./pkg/servers/ingress
export KOLDUN_NATS_URL="nats://koldun:koldun@127.0.0.1:4222"
export KOLDUN_DISPATCHER_NATS_URL="$KOLDUN_NATS_URL"
export KOLDUN_MINIO_ENDPOINT="http://127.0.0.1:9000"
export KOLDUN_MINIO_ACCESS_KEY=minio
export KOLDUN_MINIO_SECRET_KEY=minio123

Bring the stack down with `docker compose down -v` when finished. See `hack/localstack/README.md` for full details on the services, health checks, and bucket/bootstrap logic.
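A hypothetical test skeleton showing how a package test can consume those variables and skip cleanly when the stack is down (this exact test does not exist in the repo):

```go
package ingress_test

import (
	"os"
	"testing"

	"github.com/nats-io/nats.go"
)

// Hypothetical skeleton: use the compose environment variables exported
// above and skip when the local stack is not running.
func TestComposeNATSReachable(t *testing.T) {
	url := os.Getenv("KOLDUN_NATS_URL")
	if url == "" {
		t.Skip("KOLDUN_NATS_URL not set; start the compose stack first")
	}
	nc, err := nats.Connect(url)
	if err != nil {
		t.Fatalf("connecting to compose NATS: %v", err)
	}
	defer nc.Drain()

	if !nc.IsConnected() {
		t.Fatal("expected an established NATS connection")
	}
}
```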
- Local: `make compose-test` spins the stack up, waits for NATS/MinIO, and runs `go test` for both `pkg/servers/ingress` and `pkg/servers/dispatcher` against the same compose JetStream before tearing everything down. The target exports `KOLDUN_NATS_URL` (default `nats://koldun:koldun@127.0.0.1:4222`) and mirrors it into `KOLDUN_DISPATCHER_NATS_URL`, so dispatcher helpers automatically point at the compose stack. Each run emits `compose.coverprofile` (merged ingress+dispatcher coverage) and `artifacts/compose-logs.txt` with the full `docker compose logs` dump for easy upload/debugging.
- CI: `.github/workflows/compose-ingress.yaml` mirrors the same workflow on GitHub Actions so every PR touching ingress, dispatcher, or the compose stack runs the end-to-end tests with real JetStream/MinIO. The job shells out to `make compose-test`, runs `go tool cover -func compose.coverprofile`, publishes the total into the job summary (with a direct link to the uploaded `compose.coverage.txt`), highlights any coverage increase with a call-to-action, and fails if coverage drops below the baseline recorded in `analytics/compose_coverage_baseline.json` or if a PR raises coverage without updating that baseline. Artifacts `compose.coverprofile`, `compose.coverage.txt`, and `artifacts/compose-logs.txt` are uploaded on every run for offline inspection.
- Coverage helpers: after any local compose run, execute `go tool cover -func compose.coverprofile | tail -n 1` to inspect the “total” line and decide whether the baseline file should be updated. When you intentionally raise coverage, run `make compose-update-baseline` (wraps `hack/update-compose-coverage-baseline.sh $(COMPOSE_TEST_COVERPROFILE) $(COMPOSE_TEST_BASELINE)`) — the helper records the new total, UTC timestamp, and current commit hash in `analytics/compose_coverage_baseline.json` so CI enforces the higher target automatically.
- Keep the stack running: set `COMPOSE_TEST_KEEP_STACK=1 make compose-test` to skip the automatic teardown phase for interactive debugging (logs are still captured). Clean up manually with `make compose-test-down` once you are finished poking at the containers.
- Dispatcher: when running dispatcher tests manually, ensure the compose stack is up and run `export KOLDUN_DISPATCHER_NATS_URL=$KOLDUN_NATS_URL` (direnv users get this automatically via `.envrc`). Then execute `go test ./pkg/servers/dispatcher -cover -count=1` to validate backlog/retry flows without relying on loopback sockets.
pkg/controllers/common.go; resource-specific logic lives in dedicated files (root.go,worker.go, etc.). - Prefer Go table-driven tests and mocks from
go.uber.org/mockfor JetStream/Kubernetes clients. - Avoid committing secrets; store NATS credentials and hash secrets in Kubernetes Secrets labelled
koldun.gorizond.io/token.
- `ErrConnectionRefused`/`nats: no servers available` during dispatcher tests: ensure `docker compose up` is running and both `KOLDUN_NATS_URL` and `KOLDUN_DISPATCHER_NATS_URL` point to the compose endpoint (`nats://koldun:koldun@127.0.0.1:4222`). `make compose-test` and `.envrc` already wire this up; for manual runs export the variables before invoking `go test`.
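To illustrate the table-driven convention referenced above, here is a minimal, self-contained sketch; `splitPrefix` is invented for the example, and real tests exercise the actual helpers in `pkg/controllers` with `go.uber.org/mock` standing in for JetStream and Kubernetes clients.

```go
package controllers_test

import "testing"

// splitPrefix is a stand-in helper invented for this example: it returns the
// subject up to and including the final dot.
func splitPrefix(subject string) string {
	for i := len(subject) - 1; i >= 0; i-- {
		if subject[i] == '.' {
			return subject[:i+1]
		}
	}
	return subject
}

func TestSplitPrefix(t *testing.T) {
	cases := []struct {
		name    string
		subject string
		want    string
	}{
		{"worker slot", "sessions.abc.dllama.0", "sessions.abc.dllama."},
		{"no dots", "requests", "requests"},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := splitPrefix(tc.subject); got != tc.want {
				t.Fatalf("splitPrefix(%q) = %q, want %q", tc.subject, got, tc.want)
			}
		})
	}
}
```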
Controllers rely on envtest binaries (kube-apiserver, etcd) when running the integration test in pkg/controllers/dllama_reconcile_envtest_test.go. Install the assets once and export KUBEBUILDER_ASSETS before running the suite:
go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest
# Downloads the Kubernetes stack compatible with controller-runtime v0.20.4 and prints the export lines
eval "$(setup-envtest use -p env --bin-dir ./bin/envtest 1.32.x!)"
# Persist for new shells / CI jobs (optional but recommended)
export KUBEBUILDER_ASSETS="$(./hack/print-kubebuilder-assets.sh)"
ls "$KUBEBUILDER_ASSETS" # sanity-check kube-apiserver/etcd are present
# Verify the integration test; it will skip with a helpful message if assets are missing
go test ./pkg/controllers -run TestDllamaReconciliationCreatesRootAndWorker -count=1

If you want the same two-step sequence our onboarding docs and CI jobs follow, run:
make envtest-preflight
export KUBEBUILDER_ASSETS="$(./hack/print-kubebuilder-assets.sh)"

To capture a timing baseline, wrap the smoke test with `/usr/bin/time -p`:

make envtest-preflight
export KUBEBUILDER_ASSETS="$(./hack/print-kubebuilder-assets.sh)"
/usr/bin/time -p make controllers-smoke
Example log from the latest macOS arm64 run (cached envtest + module cache):
Running controller tests...
Using KUBEBUILDER_ASSETS=/Users/negash/.agor/worktrees/gorizond/koldun/optimize/bin/envtest/k8s/1.32.0-darwin-arm64
ok      github.com/gorizond/koldun/pkg/controllers      39.729s
✓ All controller tests passed
real 44.76
user 8.96
sys 4.39

The helper prints the resolved `KUBEBUILDER_ASSETS` path, and wrapping the smoke test with `/usr/bin/time -p` gives a wall-clock baseline you can compare against future runs to spot missing caches or stalled envtest downloads. Prior to the Session 52 optimization, the suite took ~60 s because `TestConversationReconcilerMaintainsRecoveryTimeAcrossOutages` bootstrapped extra KV data and repeated cleanup; that section is now leaner (no pre-bootstrap, one reconnection loop per outage), hence the win. Keep the older ~60 s number in mind if you need to bisect regressions, but use the table below for current reference points on both macOS and Linux runners (cold = brand new checkout, cached = warmed envtest + module cache):

| Runner | Envtest state | `/usr/bin/time -p make controllers-smoke` (real) | `go test ./pkg/controllers` duration | Notes |
|---|---|---|---|---|
| macOS 15.1 (Apple Silicon) | cached (envtest + module cache restored) | 44.76 s real (user 8.96 s / sys 4.39 s) | 39.73 s | Same workstation used for Sessions 52–53 |
| macOS 15.1 (Apple Silicon) | cold (fresh checkout, no caches) | 262.58 s real (user 120.46 s / sys 30.40 s) | 45.96 s | Includes first-run toolchain/module download; preceding `make envtest-preflight` adds 53.58 s |
| Linux (golang:1.22-bookworm container on the same host) | cached | 44.04 s real (user 6.04 s / sys 2.73 s) | 41.13 s | `GOTOOLCHAIN=go1.25`, `./hack/print-kubebuilder-assets.sh` auto-selects `bin/envtest/k8s/1.32.0-linux-arm64` based on `uname -s` |
| Linux (golang:1.22-bookworm container) | cold | 225.71 s real (user 117.14 s / sys 31.85 s) | 38.51 s | Includes downloading the Go toolchain + modules; envtest assets populated via `setup-envtest use` and auto-detected via `./hack/print-kubebuilder-assets.sh` |
For troubleshooting slow or failing smoke tests, consult the Envtest FAQ in docs/ci-envtest.md; it now points back to this table so you can compare your runner against the cached/cold baselines.
- `make envtest-preflight` wraps the `setup-envtest use` invocation, validates that both `kube-apiserver` and `etcd` binaries exist, and reprints the `KUBEBUILDER_ASSETS` export line. Use it after toolchain upgrades or when bootstrapping CI runners.
- `.envrc` now exports `KUBEBUILDER_ASSETS=$(./hack/print-kubebuilder-assets.sh)`; run `direnv allow` (or copy the line into your shell profile) so controller tests discover the assets automatically.
- Capture the `KUBEBUILDER_ASSETS` path printed by `setup-envtest use` (typically `./bin/envtest/k8s/1.32.0-<os>-<arch>`) or run `./hack/print-kubebuilder-assets.sh` to auto-detect it for local shells and CI pipelines.
- Cache the `./bin/envtest` directory in CI runners to avoid downloading the binaries on every job; re-run `setup-envtest use` only when bumping controller-runtime.
- Once the cache exists, set `KOLD_SKIP_ENVTEST_DOWNLOAD=1` in CI (and optionally locally) so `ensureKubebuilderAssets()` fails fast if the binaries disappear instead of spending ~10 seconds trying to auto-download them.
- GitHub Actions runs these smoke tests in `.github/workflows/ci-build.yaml` via the `controllers-envtest` job. The job restores the `bin/envtest` cache, installs `setup-envtest`, executes `make envtest-preflight`, and blocks the Docker build job until `go test ./pkg/controllers -count=1 -timeout=10m` passes.
- For any CI runners (including self-hosted), follow the checklist in `docs/ci-envtest.md`: restore the `bin/envtest` cache, run `make envtest-preflight`, export `KUBEBUILDER_ASSETS="$(./hack/print-kubebuilder-assets.sh)"`, and then run `make controllers-smoke`.
- Add the `export KUBEBUILDER_ASSETS=…` line to your shell profile (e.g. `.envrc`, `.zshrc`) so `go test` picks it up without re-running `setup-envtest`.
- The helper in `pkg/controllers/envtest_suite_test.go` auto-discovers `KUBEBUILDER_ASSETS`; when the binaries are absent the test suite now exits early with an explicit instruction instead of noisy control-plane failures (an illustrative skip pattern is sketched below).
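The project's real guard lives in `pkg/controllers/envtest_suite_test.go`; as a generic illustration of the pattern, a suite-level check can abort early with a clear instruction when the assets are missing:

```go
package controllers_test

import (
	"fmt"
	"os"
	"testing"
)

// Illustrative TestMain guard, not the project's actual helper: bail out of
// the suite with a clear instruction when envtest binaries are not configured.
func TestMain(m *testing.M) {
	assets := os.Getenv("KUBEBUILDER_ASSETS")
	if assets == "" {
		fmt.Fprintln(os.Stderr,
			"KUBEBUILDER_ASSETS is not set; run `make envtest-preflight` and export the printed path")
		os.Exit(0) // skip the suite instead of failing with control-plane errors
	}
	if _, err := os.Stat(assets); err != nil {
		fmt.Fprintf(os.Stderr, "envtest assets missing at %s: %v\n", assets, err)
		os.Exit(1)
	}
	os.Exit(m.Run())
}
```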
- Generate a focused profile: `go test ./pkg/controllers -coverprofile=controllers.cover`
- Inspect watchers and sizing helpers: `go tool cover -func=controllers.cover | grep -E 'root.go|dllama.go|model_jobs.go'`
- Current reference (2025-11-02 19:30): controllers pkg coverage 77.9%; `worker.ensureStatefulSet` 88.0% (replica/memory planning paths covered), `worker.ensureStatus` 94.1% (ready + observedGeneration branches), root and worker watchers remain 100%, `persistModelAnnotation` error logging exercised via gomock, `ensureSizingJob` 87.9% (delete/apply errors mocked).
- Remove the temporary profile when finished (`rm controllers.cover`) to keep the workspace clean.
| Purpose | Command |
|---|---|
| Format | go fmt ./... && gofmt -w . |
| Unit tests | go test ./... (append -race for data race checks) |
| Controllers smoke | make controllers-smoke (go test ./pkg/controllers -count=1 -timeout=10m) |
| Build binary | go build ./cmd/operator |
| Run operator | go run ./cmd/operator --mode=operator |
| Build/push image | skaffold build |
- Labels on Secrets (`koldun.gorizond.io/token=true`) trigger token mirroring into the JetStream registry bucket; the backend rejects disabled tokens (`stringData.disabled`). A client-go sketch for creating such a Secret follows this list.
- Set `backend-hash-secret` to enable HMAC-SHA256 conversation hashing; leave empty for plain SHA-256.
- The operator ensures S3 PV/PVC resources exist when `Model.spec.objectStorage` is configured; disable automatic bucket creation with `--operator-disable-bucket-ensure` when managing buckets manually.
- Update the Helm chart, Kubernetes manifests, and Dockerfile together when changing binary flags or images to avoid drift.
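A client-go sketch for creating such a token Secret; the `token` key name and the `default` namespace are assumptions for the example, while the label and the `disabled` field come from the list above.

```go
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// Sketch: create an API-token Secret carrying the label the operator watches.
func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "demo-api-token",
			Namespace: "default",
			Labels:    map[string]string{"koldun.gorizond.io/token": "true"},
		},
		StringData: map[string]string{
			"token": "replace-me", // key name assumed for this example
			// "disabled": "true", // uncomment to have the backend reject it
		},
	}
	if _, err := client.CoreV1().Secrets("default").Create(context.Background(), secret, metav1.CreateOptions{}); err != nil {
		log.Fatal(err)
	}
}
```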
- Sample CRs: `k8s/examples/*.yaml` (models, dllama topologies, ingress definitions).
- Token tooling lives in `pkg/tokens`; registry helpers in `pkg/registry` show how JetStream buckets are structured.
- File an issue or PR with validation steps (`go test ./...`, Helm installation logs, kube events) to document behavioural changes.
I dedicate this repository to my grandfather, Negashev Vyacheslav Ivanovich