Description
Summary
Migrate the kagenti-webhook mutating admission webhook from kagenti/kagenti-extensions into kagenti/kagenti-operator. Co-locating the webhook with the AgentRuntime CRD enables typed access (replacing the current unstructured client), simplifies deployment (single binary + Helm chart), and unblocks CR-driven injection where AgentRuntime CR presence controls sidecar injection.
Motivation
- The webhook already reads AgentRuntime CRs via an unstructured client; co-location enables typed access
- Single operator binary simplifies deployment and RBAC
- AgentRuntime controller + webhook in the same repo enables CR-driven injection (CR presence = inject, CR deletion = stop injecting)
- Eliminates version drift between the two repos
Current State
kagenti-operator: Go 1.24 / controller-runtime v0.20.0 / k8s API v0.32.0
kagenti-webhook: Go 1.26 / controller-runtime v0.23.3 / k8s API v0.35.2
The webhook is ~5,300 lines across three packages plus its entrypoint:
- `internal/webhook/config/` (680 lines): `PlatformConfig`, `FeatureGates`, hot-reload loaders
- `internal/webhook/injector/` (3,785 lines): `PodMutator`, `ContainerBuilder`, `VolumeBuilder`, `PrecedenceEvaluator`, config resolution pipeline (namespace ConfigMaps + AgentRuntime overrides)
- `internal/webhook/v1alpha1/` (551 lines): AuthBridge admission handler (raw `admission.Handler`, Pod CREATE)
- `cmd/main.go` (321 lines): CLI flags, manager setup, webhook registration
Phases
Phase 0: Upgrade operator dependencies (BLOCKER)
- Upgrade Go 1.24 → 1.26
- Upgrade controller-runtime v0.20 → v0.23
- Upgrade k8s API v0.32 → v0.35
- Verify that the existing controllers and the AgentCard webhook still pass their tests
Phase 1: Migrate config package
- Copy `internal/webhook/config/` (`types.go`, `defaults.go`, `feature_gates.go`, `feature_gate_loader.go`, `loader.go`)
- Update module path imports
- Update image refs in `defaults.go` if needed (sidecar images stay in `ghcr.io/kagenti/kagenti-extensions/`)
- Tests pass
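The config package pairs feature gates with hot-reload loaders. The sketch below shows one way such a pattern can work: gates parsed from ConfigMap-style string data and swapped atomically so admission requests never see a half-applied update. Only the gate name `perWorkloadConfigResolution` comes from this issue; the `FeatureGates` shape, `gateStore`, and the parsing rules are hypothetical and not the actual `feature_gates.go` / `feature_gate_loader.go` code.

```go
package main

import (
	"fmt"
	"strconv"
	"sync/atomic"
)

// FeatureGates is a hypothetical, simplified stand-in for the webhook's
// feature-gate type; only the gate name is taken from the issue.
type FeatureGates struct {
	PerWorkloadConfigResolution bool
}

// parseFeatureGates builds gates from ConfigMap-style string data,
// keeping the default for any key that is missing or not a valid bool.
func parseFeatureGates(data map[string]string, defaults FeatureGates) FeatureGates {
	gates := defaults
	if v, ok := data["perWorkloadConfigResolution"]; ok {
		if b, err := strconv.ParseBool(v); err == nil {
			gates.PerWorkloadConfigResolution = b
		}
	}
	return gates
}

// gateStore (hypothetical) lets a reload goroutine publish new gates while
// admission handlers read the current snapshot without taking a lock.
type gateStore struct {
	current atomic.Pointer[FeatureGates]
}

// Load returns the current gates, or the zero value before the first Reload.
func (s *gateStore) Load() FeatureGates {
	if g := s.current.Load(); g != nil {
		return *g
	}
	return FeatureGates{}
}

// Reload parses new data and atomically replaces the published snapshot.
func (s *gateStore) Reload(data map[string]string, defaults FeatureGates) {
	g := parseFeatureGates(data, defaults)
	s.current.Store(&g)
}

func main() {
	var store gateStore
	defaults := FeatureGates{PerWorkloadConfigResolution: false}

	store.Reload(nil, defaults) // initial load: nothing configured yet
	fmt.Println(store.Load().PerWorkloadConfigResolution)

	// Simulate a ConfigMap update picked up by a hot-reload loader.
	store.Reload(map[string]string{"perWorkloadConfigResolution": "true"}, defaults)
	fmt.Println(store.Load().PerWorkloadConfigResolution)
}
```

Publishing an immutable snapshot through `atomic.Pointer` keeps the read path on every Pod CREATE lock-free; a `sync.RWMutex` would work equally well if the gate set grows.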
Phase 2: Migrate injector package
- Copy `internal/webhook/injector/` (`pod_mutator`, `container_builder`, `volume_builder`, `precedence`, config resolution, envoy template)
- Ensure `envoy.yaml.tmpl` stays colocated with `envoy_template.go` (`//go:embed` constraint)
- Update module path imports
- Consider replacing unstructured AgentRuntime access with typed access (types are local now)
- Tests pass
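To illustrate what moving from unstructured to typed AgentRuntime access buys, the sketch below contrasts a nested map lookup with a typed struct. Only `.spec.identity` and `.spec.trace` come from this issue; the fields inside them (`serviceAccount`, `enabled`) are hypothetical placeholders. In the operator the typed path would more likely be a `client.Get` straight into the generated AgentRuntime type, or `runtime.DefaultUnstructuredConverter.FromUnstructured` as an interim step; to stay dependency-free, this sketch uses a JSON round trip instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical typed shapes; the inner fields are placeholders.
type IdentitySpec struct {
	ServiceAccount string `json:"serviceAccount,omitempty"`
}

type TraceSpec struct {
	Enabled bool `json:"enabled,omitempty"`
}

type AgentRuntimeSpec struct {
	Identity *IdentitySpec `json:"identity,omitempty"`
	Trace    *TraceSpec    `json:"trace,omitempty"`
}

type AgentRuntime struct {
	Spec AgentRuntimeSpec `json:"spec"`
}

// unstructuredServiceAccount mimics unstructured access: every level is a
// map lookup plus a type assertion, and typos are only caught at runtime.
func unstructuredServiceAccount(obj map[string]interface{}) (string, bool) {
	spec, ok := obj["spec"].(map[string]interface{})
	if !ok {
		return "", false
	}
	identity, ok := spec["identity"].(map[string]interface{})
	if !ok {
		return "", false
	}
	sa, ok := identity["serviceAccount"].(string)
	return sa, ok
}

// toTyped converts the generic map into the typed struct via a JSON round
// trip; after that, field accesses are checked by the compiler.
func toTyped(obj map[string]interface{}) (AgentRuntime, error) {
	var rt AgentRuntime
	raw, err := json.Marshal(obj)
	if err != nil {
		return rt, err
	}
	err = json.Unmarshal(raw, &rt)
	return rt, err
}

func main() {
	obj := map[string]interface{}{
		"spec": map[string]interface{}{
			"identity": map[string]interface{}{"serviceAccount": "agent-sa"},
			"trace":    map[string]interface{}{"enabled": true},
		},
	}
	sa, _ := unstructuredServiceAccount(obj)
	fmt.Println("unstructured:", sa)

	rt, err := toTyped(obj)
	if err != nil {
		panic(err)
	}
	fmt.Println("typed:", rt.Spec.Identity.ServiceAccount, rt.Spec.Trace.Enabled)
}
```

Pointer fields for `identity` and `trace` keep "section absent" distinguishable from "section present but empty", which matters when CR overrides are layered on top of lower-precedence config.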
Phase 3: Migrate webhook handler + wire into operator
- Copy `internal/webhook/v1alpha1/` (`authbridge_webhook.go`, tests)
- Wire `SetupAuthBridgeWebhookWithManager()` into the operator's `cmd/main.go`
- Add CLI flags (`--enable-client-registration`, `--webhook-cert-path`, etc.)
- Fix test relative paths for the new directory structure
- Tests pass
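The new CLI flags could be registered along the lines of the sketch below. Only the flag names `--enable-client-registration` and `--webhook-cert-path` come from this issue; the defaults, help text, `webhookOptions` struct, and `parseWebhookFlags` helper are assumptions. A dedicated `flag.FlagSet` keeps the sketch testable, whereas the real `cmd/main.go` would presumably register these on the same flag set as the operator's existing flags.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// webhookOptions (hypothetical) groups the webhook-specific settings.
type webhookOptions struct {
	EnableClientRegistration bool
	WebhookCertPath          string
}

// parseWebhookFlags registers the webhook flags on their own FlagSet and
// parses args. Go's flag package accepts both -name and --name forms.
func parseWebhookFlags(args []string) (webhookOptions, error) {
	var opts webhookOptions
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	fs.BoolVar(&opts.EnableClientRegistration, "enable-client-registration", false,
		"enable client registration in the AuthBridge webhook (default assumed)")
	fs.StringVar(&opts.WebhookCertPath, "webhook-cert-path", "",
		"directory containing the webhook serving certificate (semantics assumed)")
	err := fs.Parse(args)
	return opts, err
}

func main() {
	opts, err := parseWebhookFlags(os.Args[1:])
	if err != nil {
		// flag has already printed the error and usage to stderr.
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```

Collecting the flags into one struct makes it straightforward to pass them into `SetupAuthBridgeWebhookWithManager()` (or a wrapper around it) without adding more package-level globals to `main.go`.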
Phase 4: Infrastructure (Kustomize, Helm, CI)
- Merge webhook Kustomize manifests (RBAC, MutatingWebhookConfiguration, cert-manager)
- Merge or migrate the Helm chart (`charts/kagenti-webhook/` → operator chart)
- Update CI/CD workflows (image builds, GoReleaser)
- Update Dockerfile to include webhook binary
Phase 5: Cutover and deprecation
- End-to-end testing on Kind cluster
- Verify: agent workload injection, tool workload handling, feature gates, config resolution, idempotency
- Deprecate webhook in kagenti-extensions (remove code, update docs)
- Create tracking issue in kagenti-extensions for removal
- Update CLAUDE.md files in both repos
Key Migration Concerns
- Controller-runtime version gap (v0.20 → v0.23): Phase 0 is the blocker and may surface breaking changes
- Module path rewrite: all imports change from `github.com/kagenti/kagenti-extensions/kagenti-webhook/...` to `github.com/kagenti/operator/...`
- Image defaults: `defaults.go` references `ghcr.io/kagenti/kagenti-extensions/*` for sidecar images; these stay in the extensions repo
- `//go:embed envoy.yaml.tmpl`: the template file must remain in the same package directory as `envoy_template.go`
- Test relative paths: `webhook_suite_test.go` uses `../../../config/` paths that need updating
- Helm chart consolidation: two charts need merging, or the webhook chart moves over
Config Resolution Pipeline (context)
The webhook's config resolution pipeline (gated by the `perWorkloadConfigResolution` feature gate) reads namespace ConfigMaps at admission time and merges them with AgentRuntime CR overrides:
CompiledDefaults (L1)
↓ overlaid by
PlatformConfig ConfigMap (L2)
↓ provides images/ports/resources
Namespace ConfigMaps (L3): authbridge-config, spiffe-helper-config, envoy-config, authproxy-routes
↓ overlaid by
AgentRuntime CR .spec.identity + .spec.trace (L4)
↓
ResolvedConfig → ContainerBuilder → literal env vars in Pod spec
Design doc: kagenti-extensions/kagenti-webhook/docs/design-webhook-config-resolution.md
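The layering in the diagram above can be approximated as a last-writer-wins overlay, lowest precedence first. The sketch assumes that simplification: the real pipeline scopes what each layer may supply (L2 provides images, ports, and resources; L3 holds per-namespace component config), and the field names and values below are hypothetical placeholders, not the webhook's `ResolvedConfig`.

```go
package main

import "fmt"

// Settings is a hypothetical, trimmed-down view of what a layer can set.
// A zero value (empty string, 0) means "not set at this layer".
type Settings struct {
	ProxyImage  string
	ProxyPort   int
	TraceTarget string
}

// overlay copies every field that upper sets onto base; fields left unset
// in upper keep base's value.
func overlay(base, upper Settings) Settings {
	if upper.ProxyImage != "" {
		base.ProxyImage = upper.ProxyImage
	}
	if upper.ProxyPort != 0 {
		base.ProxyPort = upper.ProxyPort
	}
	if upper.TraceTarget != "" {
		base.TraceTarget = upper.TraceTarget
	}
	return base
}

// resolve applies layers lowest precedence first:
// L1 compiled defaults, L2 PlatformConfig, L3 namespace ConfigMaps,
// L4 AgentRuntime CR overrides.
func resolve(layers ...Settings) Settings {
	var out Settings
	for _, l := range layers {
		out = overlay(out, l)
	}
	return out
}

func main() {
	compiled := Settings{ProxyImage: "proxy:compiled", ProxyPort: 15123}
	platform := Settings{ProxyImage: "proxy:platform"}
	namespace := Settings{TraceTarget: "collector.observability:4317"}
	cr := Settings{TraceTarget: "collector.team-a:4317"}

	fmt.Printf("%+v\n", resolve(compiled, platform, namespace, cr))
}
```

Treating zero values as "unset" keeps the sketch short but means a higher layer cannot explicitly set `false` or `0`; pointer fields (as in typed CRD specs) avoid that ambiguity, which is one reason typed AgentRuntime access helps the L4 step.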
References
- kagenti-webhook source: https://github.com/kagenti/kagenti-extensions/tree/main/kagenti-webhook
- Config resolution PR: kagenti-extensions#217, "feat: Per-workload config resolution pipeline for webhook" (merged)
- AgentRuntime CRD: #212, "Feat: add AgentRuntime CRD types and documentation"
- Externalize config epic: kagenti-extensions#109, "Epic: Externalize Configuration & Multi-Layer Precedence System"