
Epic: Migrate kagenti-webhook admission webhook into kagenti-operator #238

@cwiklik


Summary

Migrate the kagenti-webhook mutating admission webhook from kagenti/kagenti-extensions into kagenti/kagenti-operator. Co-locating the webhook with the AgentRuntime CRD enables typed access (replacing the current unstructured client), simplifies deployment (single binary + Helm chart), and unblocks CR-driven injection where AgentRuntime CR presence controls sidecar injection.

Motivation

  • The webhook already reads AgentRuntime CRs via unstructured client — co-location enables typed access
  • Single operator binary simplifies deployment and RBAC
  • AgentRuntime controller + webhook in the same repo enables CR-driven injection (CR presence = inject, CR deletion = stop injecting)
  • Eliminates version drift between the two repos

Current State

kagenti-operator: Go 1.24 / controller-runtime v0.20.0 / k8s API v0.32.0
kagenti-webhook: Go 1.26 / controller-runtime v0.23.3 / k8s API v0.35.2

The webhook is ~5,300 lines across 3 packages plus cmd/main.go:

  • internal/webhook/config/ (680 lines) — PlatformConfig, FeatureGates, hot-reload loaders
  • internal/webhook/injector/ (3,785 lines) — PodMutator, ContainerBuilder, VolumeBuilder, PrecedenceEvaluator, config resolution pipeline (namespace CMs + AgentRuntime overrides)
  • internal/webhook/v1alpha1/ (551 lines) — AuthBridge admission handler (raw admission.Handler, Pod CREATE)
  • cmd/main.go (321 lines) — CLI flags, manager setup, webhook registration

Phases

Phase 0: Upgrade operator dependencies (BLOCKER)

  • Upgrade Go 1.24 → 1.26
  • Upgrade controller-runtime v0.20 → v0.23
  • Upgrade k8s API v0.32 → v0.35
  • Verify existing controllers and AgentCard webhook still pass tests

Phase 1: Migrate config package

  • Copy internal/webhook/config/ (types.go, defaults.go, feature_gates.go, feature_gate_loader.go, loader.go)
  • Update module path imports
  • Update image refs in defaults.go if needed (sidecar images stay in ghcr.io/kagenti/kagenti-extensions/)
  • Tests pass

Phase 2: Migrate injector package

  • Copy internal/webhook/injector/ (pod_mutator, container_builder, volume_builder, precedence, config resolution, envoy template)
  • Ensure envoy.yaml.tmpl stays colocated with envoy_template.go (//go:embed constraint)
  • Update module path imports
  • Consider replacing unstructured AgentRuntime access with typed access (types are local now)
  • Tests pass

Phase 3: Migrate webhook handler + wire into operator

  • Copy internal/webhook/v1alpha1/ (authbridge_webhook.go, tests)
  • Wire SetupAuthBridgeWebhookWithManager() into operator's cmd/main.go
  • Add CLI flags (--enable-client-registration, --webhook-cert-path, etc.)
  • Fix test relative paths for new directory structure
  • Tests pass
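As a rough sketch of the CLI-flag item in Phase 3, the snippet below registers the two flags named above using only the standard `flag` package. The flag types, defaults, help strings, and the `webhookOptions` struct are assumptions; the operator's actual cmd/main.go may register them differently.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// webhookOptions is a hypothetical container for the migrated flags.
type webhookOptions struct {
	EnableClientRegistration bool
	WebhookCertPath          string
}

// parseWebhookFlags parses the webhook-related flags from args.
// Types and defaults here are assumptions, not the webhook's real ones.
func parseWebhookFlags(args []string) (webhookOptions, error) {
	var opts webhookOptions
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	fs.BoolVar(&opts.EnableClientRegistration, "enable-client-registration", false,
		"enable client registration (assumed boolean)")
	fs.StringVar(&opts.WebhookCertPath, "webhook-cert-path", "",
		"directory containing the webhook serving certificate (assumed string)")
	if err := fs.Parse(args); err != nil {
		return webhookOptions{}, err
	}
	return opts, nil
}

func main() {
	opts, err := parseWebhookFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```

Parsing into a dedicated `FlagSet` keeps the migrated flags testable in isolation from the operator's existing flags.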

Phase 4: Infrastructure (Kustomize, Helm, CI)

  • Merge webhook Kustomize manifests (RBAC, MutatingWebhookConfiguration, cert-manager)
  • Merge or migrate Helm chart (charts/kagenti-webhook/ → operator chart)
  • Update CI/CD workflows (image builds, GoReleaser)
  • Update Dockerfile to include webhook binary

Phase 5: Cutover and deprecation

  • End-to-end testing on Kind cluster
  • Verify: agent workload injection, tool workload handling, feature gates, config resolution, idempotency
  • Deprecate webhook in kagenti-extensions (remove code, update docs)
  • Create tracking issue in kagenti-extensions for removal
  • Update CLAUDE.md files in both repos
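For the idempotency check in the Phase 5 verification list, the property under test is that admitting an already-mutated Pod does not inject a second copy of a sidecar. The sketch below shows one common way to get that property, by checking for the container name before appending. `Container`, `PodSpec`, and the sidecar name `envoy-proxy` are simplified, hypothetical stand-ins; the real PodMutator may enforce idempotency differently.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the Kubernetes Pod types.
type Container struct {
	Name string
}

type PodSpec struct {
	Containers []Container
}

// hasContainer reports whether a container with the given name exists.
func hasContainer(spec *PodSpec, name string) bool {
	for _, c := range spec.Containers {
		if c.Name == name {
			return true
		}
	}
	return false
}

// injectSidecar appends the sidecar only if it is not already present.
// It returns true when the spec was modified.
func injectSidecar(spec *PodSpec, sidecar Container) bool {
	if hasContainer(spec, sidecar.Name) {
		return false
	}
	spec.Containers = append(spec.Containers, sidecar)
	return true
}

func main() {
	spec := &PodSpec{Containers: []Container{{Name: "agent"}}}
	sidecar := Container{Name: "envoy-proxy"} // hypothetical sidecar name

	first := injectSidecar(spec, sidecar)
	second := injectSidecar(spec, sidecar) // a second pass must be a no-op

	fmt.Println(first, second, len(spec.Containers))
}
```

An end-to-end test on the Kind cluster could assert the same thing by comparing container counts after a repeated admission of the same Pod.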

Key Migration Concerns

  1. Controller-runtime version gap (v0.20 → v0.23): Phase 0 blocks every later phase, and the upgrade may introduce breaking changes
  2. Module path rewrite: all imports change from github.com/kagenti/kagenti-extensions/kagenti-webhook/... to github.com/kagenti/operator/...
  3. Image defaults: defaults.go references ghcr.io/kagenti/kagenti-extensions/* for sidecar images; these stay in the extensions repo
  4. //go:embed envoy.yaml.tmpl: the template file must remain in the same package directory as envoy_template.go
  5. Test relative paths: webhook_suite_test.go uses ../../../config/ paths that need updating
  6. Helm chart consolidation: the two charts need merging, or the webhook chart moves over
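The module path rewrite amounts to a prefix substitution on import paths. The sketch below shows that mapping, assuming the sub-package layout (for example internal/webhook/config) is preserved under the new module; if packages move to different directories, the mapping needs per-package entries instead. `rewriteImport` is a hypothetical helper, and in practice the rewrite would more likely be done with an editor or a search-and-replace tool.

```go
package main

import (
	"fmt"
	"strings"
)

// Prefixes taken from the issue text. The assumption is that everything
// after the prefix keeps the same relative path in the operator repo.
const (
	oldPrefix = "github.com/kagenti/kagenti-extensions/kagenti-webhook/"
	newPrefix = "github.com/kagenti/operator/"
)

// rewriteImport maps a webhook import path onto the operator module.
// Paths outside the old module are returned unchanged.
func rewriteImport(path string) string {
	if strings.HasPrefix(path, oldPrefix) {
		return newPrefix + strings.TrimPrefix(path, oldPrefix)
	}
	return path
}

func main() {
	paths := []string{
		oldPrefix + "internal/webhook/config",
		oldPrefix + "internal/webhook/injector",
		"sigs.k8s.io/controller-runtime/pkg/webhook/admission",
	}
	for _, p := range paths {
		fmt.Println(p, "=>", rewriteImport(p))
	}
}
```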

Config Resolution Pipeline (context)

The webhook's config resolution pipeline (gated by the perWorkloadConfigResolution feature gate) reads namespace ConfigMaps at admission time and merges them with AgentRuntime CR overrides:

CompiledDefaults (L1)
  ↓ overlaid by
PlatformConfig ConfigMap (L2)
  ↓ provides images/ports/resources
Namespace ConfigMaps (L3): authbridge-config, spiffe-helper-config, envoy-config, authproxy-routes
  ↓ overlaid by
AgentRuntime CR .spec.identity + .spec.trace (L4)
  ↓
ResolvedConfig → ContainerBuilder → literal env vars in Pod spec
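
The layering above can be sketched as an ordered overlay in which later layers win and empty values never erase earlier ones. This is a deliberately flattened model: the real pipeline resolves structured config (images, ports, resources, per-ConfigMap content), and the `Layer` type, `resolve` function, and all keys and values below are hypothetical.

```go
package main

import "fmt"

// Layer is a hypothetical flat key/value view of one config source.
type Layer map[string]string

// resolve overlays layers in order: later layers override earlier ones,
// and empty values are treated as "not set" so they do not clobber.
func resolve(layers ...Layer) map[string]string {
	out := map[string]string{}
	for _, layer := range layers {
		for k, v := range layer {
			if v != "" {
				out[k] = v
			}
		}
	}
	return out
}

func main() {
	// All keys and values are illustrative only.
	compiledDefaults := Layer{"envoyImage": "envoy:default", "traceEndpoint": ""}
	platformConfig := Layer{"envoyImage": "envoy:platform"}
	namespaceConfig := Layer{"traceEndpoint": "http://collector.ns:4317"}
	agentRuntimeOverrides := Layer{"traceEndpoint": "http://collector.agent:4317"}

	resolved := resolve(compiledDefaults, platformConfig, namespaceConfig, agentRuntimeOverrides)
	fmt.Println(resolved)
}
```

In this model the resolved map plays the role of ResolvedConfig, which the ContainerBuilder then renders as literal env vars in the Pod spec.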

Design doc: kagenti-extensions/kagenti-webhook/docs/design-webhook-config-resolution.md
