From cbc649bb17c5142e1087e165a3b8bb8278c9e67d Mon Sep 17 00:00:00 2001 From: Alex Kennedy Date: Mon, 2 Mar 2026 15:00:19 -0800 Subject: [PATCH 1/2] docs: add CLAUDE.md with project guidance for Claude Code Provides Claude Code with project context including development commands, architecture overview, key patterns, and PR requirements. Co-Authored-By: Claude Opus 4.6 (1M context) --- CLAUDE.md | 158 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 158 insertions(+) create mode 100644 CLAUDE.md diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..ecd6457 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,158 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Project Overview + +Oz RBAC Controller is a Kubernetes operator that provides short-term elevated RBAC privileges to end-users through the native Kubernetes RBAC system. It manages four main Custom Resource Definitions (CRDs): + +- **ExecAccessTemplate** / **ExecAccessRequest**: Grant temporary `kubectl exec` access to specific pods +- **PodAccessTemplate** / **PodAccessRequest**: Create temporary dedicated pods for shell access (not in traffic path) + +The operator creates short-lived `Role`, `RoleBinding`, and `Pod` resources on-demand based on these templates and requests. + +## Development Commands + +### Building and Running + +```bash +# Build the manager binary +make build + +# Build with goreleaser (creates binaries in dist/) +make release + +# Build Docker image +make docker-build IMG= + +# Load Docker image into KIND cluster +make docker-load + +# Run the manager locally (without Docker) +make run +``` + +### Testing + +```bash +# Run unit tests (all tests except e2e) +make test + +# View test coverage report in terminal +make cover + +# View test coverage report in browser +make coverhtml + +# Run end-to-end tests (requires KIND cluster with cert-manager) +make test-e2e +``` + +The test suite uses Ginkgo/Gomega. Individual test files can be run with: +```bash +go test ./internal/path/to/package -v -ginkgo.v +``` + +### Linting and Formatting + +```bash +# Run all linters (revive + golangci-lint) +make lint + +# Format code (uses gofumpt + golines) +make fmt +``` + +### Kubernetes Development + +```bash +# Create a KIND cluster for development +kind create cluster + +# Install cert-manager (required for webhooks) +make cert-manager + +# Generate CRD manifests and RBAC +make manifests + +# Generate DeepCopy implementations +make generate + +# Install CRDs into cluster +make install + +# Deploy controller to cluster +make deploy + +# Uninstall CRDs +make uninstall + +# Remove controller from cluster +make undeploy +``` + +### Documentation + +```bash +# Generate API documentation (updates API.md) +make godocs + +# Generate Helm chart documentation +make helm-docs +``` + +## Architecture + +### Code Organization + +- **`internal/api/v1alpha1/`**: CRD definitions and webhook implementations + - Each CRD has `*_types.go` (API schema) and `*_webhook.go` (admission webhooks) + - Webhooks use mutating webhooks to inject user context and validating webhooks for logging + +- **`internal/controllers/`**: Reconciliation controllers + - Template controllers validate that templates are properly configured + - Request controllers orchestrate resource creation via builders + - Controllers delegate heavy lifting to builders to keep reconcile logic clean + +- **`internal/builders/`**: Resource construction logic + - `execaccessbuilder/`: Creates Role/RoleBinding for exec access + - `podaccessbuilder/`: Creates Pod/Role/RoleBinding for pod access + - `utils/`: Shared utilities for resource generation + +- **`internal/cmd/ozctl/`**: End-user CLI tool for creating and managing access requests + +- **`internal/testing/e2e/`**: End-to-end integration tests + +### Key Patterns + +1. **Reconciler → Builder separation**: Controllers call builder structs to create/verify resources. This keeps reconcile functions short and testable. + +2. **Status conditions**: All CRDs use `.status.conditions[]` to track state (e.g., `TargetRefExists`, `AccessDurationsValid`). Update these when validation fails. + +3. **Owner references**: Created resources (Roles, RoleBindings, Pods) have owner references set to the AccessRequest, enabling automatic cleanup via Kubernetes garbage collection. + +4. **Admission webhooks**: Mutating webhooks populate user context (from admission.Request), validating webhooks log actions and enforce policies. + +5. **Test structure**: Uses Ginkgo/Gomega with `suite_test.go` files for setup. Tests use envtest for realistic K8s API interactions. + +## Build System + +- Uses **Makefile** for standard targets and **Custom.mk** for project-specific extensions +- Uses **goreleaser** for multi-platform builds and releases +- Docker images are built for linux/arm64, linux/amd64, linux/s390x, linux/ppc64le +- The `main.go` files in `cmd/` are thin wrappers that call `internal/cmd/*/Main()` + +## PR Requirements + +This repository uses **conventional commits** for PR titles. Valid types: +- `build`: Build system or dependency changes +- `chore`: Maintenance tasks +- `docs`: Documentation changes +- `feat`: New features +- `fix`: Bug fixes + +Scope is optional. PR titles are validated by the `pull-request-lint.yaml` workflow. + +## Helm Chart + +The Helm chart is maintained in `charts/oz/` and deployed separately via the `oz-charts` repository. Local chart testing uses `ct` (chart-testing) with the config in `ct.yaml`. From a26d0b74fa60400ded94e24d091c520886cfd6d4 Mon Sep 17 00:00:00 2001 From: Alex Kennedy Date: Mon, 2 Mar 2026 16:01:46 -0800 Subject: [PATCH 2/2] docs: enrich CLAUDE.md with deep architectural guidance Add interface hierarchy, reconciliation flows, IBuilder contract, webhook system details, status conditions, owner reference chain, and key code patterns discovered from full codebase analysis. Co-Authored-By: Claude Opus 4.6 (1M context) --- CLAUDE.md | 141 ++++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 117 insertions(+), 24 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index ecd6457..fef5577 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -4,12 +4,15 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co ## Project Overview -Oz RBAC Controller is a Kubernetes operator that provides short-term elevated RBAC privileges to end-users through the native Kubernetes RBAC system. It manages four main Custom Resource Definitions (CRDs): +Oz RBAC Controller is a Kubernetes operator (controller-runtime based) that provides short-term elevated RBAC privileges to end-users through the native Kubernetes RBAC system. It manages four Custom Resource Definitions (CRDs) under the API group `crds.wizardofoz.co/v1alpha1`: -- **ExecAccessTemplate** / **ExecAccessRequest**: Grant temporary `kubectl exec` access to specific pods -- **PodAccessTemplate** / **PodAccessRequest**: Create temporary dedicated pods for shell access (not in traffic path) +- **ExecAccessTemplate** / **ExecAccessRequest**: Grant temporary `kubectl exec` access to pods managed by an existing controller (Deployment, DaemonSet, StatefulSet, or Argo Rollout) +- **PodAccessTemplate** / **PodAccessRequest**: Create temporary dedicated pods (not in traffic path) for shell access, cloned from a controller's PodSpec with optional mutations -The operator creates short-lived `Role`, `RoleBinding`, and `Pod` resources on-demand based on these templates and requests. +The operator creates short-lived `Role`, `RoleBinding`, and `Pod` resources on-demand. When access expires, the AccessRequest is deleted, cascading via owner references to clean up all created resources. + +**Module path**: `github.com/diranged/oz` +**Container image**: `ghcr.io/diranged/oz` ## Development Commands @@ -105,35 +108,116 @@ make helm-docs ### Code Organization -- **`internal/api/v1alpha1/`**: CRD definitions and webhook implementations - - Each CRD has `*_types.go` (API schema) and `*_webhook.go` (admission webhooks) - - Webhooks use mutating webhooks to inject user context and validating webhooks for logging +- **`internal/api/v1alpha1/`**: CRD type definitions (`*_types.go`), webhook implementations (`*_webhook.go`), shared interfaces, status structs, and condition types +- **`internal/controllers/`**: Generic reconcilers + - `templatecontroller/`: `TemplateReconciler` - validates templates (target ref exists, durations valid) + - `requestcontroller/`: `RequestReconciler` - orchestrates access resource creation via builders, handles expiry + - `podwatcher/`: ValidatingWebhook that audits pod exec/attach events + - `internal/status/`: Condition setters and status update helpers + - `internal/ctrlrequeue/`: Requeue helper functions +- **`internal/builders/`**: Resource construction logic implementing `IBuilder` + - `execaccessbuilder/`: Creates Role/RoleBinding for exec access to existing pods + - `podaccessbuilder/`: Creates Pod/Role/RoleBinding for dedicated pod access + - `utils/`: Shared utilities (role creation, pod creation, duration logic, owner refs, access command templating) +- **`internal/webhook/`**: Custom webhook framework extending controller-runtime with contextual admission request support +- **`internal/cmd/manager/`**: Manager entrypoint (registers schemes, reconcilers, webhooks, field indexers) +- **`internal/cmd/ozctl/`**: Cobra-based CLI for end-users to create access requests +- **`internal/testing/e2e/`**: End-to-end integration tests (require KIND cluster) + +### Interface Hierarchy + +The codebase is built on interfaces that enable generic reconciler logic: + +``` +ICoreResource (metav1.Object + runtime.Object + GetStatus) +├── ITemplateResource (+ GetTargetRef, GetAccessConfig) +└── IRequestResource (+ GetTemplate, GetDuration, GetUptime, GetTemplateName) + └── IPodRequestResource (+ SetPodName, GetPodName) + +ICoreStatus (IsReady, SetReady, GetConditions) +├── IRequestStatus (+ SetAccessMessage, GetAccessMessage) +└── ITemplateStatus (placeholder) +``` + +Both `ExecAccessRequest` and `PodAccessRequest` implement `IPodRequestResource`. Both template types implement `ITemplateResource`. When adding new access types, implement these interfaces and the `IBuilder` interface. + +### IBuilder Interface + +Builders abstract resource-creation logic away from the reconciler: + +```go +type IBuilder interface { + GetTemplate(ctx, client, req) (ITemplateResource, error) + GetAccessDuration(req, tmpl) (duration, decision, error) + SetRequestOwnerReference(ctx, client, req, tmpl) error + CreateAccessResources(ctx, client, req, tmpl) (string, error) + AccessResourcesAreReady(ctx, client, req, tmpl) (bool, error) +} +``` + +- **ExecAccessBuilder**: Selects a pod (random or specific via label selector), creates Role + RoleBinding scoped to that pod. `AccessResourcesAreReady()` always returns true. +- **PodAccessBuilder**: Clones PodSpec from controller, applies `PodTemplateSpecMutationConfig` mutations, creates Pod + Role + RoleBinding. `AccessResourcesAreReady()` polls pod for Running/Ready (30s timeout, 1s interval). + +### Reconciliation Flows + +**TemplateReconciler** (both template types): +1. `fetchRequestObject` - verify resource exists +2. `verifyTargetRef` - check controller target exists (unstructured client) +3. `verifyDuration` - validate defaultDuration <= maxDuration +4. `SetReadyStatus` - set ready=true if all conditions are True +5. Requeue after interval (default: 5min) + +**RequestReconciler** (both request types): +1. `fetchRequestObject` - verify resource exists +2. `verifyTemplate` - find template via builder, set owner reference (template owns request) +3. `verifyDuration` - compute effective duration, check expiry +4. `isAccessExpired` - if `ConditionAccessStillValid=False`, **delete** the request (cascading cleanup) +5. `verifyAccessResources` - delegate to builder's CreateAccessResources + AccessResourcesAreReady +6. `SetReadyStatus` - final ready state +7. Requeue after interval -- **`internal/controllers/`**: Reconciliation controllers - - Template controllers validate that templates are properly configured - - Request controllers orchestrate resource creation via builders - - Controllers delegate heavy lifting to builders to keep reconcile logic clean +### Owner Reference Chain -- **`internal/builders/`**: Resource construction logic - - `execaccessbuilder/`: Creates Role/RoleBinding for exec access - - `podaccessbuilder/`: Creates Pod/Role/RoleBinding for pod access - - `utils/`: Shared utilities for resource generation +``` +Template → owns → AccessRequest → owns → Role, RoleBinding, [Pod] +``` + +Deleting a template cascades to requests, which cascade to all access resources. + +### Status Conditions + +**Template conditions**: `TemplateDurationsValid`, `TargetRefExists` + +**Request conditions**: `TargetTemplateExists`, `AccessDurationsValid`, `AccessStillValid` (False triggers deletion), `AccessResourcesCreated`, `AccessResourcesReady`, `AccessMessage` + +`status.ready` is derived: true only when ALL conditions are `ConditionTrue`. -- **`internal/cmd/ozctl/`**: End-user CLI tool for creating and managing access requests +### Webhook System -- **`internal/testing/e2e/`**: End-to-end integration tests +Custom framework in `internal/webhook/` extends controller-runtime to pass `admission.Request` into defaulter/validator methods (enables access to `req.UserInfo`): +- `IContextuallyDefaultableObject` - `Default(req admission.Request) error` +- `IContextuallyValidatableObject` - `ValidateCreate/Update/Delete(req admission.Request)` +- `PodWatcher` at `/watch-v1-pod` - audit-only webhook for pods/exec and pods/attach events -### Key Patterns +### Key Code Patterns -1. **Reconciler → Builder separation**: Controllers call builder structs to create/verify resources. This keeps reconcile functions short and testable. +1. **Reconciler -> Builder separation**: Controllers delegate resource creation to `IBuilder` implementations, keeping reconcile logic generic and testable. -2. **Status conditions**: All CRDs use `.status.conditions[]` to track state (e.g., `TargetRefExists`, `AccessDurationsValid`). Update these when validation fails. +2. **`shouldReturn` pattern**: Request reconciler steps return `(shouldReturn bool, result, err)`. Caller checks `shouldReturn` to exit the reconcile loop early. -3. **Owner references**: Created resources (Roles, RoleBindings, Pods) have owner references set to the AccessRequest, enabling automatic cleanup via Kubernetes garbage collection. +3. **`CreateOrUpdate` for idempotency**: Roles and RoleBindings use controller-runtime's `CreateOrUpdate`. Pods use get-then-create (not CreateOrUpdate) because updating running pods is problematic. -4. **Admission webhooks**: Mutating webhooks populate user context (from admission.Request), validating webhooks log actions and enforce policies. +4. **Cached vs non-cached client**: Reconcilers use `client.Client` (cached) for normal operations and `APIReader` (non-cached) for existence checks to avoid stale data. -5. **Test structure**: Uses Ginkgo/Gomega with `suite_test.go` files for setup. Tests use envtest for realistic K8s API interactions. +5. **Field indexers**: Manager registers custom field indexers for `metadata.name` and `status.phase` on Pods, enabling efficient filtered list operations in pod selection. + +6. **Resource naming**: Generated resources use `{request-name}-{short-uid}` where short-uid is first 8 chars of the request's UID. + +7. **Label purging**: PodAccessBuilder always strips original labels from cloned pods to prevent traffic routing. Custom labels can be added via `controllerTargetMutationConfig.podLabels`. + +8. **Event filter**: Both reconcilers use `IgnoreStatusUpdatesAndDeletion()` to avoid infinite reconcile loops from status updates. + +9. **Supported controller targets** (enum-validated in CRD): `apps/v1` Deployment, DaemonSet, StatefulSet; `argoproj.io/v1alpha1` Rollout. ## Build System @@ -141,6 +225,15 @@ make helm-docs - Uses **goreleaser** for multi-platform builds and releases - Docker images are built for linux/arm64, linux/amd64, linux/s390x, linux/ppc64le - The `main.go` files in `cmd/` are thin wrappers that call `internal/cmd/*/Main()` +- CRD generation: `controller-gen` produces manifests, RBAC, webhooks, and DeepCopy implementations +- Linting: `revive` (config in `revive.toml`) + `golangci-lint` (config in `.golangci.yml`) +- Formatting: `gofumpt` + `golines` + +## Testing + +- **Unit tests**: Ginkgo v2 + Gomega with envtest (real API server, in-memory). Each package has a `suite_test.go` for setup. +- **E2E tests**: Require KIND cluster with cert-manager. Tests deploy operator and verify full template/request lifecycle including actual `kubectl exec`. +- Run `make test` for unit tests, `make test-e2e` for integration tests. ## PR Requirements @@ -155,4 +248,4 @@ Scope is optional. PR titles are validated by the `pull-request-lint.yaml` workf ## Helm Chart -The Helm chart is maintained in `charts/oz/` and deployed separately via the `oz-charts` repository. Local chart testing uses `ct` (chart-testing) with the config in `ct.yaml`. +The Helm chart is maintained in `charts/oz/` and deployed separately via the `oz-charts` repository. Chart deploys: manager Deployment, RBAC, webhook Service + configurations, cert-manager Certificate, metrics Service. Local chart testing uses `ct` (chart-testing) with the config in `ct.yaml`.