fix(rbac): Add missing agentruntimes permissions to ClusterRole #253
AgentRuntimeReconciler has been deployed in production without the necessary RBAC permissions, causing continuous permission errors in operator logs.

## Problem

The operator's ServiceAccount cannot list/watch AgentRuntime CRDs:

```
agentruntimes.agent.kagenti.dev is forbidden: User "system:serviceaccount:kagenti-operator-system:controller-manager" cannot list resource "agentruntimes" in API group "agent.kagenti.dev" at the cluster scope
```

This error repeats continuously with exponential backoff, filling logs and preventing AgentRuntime reconciliation.

## Root Cause

1. AgentRuntimeReconciler is always registered (cmd/main.go:323-330)
2. Controller declares required RBAC in code annotations:

```go
// +kubebuilder:rbac:groups=agent.kagenti.dev,resources=agentruntimes,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=agent.kagenti.dev,resources=agentruntimes/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=agent.kagenti.dev,resources=agentruntimes/finalizers,verbs=update
```

3. Helm chart ClusterRole template is missing these permissions

## Solution

Add agentruntimes permissions to charts/kagenti-operator/templates/rbac/role.yaml matching the kubebuilder RBAC annotations in agentruntime_controller.go.

## Impact

- Fixes permission errors in operator logs
- Enables AgentRuntime controller to function correctly
- Allows per-workload identity/observability configuration

## Testing

Deployed operator with fix in kind cluster:

- Permission errors stopped immediately
- AgentRuntime controller can now list/watch CRDs
- No regressions in other controllers

Fixes a pre-existing bug affecting all deployments.

Signed-off-by: Alan Cha <Alan.cha1@ibm.com>
Full Error Logs from Operator
These errors repeat continuously in the kagenti-operator logs (exponential backoff):
How to reproduce:
Verification after fix:
cwiklik
left a comment
Review Summary
Correct fix for the missing agentruntimes RBAC permissions — the added rules match the kubebuilder markers exactly and the PR description is thorough with clear root cause analysis.
Overlap with #249: PR #249 (already approved) does a comprehensive alignment of this same file, adding agentruntimes (same fix) and removing 79 lines of over-provisioned rules (secrets, CRDs, webhooks, RBAC management, deprecated extensions API group). These two PRs will have merge conflicts on charts/kagenti-operator/templates/rbac/role.yaml. Recommend coordinating merge order:
Areas reviewed: Helm/K8s RBAC
Commits: 1 commit, signed-off: yes
CI status: All 14 checks passing (including E2E)
verbs:
- create
- delete
- get
suggestion (coordination): This exact change is also included in PR #249, which does a broader RBAC cleanup aligning the entire Helm ClusterRole with config/rbac/role.yaml. PR #249 adds agentruntimes (same rules as here) plus removes ~79 lines of over-provisioned permissions the operator doesn't use (secrets, CRDs, webhooks, RBAC, deprecated extensions API group, etc.).
These two PRs will conflict on this file. If #249 merges first, this PR is fully superseded. Worth coordinating merge order with @ChristianZaccaria.
Perhaps it makes sense to merge #249 as the changes are more extensive. We can close this after that one is merged.
Severity Clarification
This bug completely breaks the AgentRuntime feature, which is documented as "the declarative way to enroll a workload into the Kagenti platform" (docs/architecture.md).
Impact on Users
Users creating AgentRuntime resources expecting agent enrollment will see:
Workaround: Users must manually add
Root Cause
AgentRuntime was added in commit
Recommendation
This should be treated as a P0 bug fix for the AgentRuntime feature. All
Problem
The kagenti-operator is deployed with insufficient RBAC permissions, causing continuous errors in production:
Logs showing the issue:
This error repeats continuously (exponential backoff), filling operator logs and preventing the AgentRuntimeReconciler from functioning.
Root Cause
Mismatch between code and Helm chart:
Code declares required RBAC in internal/controller/agentruntime_controller.go:66-69:
Controller is always registered in cmd/main.go:323-330
Helm chart ClusterRole is missing these permissions in charts/kagenti-operator/templates/rbac/role.yaml
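As a rough illustration of the mismatch, the three kubebuilder markers quoted in the description would translate into ClusterRole rules along the following lines. This is a sketch only: the apiGroups, resources, and verbs come from the markers, while the ordering and the assumption that these entries sit directly under the `rules:` key of charts/kagenti-operator/templates/rbac/role.yaml are guesses, since the template itself is not shown here.

```yaml
# Sketch: entries assumed to sit under the ClusterRole's `rules:` key.
# Values are taken from the kubebuilder RBAC markers; layout is an assumption.
- apiGroups:
  - agent.kagenti.dev
  resources:
  - agentruntimes
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - agent.kagenti.dev
  resources:
  - agentruntimes/finalizers
  verbs:
  - update
- apiGroups:
  - agent.kagenti.dev
  resources:
  - agentruntimes/status
  verbs:
  - get
  - patch
  - update
```

The `list` and `watch` verbs on `agentruntimes` are the ones the forbidden error points at; the status and finalizers subresources need their own rules because RBAC treats subresources as separate resource names.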
Solution
Add the missing agentruntimes permissions to the ClusterRole Helm template to match the controller's kubebuilder RBAC annotations.
Changes:
- agentruntimes resource permissions (get, list, watch, create, update, patch, delete)
- agentruntimes/status subresource permissions (get, update, patch)
- agentruntimes/finalizers subresource permissions (update)
Impact
Before fix:
After fix:
Testing
Deployed operator with this fix in kind cluster:
Type of Change
Checklist
Related Issues
This is a pre-existing bug affecting all deployments of kagenti-operator. No specific issue filed, discovered during E2E testing of PR #247.