Description
Describe the bug
The Kagenti operator's AgentCard controller periodically updates the
agentcard.kagenti.dev/resign-trigger annotation on deployment pod
templates. Since this annotation is part of the pod template spec,
each update triggers a rolling restart of the deployment. This causes
all agent pods to restart approximately every 30 minutes.
Steps To Reproduce
- Start an agent via Kagenti
- Wait ~30 minutes or more
- Check the pod's age -- it will be less than 30 minutes, most likely just a few minutes.
Expected Behavior
Agent pods running without interruptions.
Additional Context
Root cause
The sign-agentcard init container signs the agent card at pod
startup. To trigger re-signing, the operator updates the
resign-trigger annotation on the pod template, which forces
Kubernetes to create a new pod (rolling restart). The operator does
this on a periodic schedule even when the certificate hasn't rotated
and the signature is still valid.
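To illustrate the mechanism, here is a minimal sketch in Go. It is not the operator's code: `podTemplate` and `templateChanged` are hypothetical, simplified stand-ins, and the timestamp values are made up. It only models the Kubernetes rule that any change to a Deployment's pod template, including an annotation, produces a new rollout.

```go
package main

import (
	"fmt"
	"reflect"
)

// podTemplate is a hypothetical, heavily simplified stand-in for a
// Deployment's pod template (the real type is much larger).
type podTemplate struct {
	Annotations map[string]string
	Image       string
}

// templateChanged models the rule that any difference in the pod template
// causes a rollout. Kubernetes compares a hash of the template; deep
// equality is used here as a simplification.
func templateChanged(before, after podTemplate) bool {
	return !reflect.DeepEqual(before, after)
}

func main() {
	before := podTemplate{
		Annotations: map[string]string{"agentcard.kagenti.dev/resign-trigger": "t0"},
		Image:       "agent:v1",
	}
	// Same image and spec; only the annotation value differs.
	after := podTemplate{
		Annotations: map[string]string{"agentcard.kagenti.dev/resign-trigger": "t1"},
		Image:       "agent:v1",
	}
	fmt.Println("rollout triggered:", templateChanged(before, after))
}
```

The container image is unchanged in both templates, yet the comparison still reports a difference, which is why every annotation bump restarts the pods.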
Impact
- Unnecessary pod restarts cause brief service interruptions
- SPIFFE credentials are re-fetched on every restart
- Keycloak client-registration sidecar re-registers on every restart
(AuthBridge deployments)
- Token caches are lost on restart
- High revision count clutters rollout history
Suggested fixes
- Only re-sign when needed: Check certificate expiry before
triggering a resign. Skip the annotation update if the current
signature is still valid.
- Use a sidecar instead of an init container: A long-running
sidecar could watch for certificate rotation events and re-sign
the card in place, without requiring a pod restart.
- Use a shared volume: The init container writes the signed card
to an emptyDir volume. A sidecar could update the file in the
same volume without restarting the pod.