

@AnkanMisra

Closes #1830

Summary

Adds a CollectorDaemonSetExists status condition to OperatorConfig to surface when the collector DaemonSet is missing.

Problem

When the collector DaemonSet (gmp-system/collector) is deleted, the operator logs a warning but returns success without surfacing any status condition. This leaves metrics collection silently broken with no way for users to detect the outage via the API.

Solution

  • Add CollectorDaemonSetExists condition to OperatorConfig.Status
  • Set True when the DaemonSet exists, False with reason DaemonSetMissing when not found
  • Initialize the condition to Unknown so that unexpected API errors (for example network or RBAC failures) are not reported as True

This only checks DaemonSet existence, not pod readiness, and does not automatically recreate deleted DaemonSets.

@google-cla

google-cla bot commented Jan 25, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.

Re-added GenerationChangedPredicate to the DaemonSet watcher to filter out noise from status updates while still catching deletions. Added GenerationChangedPredicate to the OperatorConfig watcher to prevent feedback loops caused by status updates.
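Why this predicate filters status noise but still sees deletions can be shown with a small stdlib-only model. It is a hedged sketch, not controller-runtime itself: `Event`, `EventKind` and `generationChanged` are hypothetical names that mimic how `predicate.GenerationChangedPredicate` lets update events through only when `metadata.generation` changes, while create and delete events always pass. It assumes, as is the case for resources with a status subresource such as DaemonSet, that the API server does not bump the generation on status-only writes; the same assumption is what would break the OperatorConfig status-update feedback loop.

```go
package main

import "fmt"

// EventKind is a hypothetical simplification of controller-runtime events.
type EventKind int

const (
	Create EventKind = iota
	Update
	Delete
)

// Event carries the object's generation before and after the change.
type Event struct {
	Kind          EventKind
	OldGeneration int64
	NewGeneration int64
}

// generationChanged mimics GenerationChangedPredicate: an update passes only
// when the generation differs; create and delete events always pass.
func generationChanged(e Event) bool {
	if e.Kind == Update {
		return e.OldGeneration != e.NewGeneration
	}
	return true
}

func main() {
	statusOnly := Event{Kind: Update, OldGeneration: 3, NewGeneration: 3}
	specChange := Event{Kind: Update, OldGeneration: 3, NewGeneration: 4}
	deletion := Event{Kind: Delete, OldGeneration: 3, NewGeneration: 3}
	fmt.Println(generationChanged(statusOnly), generationChanged(specChange), generationChanged(deletion))
	// false true true
}
```

A status-only update is dropped, a spec change triggers a reconcile, and a deletion is still delivered, which is exactly what the missing-DaemonSet check needs.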
Address CodeRabbit feedback: initialize the condition status to Unknown instead of True, so that on unexpected errors (network issues, RBAC) the condition reflects uncertainty rather than falsely indicating that the DaemonSet exists.


Development

Successfully merging this pull request may close these issues.

Operator does not self-heal collector DaemonSet deletion, causing silent collection outage
