@zreigz zreigz commented Jan 29, 2026

The CronJob status is reported as failed when a past Job had a failed pod, even if that Job ultimately succeeded. This PR fixes that.

Test Plan

Test environment: https://console.plrl-dev-aws.onplural.sh/

Checklist

  • I have added a meaningful title and summary to convey the impact of this PR to a user.
  • I have deployed the agent to a test environment and verified that it works as expected.
    • Agent starts successfully.
    • Service creation works without any issues when using raw manifests and Helm templates.
    • Service creation works when resources contain both CRD and CRD instances.
    • Service templating works correctly.
    • Service errors are reported properly and visible in the UI.
    • Service updates are reflected properly in the cluster.
    • Service resync triggers immediately and works as expected.
    • Sync waves annotations are respected.
    • Sync phases annotations are respected. Phases are executed in the correct order.
    • Sync hook delete policies are respected. Resources are not recreated once they reach the desired state.
    • Service deletion works and cleans up resources properly.
    • Services can be recreated after deletion.
    • Service detachment works and keeps resources unaffected.
    • Services can be recreated after detachment.
    • Service component trees are working as expected.
    • Cluster health statuses are being updated.
    • Agent logs do not contain any errors (after running for at least 30 minutes).
    • There are no visible anomalies in Datadog (after running for at least 30 minutes).
  • I have added tests to cover my changes.
  • If required, I have updated the Plural documentation accordingly.

@zreigz zreigz requested a review from a team as a code owner January 29, 2026 08:36
@zreigz zreigz added the hotfix label Jan 29, 2026
linear bot commented Jan 29, 2026

  if component.State != nil && *component.State == console.ComponentStateRunning {
-     // Skip checking child pods for the Job. The database cache contains only failed pods, and the Job may succeed after a retry.
-     if component.Kind == "Job" {
+     // Skip checking child pods for the Job and CronJob. The database cache contains only failed pods, and the Job/CronJob may succeed after a retry.

I don't think this works: the CronJob owns Jobs, which in turn own pods, so I suspect you need to probe another layer.
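
To illustrate the ownership concern, here is a minimal sketch, not the agent's actual code. The `Component` struct, its `Children` field, and the `directPods` and `descendantPods` helpers are all hypothetical names made up for this example; the real component type and the way the database cache is queried are not shown in this PR. It only demonstrates that a lookup limited to a CronJob's direct children finds Jobs rather than pods, so reaching the pods requires descending one more level.

```go
package main

import "fmt"

// Component is a hypothetical, simplified stand-in for a cached component.
// The agent's real type and fields are assumptions not shown in this PR.
type Component struct {
	Kind     string
	Name     string
	Children []*Component // resources owned by this component
}

// directPods returns only the Pods owned directly by c.
func directPods(c *Component) []*Component {
	var pods []*Component
	for _, child := range c.Children {
		if child.Kind == "Pod" {
			pods = append(pods, child)
		}
	}
	return pods
}

// descendantPods returns the Pods found at any depth below c,
// following the chain CronJob -> Job -> Pod.
func descendantPods(c *Component) []*Component {
	var pods []*Component
	for _, child := range c.Children {
		if child.Kind == "Pod" {
			pods = append(pods, child)
			continue
		}
		pods = append(pods, descendantPods(child)...)
	}
	return pods
}

func main() {
	// Hypothetical tree: one CronJob owning one Job that owns two Pods.
	cron := &Component{
		Kind: "CronJob",
		Name: "nightly",
		Children: []*Component{
			{
				Kind: "Job",
				Name: "nightly-1",
				Children: []*Component{
					{Kind: "Pod", Name: "nightly-1-a"},
					{Kind: "Pod", Name: "nightly-1-b"},
				},
			},
		},
	}

	fmt.Println("direct pods:", len(directPods(cron)))         // prints "direct pods: 0"
	fmt.Println("descendant pods:", len(descendantPods(cron))) // prints "descendant pods: 2"
}
```

Whether the agent needs a recursive walk like this or a second lookup keyed on the intermediate Jobs depends on how the cache stores parent references, which this thread does not show.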
