4 changes: 3 additions & 1 deletion pkg/operator/staticpod/controller/guard/guard_controller.go
@@ -380,7 +380,9 @@ func (c *GuardController) sync(ctx context.Context, syncCtx factory.SyncContext)
_, _, err = resourceapply.ApplyPod(ctx, c.podGetter, syncCtx.Recorder(), pod)
if err != nil {
klog.Errorf("Unable to apply pod %v changes: %v", pod.Name, err)
errs = append(errs, fmt.Errorf("Unable to apply pod %v changes: %v", pod.Name, err))
if !apierrors.IsConflict(err) {
Contributor
How is this PR handling conflicts gracefully? 🙂

In my opinion, we should not hide errors.

A conflict error is still an error.
For example, it might indicate a conflict with a different actor in the system.

What we could do instead (if we aren’t already) is consider retrying certain types of requests when a conflict error occurs.

Contributor Author
I think that ignoring is very graceful 🙂

Or we can simply return after deleting the pod and wait for another round. I think that's the cause: because the deployment uses the Recreate strategy, there should not be multiple pods at once.

Contributor Author
I will investigate further how to handle this in a better way and what the root cause actually is...

Contributor Author
It's actually racing with the kubelet, which updates the pod status right after pod creation, from what I can tell. This seems to be normal.

What do you mean by retrying requests? This already retries the request, but only once an up-to-date object has been fetched.

errs = append(errs, fmt.Errorf("Unable to apply pod %v changes: %v", pod.Name, err))
}
Contributor Author
I am not sure logging such an error is useful, but I left the log statement there.

}
}
}