-
Notifications
You must be signed in to change notification settings - Fork 10
Add rollback controller reconcile loop #438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: pre-release-5.1.0
Are you sure you want to change the base?
Conversation
goelozev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
| return false | ||
| } | ||
|
|
||
| // TODO: move this into api/deploy/v1alpha1/release_helpers.go once release reconciler PR is merged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
e41713a to
2adf735
Compare
1eb2a73 to
05eccce
Compare
05eccce to
c203ff0
Compare
goelozev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fabulous!
0x0013
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done!
Added a couple comments.
| meta.SetStatusCondition(&rollback.Status.Conditions, metav1.Condition{ | ||
| Type: deployv1alpha1.RollbackConditionInProgress, | ||
| Status: metav1.ConditionTrue, | ||
| Reason: "Retrying", | ||
| Message: fmt.Sprintf("Deployment attempt %d failed: %s. Retrying...", rollback.Status.AttemptCount, statusResp.Message), | ||
| LastTransitionTime: now, | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're setting the condition here, but it seems it will get overwritten almost immediately in triggerDeployment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily - if triggerDeployment fails to trigger the deployment with a retryable error, the condition won't be updated and the Rollback will get requeued for reconciliation. That said, I'm now wondering if the rollback.Status.Message message set on line 144 would be better set on the InProgress condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@0x0013 I've decided to simplify this a bit and agree it doesn't make sense to set the InProgress=True condition here in pollDeploymentStatus. Instead, we now set it once at the start of triggerDeployment with Reason=DeploymentTriggering and once at the end after a successful trigger with Reason=DeploymentTriggered.
This should make the behaviour clearer when a Rollback gets stuck trying and failing to trigger a deployment via the cicd backend, as the reason DeploymentTriggering will remain, whilst the failures themselves will be recorded in the rollback.Status.Message.
| Status: metav1.ConditionFalse, | ||
| Reason: "Completed", | ||
| Message: "Rollback deployment completed", | ||
| LastTransitionTime: now, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as noted previously, the transition time should get set by setStatusCondition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning towards leaving this (and the LastTransitionTime below) as set explicitly since this is the only transition the InProgress condition will make to False (and likewise below with the Succeeded condition to True).
The value I see is that by setting rollback.Status.CompletionTime and the LastTransitionTime of both conditions in this function to the same now value, it'll be clearer that the conditions transitioned to the terminal states as part of the completion of the rollback (particularly programmatically, if we ever need to compare the timestamps).
aa85c7c to
a7ac338
Compare
0x0013
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few more comments.
One thing I'm not fully convinced about, is setting the RollbackConditionInProgress with the "DeploymentTriggering" reason. While I understand the idea behind the behavior, I think I would prefer for RollbackConditionInProgress==True to mean that Deployment has already been triggered. It would make it clearer what the condition state implies.
However, I'm not yet fully convinced myself, and I will think a little more whether the current approach could be problematic down the line.
| case cicd.DeploymentStatusPending, cicd.DeploymentStatusInProgress: | ||
| // Update message and continue polling | ||
| rollback.Status.Message = statusResp.Message | ||
| rollback.Status.LastHTTPCallTime = &now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure of the intention of the LastHTTPCallTime field, but we only update it here on pending or in progress status. Should it not be updated if status is failed or successful?
I'm aware that a new triggered deployment would update the field as well, but I think other cases wouldn't.
| logger.Error(err, "failed to trigger deployment") | ||
|
|
||
| // Check if error is retryable | ||
| if deployerErr, ok := err.(*cicd.DeployerError); ok && deployerErr.Retryable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if deployerErr, ok := err.(*cicd.DeployerError); ok && deployerErr.Retryable { | |
| var deployerErr *cicd.DeployerError | |
| if errors.As(err, &deployerErr) && deployerErr.Retryable { |
This should have the same outcome, but is better practice, because errors.As() will evaluate any error in the chain, in case cicd.DeployerError had been wrapped before it's returned.
Rollback Controller Implementation
This PR implements the RollbackReconciler which manages
Rollbackresources by triggering deployments via a configurable CICD backend and polling for completion.internal/controller/deploy/rollback_controller.goInProgress,Succeeded)FromReleaseNamecmd/rollback-manager/main.go--cicd-backendflag (noop, github)--github-tokenflag /ROLLBACK_MANAGER_GITHUB_TOKENenv var--rollback-history-limitflagReconcile Loop
Error Handling
RollbackStatus ConditionsInProgressTrueInProgressFalse+SucceededSucceededTrueSucceededFalseManager Configuration