Add rolling_deploy_on_docker_failure option #180
Conversation
46cb294 to dcbf8a0
lib/tasks/deploy.rake (Outdated)

    container = start_new_container(server, service, defined_restart_policy)
    rescue => e
      on_fail = fetch(:rolling_deploy_on_docker_failure, :exit)
      raise e unless on_fail == :continue

I think here you want just a bare `raise`, which will re-raise the last error. Otherwise you get the stacktrace from here rather than from the original error.
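For clarity, a sketch of the suggested form of the rescue block (the surrounding begin/end is added here to make it self-contained; the helper calls and option names are taken from the diff above, so this is illustrative rather than the exact Centurion code):

```ruby
begin
  container = start_new_container(server, service, defined_restart_policy)
rescue => e
  on_fail = fetch(:rolling_deploy_on_docker_failure, :exit)
  # With no argument, `raise` re-raises the exception currently being handled,
  # rather than raising `e` again from this line.
  raise unless on_fail == :continue
end
```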

        end
      end

    task :rolling_deploy do

Would be good to validate that the contents of rolling_deploy_on_failure are one of the expected options.
Would you rather we have a `validate_rolling_deploy_options` dependent task? I don't believe we perform any validation on the values of the other rolling deploy options, but I could take a swing at that. Or I could just put the one validation at the top of this task if you think it's better to keep this changeset smaller.
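A minimal sketch of what that dependent validation task could look like (the task name, constant, and error message are illustrative, not existing Centurion code; `fetch` is the DSL already used in the diff):

```ruby
# Allowed values discussed in this PR.
VALID_ON_FAILURE_OPTIONS = [:exit, :continue].freeze

task :validate_rolling_deploy_options do
  on_fail = fetch(:rolling_deploy_on_failure, :exit)
  unless VALID_ON_FAILURE_OPTIONS.include?(on_fail)
    raise ArgumentError,
          "rolling_deploy_on_failure must be one of #{VALID_ON_FAILURE_OPTIONS.inspect}, got #{on_fail.inspect}"
  end
end
```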
README.md (Outdated)

    ports are not HTTP services, this allows you to only health check the ports
    that are. The default is an empty array. If you have non-HTTP services that you
    want to check, see Custom Health Checks in the previous section.
    * `rolling_deploy_on_docker_failure` => What to do when Centurion encounters an error stopping or starting a container during a rolling deploy. By default, when an error is encountered the deploy will stop and immediately raise that error. If this option is set to `:continue`, Centurion will continue deploying to the remaining hosts and raise at the end.

Let's call this rolling_deploy_on_failure because it's more generally about failure and not necessarily about a Docker failure.
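For reference, a rough sketch of how a project might opt into the renamed option in its Centurion config (this assumes the `set`/`fetch` style DSL used in the diff; the environment name is made up):

```ruby
namespace :environment do
  task :production do
    # Keep deploying to the remaining hosts and raise a combined error at the end.
    set :rolling_deploy_on_failure, :continue
  end
end
```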
Seems like a good addition to me. Thoughts @intjonathan?

Does the existing
@intjonathan I think the main benefit of the option in this PR is to reduce the blast radius in terms of undeployed-to hosts in the event that an error in communicating with a given host occurs in the middle of a deploy. That plus a more robust

    container = start_new_container(server, service, defined_restart_policy)
    rescue => e
      on_fail = fetch(:rolling_deploy_on_failure, :exit)
      raise unless on_fail == :continue

Regarding the validation discussion, I'd greenlight this if a log line here made it obvious what happened. Something like:

    if on_fail == :continue
      info "Caught error #{e.message}, but continuing deploy because rolling_deploy_on_failure is #{on_fail}"
    else
      error "Raising exception, as rolling_deploy_on_failure was #{on_fail} and not :continue"
      raise
    end

We're archiving the project, so I'm closing all open PRs:
Currently, when a rolling deploy errors during the process of stopping an existing container and spinning up the new one, the deploy stops and exits by raising that error. The longer the list of servers you're deploying to, the more of a pain it is to determine which servers did and did not successfully deploy, and you're left in an inconsistent state.
This PR adds an option, `rolling_deploy_on_docker_failure`, that defaults to `:exit`, which preserves the existing behavior. When set to `:continue`, Centurion will try to deploy to every host on its list and keep a running collection of the errors it encounters along the way. When all the servers are done, it will raise a single error with a concatenation of all the error messages it encountered. This should:
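A rough sketch of the collect-then-raise behavior described above (the loop, `servers`, `deploy_to_server`, and the final error message are illustrative, not the PR's actual implementation; `fetch` is the DSL seen in the diff):

```ruby
errors = []

servers.each do |server|
  begin
    deploy_to_server(server)             # hypothetical per-host deploy step
  rescue => e
    raise unless fetch(:rolling_deploy_on_docker_failure, :exit) == :continue
    errors << "#{server}: #{e.message}"  # remember the failure and keep going
  end
end

unless errors.empty?
  raise "Rolling deploy failed on #{errors.size} host(s):\n#{errors.join("\n")}"
end
```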