Add rolling_deploy_on_docker_failure option #180

imakewebthings · 2017-02-24T23:13:13Z

Currently, when a rolling deploy errors during the process of stopping an existing container and spinning up the new one, the deploy stops and exits by raising that error. The longer the list of servers you're deploying to, the more of a pain it is to determine which servers did and did not successfully deploy, and you're left in an inconsistent state.

This PR adds an option, rolling_deploy_on_docker_failure that defaults to :exit, which preserves the existing behavior. When set to :continue, Centurion will try to deploy to every host on its list and keep a running collection of the errors it encounters along the way. When all the servers are done, it will raise a single error with a concatenation of all the error messages it encountered. This should:

Ensure hosts that are healthy at the time of deploy get deployed to, regardless of the health of other hosts in the list.
Make it easier to see at the end of the deploy failure which hosts are unhealthy and still running the old container.

relistan · 2017-02-27T12:12:04Z

lib/tasks/deploy.rake

+        container = start_new_container(server, service, defined_restart_policy)
+      rescue e
+        on_fail = fetch(:rolling_deploy_on_docker_failure, :exit)
+        raise e unless on_fail == :continue


I think here you want just raise which will re-raise the last error. Otherwise you get the stacktrace from here rather than the original error.

relistan · 2017-02-27T12:13:28Z

lib/tasks/deploy.rake

    end
  end

  task :rolling_deploy do


Would be good to validate that the contents of rolling_deploy_on_failure are one of the expected options.

Would you rather we have a validate_rolling_deploy_options dependent task? I don't believe we perform any validation on the values of other rolling deploy options, could take a swing at that. Or just put the one validation at the top of this task if you think it's better to keep this changeset smaller.

relistan · 2017-02-27T12:14:03Z

README.md

   ports are not HTTP services, this allows you to only health check the ports
   that are. The default is an empty array. If you have non-HTTP services that you
   want to check, see Custom Health Checks in the previous section.
+ * `rolling_deploy_on_docker_failure` => What to do when Centurion encounters an error stopping or starting a container during a rolling deploy. By default, when an error is encountered the deploy will stop and immediately raise that error. If this option is set to `:continue` Centurion will continue deploying to the remaining hosts and raise at the end. 


Let's call this rolling_deploy_on_failure because it's more generally about failure and not necessarily about a Docker failure.

relistan · 2017-02-27T12:14:28Z

Seems like a good addition to me. Thoughts @intjonathan ?

intjonathan · 2017-03-06T18:52:33Z

Does the existing deploy:repair task just need documentation and fixing? Seems like that'd be a one-shot cleanup task you could run instead of using the output of this to build a host-specific deploy.

imakewebthings · 2017-03-09T00:03:27Z

@intjonathan I think repair would have to do a lot more work than check status endpoints to determine the hosts that need the deploy. It would have to check all running containers for version mismatches, and that's assuming the deployer is using distinct versioned image tags and not latest. This could be a neat feature of repair.

I think the main benefit of the option in this PR is to reduce the blast radius in terms of undeployed-to hosts in the event that an error in communicating with a given host occurs in the middle of a deploy. That plus a more robust repair would be a great combo.

intjonathan · 2017-06-30T18:07:27Z

lib/tasks/deploy.rake

+        container = start_new_container(server, service, defined_restart_policy)
+      rescue e
+        on_fail = fetch(:rolling_deploy_on_failure, :exit)
+        raise unless on_fail == :continue


Regarding the validation discussion, I'd greenlight this if a log line here made it obvious what happened. Something like

if on_fail == :continue info "Caught error #{e.message}, but continuing deploy because rolling_deploy_on_failure is #{on_fail}" else error "Raising exception, as rolling_deploy_on_failure was #{on_fail} and not :continue" raise end

CLAassistant · 2020-03-03T17:09:45Z

All committers have signed the CLA.

idleyoungman · 2025-09-11T18:26:04Z

We're archiving the project, so I'm closing all open PRs:
#208

Add rolling_deploy_on_docker_failure option

dcbf8a0

imakewebthings force-pushed the ctroughton/rolling_deploy_continue branch from 46cb294 to dcbf8a0 Compare February 24, 2017 23:40

relistan reviewed Feb 27, 2017

View reviewed changes

Rename to rolling_deploy_on_failure, reraise last error

77655b6

intjonathan reviewed Jun 30, 2017

View reviewed changes

idleyoungman closed this Sep 11, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add rolling_deploy_on_docker_failure option #180

Add rolling_deploy_on_docker_failure option #180

Uh oh!

imakewebthings commented Feb 24, 2017

Uh oh!

relistan Feb 27, 2017

Uh oh!

relistan Feb 27, 2017

Uh oh!

imakewebthings Feb 28, 2017

Uh oh!

relistan Feb 27, 2017

Uh oh!

relistan commented Feb 27, 2017

Uh oh!

intjonathan commented Mar 6, 2017

Uh oh!

imakewebthings commented Mar 9, 2017 •

edited

Loading

Uh oh!

intjonathan Jun 30, 2017

Uh oh!

CLAassistant commented Mar 3, 2020 •

edited

Loading

Uh oh!

idleyoungman commented Sep 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Add rolling_deploy_on_docker_failure option #180

Add rolling_deploy_on_docker_failure option #180

Uh oh!

Conversation

imakewebthings commented Feb 24, 2017

Uh oh!

relistan Feb 27, 2017

Choose a reason for hiding this comment

Uh oh!

relistan Feb 27, 2017

Choose a reason for hiding this comment

Uh oh!

imakewebthings Feb 28, 2017

Choose a reason for hiding this comment

Uh oh!

relistan Feb 27, 2017

Choose a reason for hiding this comment

Uh oh!

relistan commented Feb 27, 2017

Uh oh!

intjonathan commented Mar 6, 2017

Uh oh!

imakewebthings commented Mar 9, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

intjonathan Jun 30, 2017

Choose a reason for hiding this comment

Uh oh!

CLAassistant commented Mar 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

idleyoungman commented Sep 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

imakewebthings commented Mar 9, 2017 •

edited

Loading

CLAassistant commented Mar 3, 2020 •

edited

Loading