diff --git a/README.md b/README.md index 7601474..ec9c3bd 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,13 @@ -

+
+

docker rollout
Zero Downtime Deployment for Docker Compose

+[Documentation](https://docker-rollout.wowu.dev) +
+ + Docker CLI plugin that updates Docker Compose services without downtime. Simply replace `docker compose up -d ` with `docker rollout ` in your deployment scripts. This command will scale the service to twice the current number of instances, wait for the new containers to be ready, and then remove the old containers. @@ -12,6 +17,7 @@ Simply replace `docker compose up -d ` with `docker rollout ` - [Usage](#usage) - [⚠️ Caveats](#️-caveats) - [Sample deployment script](#sample-deployment-script) + - [Draining old containers](#draining-old-containers) - [Why?](#why) - [License](#license) @@ -49,14 +55,16 @@ Options: - `-w | --wait SECONDS` - (not required) - Time to wait for new container to be ready if healthcheck is not defined. Default: 10 - `--wait-after-healthy SECONDS` - (not required) - Time to wait after new container is healthy before removing old container. Works when healthcheck is defined. Default: 0 - `--env-file FILE` - (not required) - Path to env file, can be specified multiple times, as in `docker compose`. +- `--pre-stop-hook` - (not required) - Command to run in the old container before stopping it. Can be used for marking the container as unhealthy to make proxy stop sending requests to it, see [container draining](#draining-old-containers) below. -See [examples](https://docker-rollout.wowu.dev/examples/) in docs for sample `docker-compose.yml` files. +See [detailed options description](https://docker-rollout.wowu.dev/cli-options) and [compose.yml file examples](https://docker-rollout.wowu.dev/examples/) in docs. ### ⚠️ Caveats - Your service cannot have `container_name` and `ports` defined in `docker-compose.yml`, as it's not possible to run multiple containers with the same name or port mapping. Use a proxy as described below. - Proxy like [Traefik](https://github.com/traefik/traefik) or [nginx-proxy](https://github.com/nginx-proxy/nginx-proxy) is required to route traffic. -- Each deployment will increment the index in container name (e.g. 
`project-web-1` -> `project-web-2`). +- Each deployment will increment the number in container name (e.g. `project-web-1` -> `project-web-2`). +- To avoid dropping currently processed requests when stopping the old container, you need to set up [container draining](#draining-old-containers), which requires a slightly more complex setup. ### Sample deployment script @@ -68,17 +76,70 @@ git pull # Build new app image docker compose build web # Run database migrations -docker compose run web rake db:migrate -# Deploy new version +docker compose run --rm web rake db:migrate +# Deploy new version without downtime docker rollout web ``` -## Why? +### Draining old containers + +If you want to make sure that no requests are lost during deployment, you can use the following setup to implement container draining. It requires adding a healthcheck to your container that will be failing on purpose when performing rollout to make the proxy (Traefik or nginx-proxy) stop sending requests to the old container before it's removed. + +1. Add additional healthcheck to your container. The check should fail when `/tmp/drain` file is present. + + If your service doesn't have a healthcheck yet: + + ```yml + services: + web: + image: myapp:latest + healthcheck: + test: test ! -f /tmp/drain + interval: 5s + retries: 1 + ``` + + If your service already has a healthcheck (e.g. `curl -f http://localhost:3000/healthcheck`): + + ```yml + services: + web: + image: myapp:latest + healthcheck: + test: test ! -f /tmp/drain && curl -f http://localhost:3000/healthcheck + interval: 5s + retries: 1 + ``` + + +2. 
Use the following command to perform a zero-downtime deployment: + + ```bash + docker rollout web --pre-stop-hook "touch /tmp/drain && sleep 10" + ``` + + or add the following label to your service in `docker-compose.yml`: + + ```yml + services: + web: + image: myapp:latest + labels: + docker-rollout.pre-stop-hook: "touch /tmp/drain && sleep 10" + ``` + + Remember that docker-rollout reads labels from the old container, so **this hook will work on the next deployment**. CLI options have higher priority than container labels, so you can use it to override the label value. + + **Important:** make sure the sleep time is longer than the healthcheck `interval` × `retries` + `time to finish processing open requests` (e.g. interval: 10s, retries: 3, additional time of 5s = sleep 35) so the healthcheck has enough time to mark the container as unhealthy. + +Read more about [container draining in the docs](https://docker-rollout.wowu.dev/container-draining). + +## Why use docker-rollout? -Using `docker compose up` to deploy a new version of a service causes downtime because the app container is stopped before the new container is created. -If your application takes a while to boot, this may be noticeable to users. +Using `docker compose up` to deploy a new version of your app causes downtime because the app container has to be stopped before the new container is created. +If your application takes a while to boot, this may be noticeable to your users. -Using container orchestration tools like [Kubernetes](https://kubernetes.io/) or [Nomad](https://www.nomadproject.io/) is usually an overkill for projects that will do fine with a single-server Docker Compose setup. [Dokku](https://github.com/dokku/dokku) comes with zero-downtime deployment and more useful features, but it's not as flexible as Docker Compose. 
+Using container orchestration tools like [Kubernetes](https://kubernetes.io/) or [Nomad](https://www.nomadproject.io/) can be an overkill for projects that will do fine with a single-server Docker Compose setup. [Dokku](https://github.com/dokku/dokku) comes with zero-downtime deployment and more useful features, but it's not as flexible as Docker Compose. If you have a proxy like [Traefik](https://github.com/traefik/traefik) or [nginx-proxy](https://github.com/nginx-proxy/nginx-proxy), a zero downtime deployment can be achieved by writing a script that scales the service to 2 instances, waits for the new container to be ready, and then removes the old container. `docker rollout` does exactly that, but with a single command that you can use in your deployment scripts. @@ -86,4 +147,4 @@ If you're using Docker healthchecks, Traefik will make sure that traffic is only ## License -[MIT License](LICENSE) © Karol Musur +[MIT License](LICENSE) © [Karol Musur](https://wowu.dev) diff --git a/docker-rollout b/docker-rollout index a697cf7..f6ad77f 100755 --- a/docker-rollout +++ b/docker-rollout @@ -59,6 +59,7 @@ Options: --wait-after-healthy N When healthcheck is defined and succeeds, wait for additional N seconds before stopping the old container (default: 0 seconds) --env-file FILE Specify an alternate environment file + --pre-stop-hook CMD Run a command in the old container before stopping it. 
-v, --version Print plugin version EOF @@ -152,6 +153,23 @@ main() { sleep "$NO_HEALTHCHECK_TIMEOUT" fi + # Check if pre-stop hook is defined in first old container label + FIRST_OLD_CONTAINER_ID=$(echo "$OLD_CONTAINER_IDS" | cut -d\  -f 1) + # shellcheck disable=SC2086 # DOCKER_ARGS must be unquoted to allow multiple arguments + PRE_STOP_HOOK=${PRE_STOP_HOOK:-$(docker $DOCKER_ARGS inspect --format='{{index .Config.Labels "docker-rollout.pre-stop-hook"}}' "$FIRST_OLD_CONTAINER_ID")} + + if [ -n "$PRE_STOP_HOOK" ]; then + echo "==> Running pre-stop hook: $PRE_STOP_HOOK" + + for OLD_CONTAINER_ID in $OLD_CONTAINER_IDS; do + # shellcheck disable=SC2086 # DOCKER_ARGS must be unquoted to allow multiple arguments + docker $DOCKER_ARGS exec "$OLD_CONTAINER_ID" sh -c "$PRE_STOP_HOOK" & + done + + # Wait for all pre-stop hooks to finish + wait + fi + echo "==> Stopping and removing old containers" # shellcheck disable=SC2086 # DOCKER_ARGS and OLD_CONTAINER_IDS must be unquoted to allow multiple arguments @@ -186,6 +204,10 @@ while [ $# -gt 0 ]; do WAIT_AFTER_HEALTHY_DELAY="$2" shift 2 ;; + --pre-stop-hook) + PRE_STOP_HOOK="$2" + shift 2 + ;; -v | --version) echo "docker-rollout version $VERSION" exit 0 diff --git a/docs/cli-options.md b/docs/cli-options.md index bcdb521..cdf5528 100644 --- a/docs/cli-options.md +++ b/docs/cli-options.md @@ -12,17 +12,17 @@ nav_order: 3 ## Docker flags -All docker flags can be used with `docker rollout` normally, like `--context`, `--env`, `--log-level`, etc. +All docker flags can be used with `docker rollout` as usual, like `--context`, `--env`, `--log-level`, etc. ```bash docker --context my-remote-context rollout ``` -The plugin flags are described below. +The plugin flags are described below. Some of the options can be defined as container labels. ## `-f | --file FILE` -Path to compose file, can be specified multiple times, as in `docker compose`. +Path to compose file, can be specified multiple times, like in `docker compose`. 
**Example** @@ -100,3 +100,20 @@ Multiple env files: docker rollout --env-file .env --env-file .env.prod ``` +## `--pre-stop-hook COMMAND` + +Label: `docker-rollout.pre-stop-hook` + +Command to run in the old container before stopping it. Can be used for marking the container as unhealthy to gracefully finish running requests before deleting the container, see [container draining](container-draining). + +**Example** + +Deploy a new version of the service and mark the old container as unhealthy before stopping it: + +```bash +docker rollout --pre-stop-hook "touch /tmp/drain && sleep 10" +``` + +{: .warning } +This requires the service to have a healthcheck defined in `docker-compose.yml` or `Dockerfile` that will fail if `/tmp/drain` file exists. + diff --git a/docs/container-draining.md b/docs/container-draining.md new file mode 100644 index 0000000..bfe7d50 --- /dev/null +++ b/docs/container-draining.md @@ -0,0 +1,67 @@ +--- +title: Container Draining +nav_order: 4 +--- + +# True zero-downtime deployment with container draining + +If you want to make sure that no requests are lost during deployment, you can use the following setup to implement container draining. It requires adding a healthcheck to your container that will be failing on purpose when performing rollout to make the proxy (Traefik or nginx-proxy) stop sending requests to the old container before it's removed. This allows the old container to finish processing any open requests before it is stopped. + +1. Add additional healthcheck to your container. The check should fail when `/tmp/drain` file is present. + + If your service doesn't have a healthcheck yet: + + ```yml + services: + web: + image: myapp:latest + healthcheck: + test: test ! -f /tmp/drain + interval: 5s + retries: 1 + ``` + + If your service already has a healthcheck (e.g. `curl -f http://localhost:3000/healthcheck`): + + ```yml + services: + web: + image: myapp:latest + healthcheck: + test: test ! 
-f /tmp/drain && curl -f http://localhost:3000/healthcheck + interval: 5s + retries: 1 + ``` + + +2. Use the following command to perform a zero-downtime deployment: + + ```bash + docker rollout web --pre-stop-hook "touch /tmp/drain && sleep 10" + ``` + + or add the following label to your service in `docker-compose.yml`: + + ```yml + services: + web: + image: myapp:latest + labels: + docker-rollout.pre-stop-hook: "touch /tmp/drain && sleep 10" + ``` + + Remember that docker-rollout reads labels from the old container, so **this hook will be executed during the next deployment**. CLI options have higher priority than container labels, so you can use it to override the label value. + + **Important:** make sure the sleep time is longer than the healthcheck `interval` × `retries` + `time to finish processing open requests` (e.g. interval: 10s, retries: 3, additional time of 5s = sleep 35) so the healthcheck has enough time to mark the container as unhealthy. + +With this configuration, a rollout process looks like this: + +1. New container is started. +2. Docker daemon marks the new container as healthy. +3. Proxy starts sending requests to the new container alongside the old container. +4. We create `/tmp/drain` file in the old container. +5. Docker daemon marks the old container as unhealthy. +6. Proxy stops sending requests to the old container. +7. Old container is removed. + +See sample configuration for [Traefik](examples/container-draining.md). diff --git a/docs/examples/container-draining.md b/docs/examples/container-draining.md new file mode 100644 index 0000000..383cdf1 --- /dev/null +++ b/docs/examples/container-draining.md @@ -0,0 +1,69 @@ +--- +title: Traefik w/ Container Draining +parent: Examples +--- + +# Container Draining with Traefik + +Works with Docker Compose v2. 
+ +## Files + +`Dockerfile` + +```Dockerfile +FROM alpine +# Use alpine image with whoami binary to have shell commands available +COPY --from=traefik/whoami /whoami /whoami +ENTRYPOINT [ "/whoami" ] +EXPOSE 80 +``` + +`compose.yml` + +```yml +services: + whoami: + build: . + labels: + - "traefik.enable=true" + - "traefik.http.routers.whoami.entrypoints=web" + - "traefik.http.routers.whoami.rule=Host(`example.com`)" + healthcheck: + test: "test ! -f /tmp/drain" + interval: 5s + retries: 1 + + traefik: + image: traefik:v2.9 + container_name: traefik + command: + - "--api.insecure=true" + - "--providers.docker=true" + - "--providers.docker.exposedbydefault=false" + - "--entrypoints.web.address=:80" + ports: + - "80:80" + - "8080:8080" + volumes: + - "/var/run/docker.sock:/var/run/docker.sock:ro" + +``` + +## Steps + +1. Change domain in `compose.yml` to a domain pointing to your server. + +2. Start all services + + ```bash + docker compose up -d + ``` + +3. Deploy new version of `whoami` service without downtime + + ```bash + docker rollout whoami --pre-stop-hook "touch /tmp/drain && sleep 10" + ``` + + New container will be created, then the old container will be marked as unhealthy and removed after 10 seconds. Traefik will stop sending requests to the old container when it becomes unhealthy, allowing it to finish pending requests before being removed. 
diff --git a/docs/getting-started.md b/docs/getting-started.md index 60dd81e..0d281d7 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -110,7 +110,7 @@ git pull # Build new app image docker compose build web # Run database migrations -docker compose run web rake db:migrate +docker compose run --rm web rake db:migrate # Deploy new version docker rollout web ``` diff --git a/docs/index.md b/docs/index.md index 04c4b3d..612c56f 100644 --- a/docs/index.md +++ b/docs/index.md @@ -29,12 +29,11 @@ Using `docker compose up` to deploy a new version of a service causes downtime b - Your service cannot have `container_name` and `ports` defined in `docker-compose.yml`, as it's not possible to run multiple containers with the same name or port mapping. Use a proxy as described below. - Proxy like [Traefik](https://github.com/traefik/traefik) or [nginx-proxy](https://github.com/nginx-proxy/nginx-proxy) is required to route traffic to the containers. Refer to the [Examples](examples) for sample compose files. -- Each deployment will increment the index in container name (e.g. `project-web-1` -> `project-web-2`). +- Each deployment will increment the number in container name (e.g. `project-web-1` -> `project-web-2`). +- To avoid dropping currently processed requests when stopping the old container, you need to set up [container draining](#draining-old-containers), which requires a slightly more complex setup. ## Installation -Quick install: - ```bash # Create directory for Docker cli plugins mkdir -p ~/.docker/cli-plugins @@ -69,6 +68,12 @@ docker compose run web rake db:migrate docker rollout web ``` +### Draining old containers + +If you want to make sure that no requests are lost during deployment, you can use the following setup to implement container draining. 
It requires adding a healthcheck to your container that will be failing on purpose when performing rollout to make the proxy (Traefik or nginx-proxy) stop sending requests to the old container before it's removed. + +See [container draining](container-draining). + ## Rationale and alternatives Using `docker compose up` to deploy a new version of a service causes downtime because the app container is stopped before the new container is created. @@ -82,5 +87,5 @@ If you're using Docker healthchecks, Traefik will make sure that traffic is only ## License -[MIT License](https://github.com/wowu/docker-rollout/blob/main/LICENSE) © Karol Musur +[MIT License](https://github.com/wowu/docker-rollout/blob/main/LICENSE) © [Karol Musur](https://wowu.dev) diff --git a/docs/uninstalling.md b/docs/uninstalling.md index 6b55865..9a740f3 100644 --- a/docs/uninstalling.md +++ b/docs/uninstalling.md @@ -1,6 +1,6 @@ --- title: Uninstalling -nav_order: 5 +nav_order: 6 --- # Uninstalling docker rollout diff --git a/docs/upgrading.md b/docs/upgrading.md index 2cf5351..4c2e379 100644 --- a/docs/upgrading.md +++ b/docs/upgrading.md @@ -1,6 +1,6 @@ --- title: Upgrading -nav_order: 4 +nav_order: 5 --- # Upgrading docker rollout
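
The sleep-time rule stated in the container draining docs of this change (healthcheck `interval` × `retries`, plus time to finish open requests) can be checked with a few lines of shell. This is only a minimal sketch: `drain_sleep` is a hypothetical helper that is not part of docker-rollout, and the values 10, 3 and 5 are the ones used in the docs' own example.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker-rollout): derive the sleep value for
# the pre-stop hook from the healthcheck settings, following the rule in the
# docs: interval * retries + time to finish processing open requests.
drain_sleep() {
  interval="$1"  # healthcheck interval, in seconds
  retries="$2"   # healthcheck retries
  grace="$3"     # extra seconds for in-flight requests to complete
  echo $(( interval * retries + grace ))
}

# Values from the docs' example: interval 10s, retries 3, 5s of extra time.
SLEEP_SECONDS=$(drain_sleep 10 3 5)

# Build the hook command string; it is only printed here, not executed.
PRE_STOP_HOOK="touch /tmp/drain && sleep $SLEEP_SECONDS"
echo "$PRE_STOP_HOOK"
```

The printed string could then be passed as `docker rollout web --pre-stop-hook "$PRE_STOP_HOOK"`. Since the docs ask for a sleep time longer than the computed sum, adding a second or two of margin on top of the result may be reasonable.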