HDDS-14751. Add basic ZDU flow in acceptance tests #9877
base: HDDS-14496-zdu
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| #!/usr/bin/env bash | ||
| # Licensed to the Apache Software Foundation (ASF) under one | ||
| # or more contributor license agreements. See the NOTICE file | ||
| # distributed with this work for additional information | ||
| # regarding copyright ownership. The ASF licenses this file | ||
| # to you under the Apache License, Version 2.0 (the | ||
| # "License"); you may not use this file except in compliance | ||
| # with the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| source "$TEST_DIR"/testlib.sh | ||
| | ||
| ### CALLBACKS ### | ||
| | ||
| before_service_restart() { | ||
| generate "generate-${SERVICE}" | ||
| } | ||
| | ||
| after_service_restart() { | ||
| validate "generate-${SERVICE}" | ||
| } | ||
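The callback file above only defines hook functions. The `callback` helper that invokes them lives in `testlib.sh`, which is not part of this diff. A minimal sketch of how such a dispatcher could work, assuming it simply calls the named function when the sourced file defines it and otherwise does nothing. The `callback` body, the stub `generate`/`validate` helpers and the `before_service_stop` hook name are all hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical dispatcher: call a hook only if a sourced callback file
# defined it. The real helper in testlib.sh may behave differently.
callback() {
  local hook="$1"
  shift
  # declare -F succeeds only for defined shell functions.
  if declare -F "$hook" > /dev/null; then
    "$hook" "$@"
  fi
}

# Hypothetical stand-ins for the testlib.sh helpers used by the hooks.
generate() { echo "generate $1"; }
validate() { echo "validate $1"; }

# Hooks as defined in the callback file above.
before_service_restart() {
  generate "generate-${SERVICE}"
}

after_service_restart() {
  validate "generate-${SERVICE}"
}

SERVICE=scm2
callback before_service_restart   # prints: generate generate-scm2
callback after_service_restart    # prints: validate generate-scm2
callback before_service_stop      # hypothetical undefined hook: no output
```

In this sketch, skipping undefined hooks lets each callback file implement only the hooks it needs.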
Contributor: This file needs execute permissions to run. These are tracked by git. After changing the permissions: Note that

Author: Nice, thanks, I changed it.
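For regular files, git records only whether the executable bit is set (mode `100755` vs `100644`), so the mode can be checked straight from the index. A small sketch, assuming it runs inside the Ozone work tree and that paths contain no spaces; `non_executable_scripts` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print tracked files (optionally limited by pathspecs)
# whose mode in the git index is not 100755, i.e. not executable.
# `git ls-files -s` prints "<mode> <object> <stage>\t<path>" per entry;
# awk splits on spaces and tabs, so $4 is the path (assumes no spaces).
non_executable_scripts() {
  git ls-files -s -- "$@" | awk '$1 != "100755" { print $4 }'
}
```

For example, `non_executable_scripts '*.sh'` prints the tracked shell scripts that are still `100644`. One way to set the bit in the index without touching the working-tree mode is `git update-index --chmod=+x <path>`; `chmod +x <path>` followed by `git add <path>` has the same effect on filesystems that support the executable bit.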
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,132 @@ | ||
| #!/usr/bin/env bash | ||
| # Licensed to the Apache Software Foundation (ASF) under one | ||
| # or more contributor license agreements. See the NOTICE file | ||
| # distributed with this work for additional information | ||
| # regarding copyright ownership. The ASF licenses this file | ||
| # to you under the Apache License, Version 2.0 (the | ||
| # "License"); you may not use this file except in compliance | ||
| # with the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| # This script tests upgrade from a previous release to the current | ||
| # binaries. Docker image with Ozone binaries is required for the | ||
| # initial version, while the snapshot version uses Ozone runner image. | ||
| | ||
| set -e -o pipefail | ||
| | ||
| # Fail if required vars are not set. | ||
| set -u | ||
| : "${OZONE_UPGRADE_FROM}" | ||
| : "${OZONE_UPGRADE_TO}" | ||
| : "${TEST_DIR}" | ||
| : "${SCM}" | ||
| : "${OZONE_CURRENT_VERSION}" | ||
| set +u | ||
| | ||
| echo "--- RUNNING ROLLING UPGRADE TEST FROM $OZONE_UPGRADE_FROM TO $OZONE_UPGRADE_TO ---" | ||
| | ||
| source "$TEST_DIR"/testlib.sh | ||
| | ||
| # Restart one service with the target image. | ||
| rolling_restart_service() { | ||
| SERVICE="$1" | ||
| | ||
| echo "--- RESTARTING ${SERVICE} WITH IMAGE ${OZONE_UPGRADE_TO} ---" | ||
| | ||
| # Stop service | ||
| stop_containers "${SERVICE}" | ||
| | ||
| # Check whether the selected SCM container is running. A rolling upgrade stops and restarts services one by one, | ||
| # and the write/read tests must keep running while one service is unavailable. The generate and validate robot | ||
| # tests run in the SCM container, so fall back to another SCM if this one is down. | ||
| if [[ "$(docker inspect -f '{{.State.Running}}' "ha-${SCM}-1" 2>/dev/null)" != "true" ]]; then | ||
| local fallback_scm | ||
| fallback_scm="$(docker-compose --project-directory="$TEST_DIR/compose/ha" config --services | grep scm | grep -v "^${SCM}$" | head -n1)" | ||
| if [[ -n "$fallback_scm" ]]; then | ||
| export SCM="$fallback_scm" | ||
| fi | ||
| fi | ||
| | ||
| # The data generation/validation is doing S3 API tests, so skip it in case the S3 gateway is updated | ||
| # TODO find a better solution | ||
| if [[ ${SERVICE} != "s3g" ]]; then | ||
| callback before_service_restart | ||
| fi | ||
| | ||
| # Restart service with new image. | ||
| prepare_for_image "${OZONE_UPGRADE_TO}" | ||
| create_containers "${SERVICE}" | ||
| | ||
| # The data generation/validation is doing S3 API tests, so skip it in case the S3 gateway is updated | ||
| if [[ ${SERVICE} != "s3g" ]]; then | ||
| callback after_service_restart | ||
| fi | ||
| | ||
| # Service-specific readiness checks. | ||
| case "${SERVICE}" in | ||
| om*) | ||
| wait_for_port "${SERVICE}" 9862 120 | ||
| ;; | ||
| scm*) | ||
| # SCM hostnames in this compose are scmX.org | ||
| wait_for_port "${SERVICE}.org" 9876 120 | ||
| ;; | ||
| dn*) | ||
| wait_for_port "${SERVICE}" 9882 120 | ||
| ;; | ||
| esac | ||
| } | ||
| | ||
| echo "--- SETTING UP OLD VERSION $OZONE_UPGRADE_FROM ---" | ||
| OUTPUT_NAME="${OZONE_UPGRADE_FROM}-${OZONE_UPGRADE_TO}-1-original" | ||
| export OM_HA_ARGS='--' | ||
| prepare_for_image "$OZONE_UPGRADE_FROM" | ||
| | ||
| echo "--- RUNNING WITH OLD VERSION $OZONE_UPGRADE_FROM ---" | ||
| start_docker_env | ||
| | ||
| # TODO Add old data generation | ||
| | ||
| echo "--- ROLLING UPGRADE TO $OZONE_UPGRADE_TO PRE-FINALIZED ---" | ||
| | ||
| # SCMs first | ||
| for s in scm2 scm1 scm3; do | ||
| OUTPUT_NAME="${OZONE_UPGRADE_FROM}-${OZONE_UPGRADE_TO}-2-${s}" | ||
| rolling_restart_service "$s" "$OZONE_UPGRADE_TO" | ||
| done | ||
| | ||
| # Recon | ||
| OUTPUT_NAME="${OZONE_UPGRADE_FROM}-${OZONE_UPGRADE_TO}-2-recon" | ||
| rolling_restart_service "recon" "$OZONE_UPGRADE_TO" | ||
| | ||
| # DNs | ||
| for s in dn1 dn2 dn3 dn4 dn5; do | ||
| OUTPUT_NAME="${OZONE_UPGRADE_FROM}-${OZONE_UPGRADE_TO}-2-${s}" | ||
| rolling_restart_service "$s" "$OZONE_UPGRADE_TO" | ||
| done | ||
| | ||
| for s in om1 om2 om3; do | ||
| OUTPUT_NAME="${OZONE_UPGRADE_FROM}-${OZONE_UPGRADE_TO}-2-${s}" | ||
| rolling_restart_service "$s" "$OZONE_UPGRADE_TO" | ||
| done | ||
| | ||
| # S3 Gateway | ||
| OUTPUT_NAME="${OZONE_UPGRADE_FROM}-${OZONE_UPGRADE_TO}-2-s3g" | ||
| rolling_restart_service "s3g" "$OZONE_UPGRADE_TO" | ||
| | ||
| # TODO Add downgrade scenario | ||
| | ||
| echo "--- RUNNING WITH NEW VERSION $OZONE_UPGRADE_TO FINALIZED ---" | ||
| OUTPUT_NAME="${OZONE_UPGRADE_FROM}-${OZONE_UPGRADE_TO}-3-finalized" | ||
| | ||
| # TODO Add validation for pre-finalized state | ||
| | ||
| # Sends commands to finalize OM and SCM. | ||
| execute_robot_test "$SCM" -N "${OUTPUT_NAME}-finalize" upgrade/finalize.robot | ||
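The fallback in `rolling_restart_service` picks the first compose service whose name contains `scm` and differs from the current `$SCM`. The same selection can be written as a plain loop that is easy to test without Docker. This is a sketch, not part of the PR: `pick_fallback_scm` is a hypothetical name, and it uses a prefix match (`scm*`), slightly stricter than `grep scm`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read compose service names on stdin and print the
# first one that starts with "scm" and is not the current SCM.
# Returns 1 if there is no other SCM service.
pick_fallback_scm() {
  local current="$1"
  local svc
  while IFS= read -r svc; do
    if [[ "$svc" == scm* && "$svc" != "$current" ]]; then
      printf '%s\n' "$svc"
      return 0
    fi
  done
  return 1
}

# Example with a service list like the one `config --services` might print.
services=$'om1\nscm1\nscm2\nscm3\ns3g'
SCM=scm1
if fallback="$(pick_fallback_scm "$SCM" <<< "$services")"; then
  SCM="$fallback"
fi
echo "$SCM"   # prints scm2
```

Returning non-zero when no other SCM exists lets the caller keep `$SCM` unchanged, which matches the `[[ -n "$fallback_scm" ]]` guard in the script.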
Is this required? We already have the `ipv4_address` fields set for each service.
Without this I saw test failures with key creation timeouts:

In this test, when `scm1` is stopped, DNS resolution for `scm1.org` fell back to public DNS, which caused key creation timeouts while clients kept retrying the bad address:

`extra_hosts` forces deterministic in-cluster resolution even when a node is down, so HA client retries stay on the intended private IPs. The same `extra_hosts` setting is also used in other Docker Compose files where we run HA and stop containers one by one (e.g. debug tools, decommissioning).

Cursor response while debugging: "Most likely root cause in your run: `scm1.org` resolves to a public IP while `scm1` is intentionally down, and OM/SCM clients get stuck retrying that bad address. `scm1` is the hostname that collides with real DNS (`scm1.org`), so when the container DNS entry disappears during the stop, the resolver falls back to public DNS. Java then caches or keeps retrying that bad endpoint long enough to hit your 5-minute test timeout."
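The failure described above is a compose hostname that should resolve to a private network address resolving to a public one instead. One quick check while a node is stopped is to compare the resolved address against the RFC 1918 private ranges. A minimal sketch, assuming IPv4 only and that `getent` is available in the container; `is_private_ipv4` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the argument is a dotted-quad IPv4 address
# in an RFC 1918 private range (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
is_private_ipv4() {
  local a b c d octet
  IFS=. read -r a b c d <<< "$1"
  # Require exactly four decimal octets in 0..255; extra dots end up in $d.
  for octet in "$a" "$b" "$c" "$d"; do
    [[ $octet =~ ^[0-9]{1,3}$ ]] || return 1
    (( 10#$octet <= 255 )) || return 1
  done
  # Force base 10 so octets with leading zeros are not read as octal.
  a=$((10#$a))
  b=$((10#$b))
  (( a == 10 )) && return 0
  (( a == 172 && b >= 16 && b <= 31 )) && return 0
  (( a == 192 && b == 168 )) && return 0
  return 1
}
```

For example, `addr="$(getent hosts scm1.org | awk '{print $1; exit}')"; is_private_ipv4 "$addr" || echo "scm1.org resolved to $addr"` would flag the public-DNS fallback, provided `getent` returns an IPv4 address for the name.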