-
Notifications
You must be signed in to change notification settings - Fork 34
Fix deploy script rolling deployment timeout issue #1927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -87,6 +87,50 @@ wait_for_deployment() { | |||||||||||||
| return 0 | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| # Wait for the current deployment to fully complete (all instances replaced) | ||||||||||||||
| wait_for_deployment_complete() { | ||||||||||||||
| local app_name="$1" | ||||||||||||||
| local max_wait=900 # 15 minutes max for full deployment | ||||||||||||||
| local wait_interval=15 | ||||||||||||||
| local waited=0 | ||||||||||||||
|
|
||||||||||||||
| echo "Waiting for deployment of $app_name to complete..." | ||||||||||||||
|
|
||||||||||||||
| local app_guid=$(cf app "$app_name" --guid) | ||||||||||||||
|
|
||||||||||||||
| while [ $waited -lt $max_wait ]; do | ||||||||||||||
| # Get the most recent deployment status | ||||||||||||||
| local deployment_info=$(cf curl "/v3/deployments?app_guids=${app_guid}&order_by=-created_at&per_page=1" 2>/dev/null) | ||||||||||||||
| local status=$(echo "$deployment_info" | grep -o '"value":"[^"]*"' | head -1 | cut -d'"' -f4 || echo "") | ||||||||||||||
| local reason=$(echo "$deployment_info" | grep -o '"reason":"[^"]*"' | head -1 | cut -d'"' -f4 || echo "") | ||||||||||||||
|
Comment on lines
+104
to
+105
|
||||||||||||||
| local status=$(echo "$deployment_info" | grep -o '"value":"[^"]*"' | head -1 | cut -d'"' -f4 || echo "") | |
| local reason=$(echo "$deployment_info" | grep -o '"reason":"[^"]*"' | head -1 | cut -d'"' -f4 || echo "") | |
| local status=$(echo "$deployment_info" | jq -r '.resources[0].status.value // ""' 2>/dev/null || echo "") | |
| local reason=$(echo "$deployment_info" | jq -r '.resources[0].status.reason // ""' 2>/dev/null || echo "") |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the deployment completes with an unexpected reason (line 115), the error message shows the reason but doesn't provide guidance on what to do next. Consider logging more diagnostic information such as the full deployment status or suggesting manual investigation steps, to help operators understand what went wrong.
| echo "✗ Deployment finalized with reason: $reason" | |
| echo "✗ Deployment finalized with unexpected reason: $reason" | |
| echo "Full deployment response from Cloud Foundry:" | |
| echo "$deployment_info" | |
| echo "Please investigate this deployment manually, for example by running:" | |
| echo " cf curl \"/v3/deployments?app_guids=${app_guid}&order_by=-created_at&per_page=1\"" |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timeout message indicates a warning, but the function returns 1 (failure), which will cause the deployment to be retried. However, a timeout after 15 minutes might indicate a stuck deployment rather than a transient failure. Consider whether the deployment should be explicitly canceled before returning failure, to prevent accumulating stuck deployments. Alternatively, document that stuck deployments will remain active and may need manual cleanup.
| echo "Warning: Timed out waiting for deployment to complete after ${max_wait}s" | |
| echo "Warning: Timed out waiting for deployment to complete after ${max_wait}s; deployment may still be active or stuck and may require manual cleanup." |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The else block at line 245 handles the case where wait_for_deployment_complete returns failure, but there's no similar else block for when exit_code is non-zero at line 238. This means if cf push fails (exit_code != 0), the code immediately jumps to the retry check at line 251 without any specific error handling or logging for the push failure itself. While the error message at line 252 mentions "Push failed", it might be helpful to add an explicit else block to log the specific push failure and exit code before entering retry logic.
| fi | |
| fi | |
| else | |
| echo "cf push command failed for $app_name with exit code: $exit_code" | |
| # Continue to retry logic below |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using grep -c '"ACTIVE"' counts occurrences of the literal string "ACTIVE" anywhere in the JSON response, which could match fields other than the deployment status (e.g., in comments, descriptions, or other unrelated fields). This could lead to false positives where the script thinks there's an active deployment when there isn't one. Consider using a more precise check, such as checking for the status field specifically with a JSON parser, or using a more specific grep pattern that includes the field context.
| local active_deployment=$(cf curl "/v3/deployments?app_guids=${app_guid}&status_values=ACTIVE" 2>/dev/null | grep -c '"ACTIVE"' || echo "0") | |
| local active_deployment=$(cf curl "/v3/deployments?app_guids=${app_guid}&status_values=ACTIVE" 2>/dev/null | jq -r '.pagination.total_results // 0' 2>/dev/null || echo "0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
cf app "$app_name" --guidfails, the function will continue with an emptyapp_guid. While the empty string check on line 256 handles this case for the retry logic, in thewait_for_deployment_completefunction (line 99), failure to get the app GUID will result in an API call with an empty GUID parameter, which may return unexpected results or errors. Consider adding error handling to fail fast if the app GUID cannot be retrieved.