add claude skills to create the maestro cluster #458
Walkthrough
Adds a new Bash deployment monitoring hook that polls long-running tasks, reports progress, handles timeouts, and sends notifications (Slack and native OS). Also adds skill documentation describing the hook and a Maestro cluster setup skill for Azure ARO‑HCP.
Sequence Diagram(s)
sequenceDiagram
actor User as User/CLI
participant Hook as deployment-monitor.sh
participant FS as File System
participant Slack as Slack API
participant OS as System Notifier
User->>Hook: monitor <task_id>
Hook->>FS: validate task_id, create sanitized paths
Hook->>Hook: start polling loop
loop Polling
Hook->>FS: check for exit code file
alt not found
FS-->>Hook: still running
Hook->>FS: read output file, compute progress
Hook->>User: emit progress update
end
end
FS-->>Hook: exit code file present
Hook->>Hook: determine status (COMPLETE/FAILED)
Hook->>Hook: notify_completion(status,message)
rect rgb(230,245,255)
opt SLACK_WEBHOOK_URL set
Hook->>Slack: send_slack_notification(status,message)
Slack-->>Hook: response
end
alt macOS
Hook->>OS: osascript display notification
else Linux
Hook->>OS: notify-send notification
end
end
Hook->>User: return summary & exit
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
71b2ed6 to
daca179
Compare
Actionable comments posted: 5
Fix all issues with AI Agents 🤖
In @.claude/hooks/deployment-monitor.sh:
- Around line 31-49: monitor_deployment() is a placeholder that only sleeps;
replace it with a real polling loop that invokes a concrete deployment-status
check (e.g., call to kubectl rollout status, systemctl status, an HTTP health
endpoint, or grep for success/failure markers in a deployment log) and returns
appropriate exit codes on success or failure. Implement parameters or
environment-driven variables for the chosen check command/name and timeout, run
the check at each interval inside the existing loop, log both progress and
detected terminal states, break and return 0 on success, return non-zero on
explicit failure (or timeout), and ensure errors from the status command are
captured and logged using the existing HOOK_NAME context.
- Around line 15-19: The script currently sources CONFIG_FILE (config.sh)
directly which can execute arbitrary code; update the block that sets
CONFIG_FILE and uses source to first validate the file (e.g., check ownership
and permissions, ensure it's not writable by others, and optionally verify
content only contains allowed KEY=VALUE lines or matches an expected pattern)
and only source it if those checks pass; also add a comment above the logic
documenting the security expectation that config.sh must be owned by a trusted
user and not world-writable.
- Around line 21-29: Remove the unused dead function check_process() from the
script: locate the function definition named check_process() (the block that
starts with "check_process() {" and ends with its closing brace) and delete it
entirely since it is never referenced; run a quick grep for "check_process" to
confirm there are no callers before committing.
- Around line 72-92: Build the JSON payload using a safe JSON encoder (e.g., jq)
instead of unescaped heredoc to avoid injection via $message or $status:
construct the object with jq or equivalent (referencing the payload variable
construction) so strings are properly quoted/escaped, and pass the resulting
JSON to curl; after the curl call that posts to "$webhook_url" capture its exit
status ($?), log or print an error on non-zero and return that status so
failures bubble up. Ensure you still set "color", "title", "text", "footer", and
"ts" fields when building the JSON.
In @.claude/skills/README.md:
- Line 51: The README contains a bare URL in the line "Go to
https://api.slack.com/messaging/webhooks"; update that line to wrap the URL in
angle brackets or convert it to a markdown link (e.g., "Go to
<https://api.slack.com/messaging/webhooks>" or "Go to [Slack Incoming
Webhooks](https://api.slack.com/messaging/webhooks)") to satisfy MD034 and avoid
a bare URL.
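The jq-based payload construction requested above could look roughly like the following. This is a minimal sketch, not the hook's actual code: the helper name `build_slack_payload`, the `attachments` layout, the color values, and the footer text are all assumptions made for illustration.

```shell
# Sketch only: build_slack_payload is a hypothetical helper, and the
# attachment layout, colors, and footer text are assumed, not from the PR.
build_slack_payload() {
    local status="$1"
    local message="$2"
    local color="good"
    [ "$status" = "COMPLETE" ] || color="danger"

    # --arg passes each value as a JSON string, so quotes and backslashes
    # in $status or $message are escaped by jq rather than by hand.
    jq -n \
        --arg color "$color" \
        --arg title "Maestro Deployment $status" \
        --arg text "$message" \
        --arg footer "deployment-monitor" \
        --argjson ts "$(date +%s)" \
        '{attachments: [{color: $color, title: $title, text: $text, footer: $footer, ts: $ts}]}'
}
```

Because every string goes through `--arg`, a message containing `"` or `\` cannot terminate the JSON string early, which is the injection path the comment describes.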
🧹 Nitpick comments (2)
.claude/skills/setup-maestro-cluster.md (2)
95-104: Consider adding timeout and error cleanup for git clone.
The git clone operation could hang indefinitely without a timeout. Additionally, if cloning fails, the empty temporary directory will remain on the system.
🔎 Suggested improvements
 TEMP_DIR=$(mktemp -d)
 echo "Cloning ARO-HCP repository to: $TEMP_DIR"
-if ! git clone https://github.com/Azure/ARO-HCP "$TEMP_DIR/ARO-HCP"; then
+if ! timeout 300 git clone https://github.com/Azure/ARO-HCP "$TEMP_DIR/ARO-HCP"; then
     echo "ERROR: Failed to clone ARO-HCP repository"
+    rm -rf "$TEMP_DIR"
     exit 1
 fi
125-134: Consider adding a timeout for the make command.
The `make personal-dev-env` command could potentially run for an extended period without bounds. Consider adding a timeout to prevent indefinite hangs.
🔎 Suggested improvement
-if make personal-dev-env; then
+if timeout 3600 make personal-dev-env; then
     echo ""
     echo "✓ Deployment completed successfully!"
     echo "ARO-HCP repository location: $TEMP_DIR/ARO-HCP"
 else
     echo ""
-    echo "ERROR: Deployment failed!"
+    echo "ERROR: Deployment failed or timed out!"
     popd > /dev/null
     exit 1
 fi
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (5)
- .claude/hooks/.gitignore
- .claude/hooks/config.sh.example
- .claude/hooks/deployment-monitor.sh
- .claude/skills/README.md
- .claude/skills/setup-maestro-cluster.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
.claude/skills/README.md
51-51: Bare URL used
(MD034, no-bare-urls)
🪛 Shellcheck (0.11.0)
.claude/hooks/deployment-monitor.sh
[warning] 18-18: ShellCheck can't follow non-constant source. Use a directive to specify location.
(SC1090)
[warning] 33-33: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 73-73: Declare and assign separately to avoid masking return values.
(SC2155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Red Hat Konflux / maestro-e2e-on-pull-request
- GitHub Check: Red Hat Konflux / maestro-on-pull-request
- GitHub Check: e2e-grpc-broker
- GitHub Check: upgrade
- GitHub Check: e2e-broadcast-subscription
- GitHub Check: e2e
- GitHub Check: e2e-with-istio
🔇 Additional comments (11)
.claude/hooks/.gitignore (1)
1-1: Correctly ignores the configuration file containing secrets.
The pattern properly prevents config.sh (with sensitive webhook URLs) from being committed while allowing config.sh.example to serve as a safe, shareable template.
.claude/hooks/config.sh.example (1)
1-19: Well-structured configuration template with clear instructions.
The template provides helpful documentation, a clear copy-to-config pattern, and well-commented optional settings. The reference to Slack's webhook documentation makes setup straightforward for users.
.claude/skills/README.md (4)
1-33: Clear, comprehensive skill documentation with practical prerequisites.
The setup-maestro-cluster skill documentation is well-organized with explicit environment variables, clear prerequisites, and helpful Azure CLI guidance. The step-by-step explanation of what the skill does aids discoverability.
46-77: Comprehensive hook documentation with clear configuration and usage patterns.
The deployment-monitor.sh documentation effectively covers both file-based and environment variable configuration, with practical usage examples that cover monitoring, completion, and failure scenarios.
83-91: Clear explanation of skill mechanics helps users understand skill invocation.
The section effectively explains the skill execution flow and value proposition for automation.
93-105: Helpful guidance for extending the skills framework with new skills.
The section provides clear steps for skill creation and appropriately references the setup-maestro-cluster.md example. The emphasis on error handling and comments supports maintainability.
.claude/skills/setup-maestro-cluster.md (3)
62-72: LGTM!
The Azure CLI prerequisite check is well-implemented with clear error messaging and helpful installation instructions.
76-93: LGTM!
The Azure account verification logic correctly validates both login status and account name, with appropriate error messages to guide users.
48-48: No action needed.
Go 1.24.4 is a valid, released version (June 2025) and the documentation is correct.
.claude/hooks/deployment-monitor.sh (2)
95-126: LGTM with dependency on fixing send_slack_notification.
The notification function properly handles multiple notification channels (console, Slack, system notifications) with appropriate fallbacks. However, ensure the `send_slack_notification()` function properly returns error status as noted in the previous comment.
128-140: CLI structure looks good, but depends on fixing monitor_deployment.
The CLI interface is well-designed with clear command separation. However, the `monitor` command should not be used until `monitor_deployment()` is properly implemented (as flagged in an earlier comment).
Actionable comments posted: 3
Fix all issues with AI Agents 🤖
In @.claude/hooks/deployment-monitor.sh:
- Around line 128-132: The monitor_deployment function currently never returns
non-zero on timeout; update monitor_deployment to return a non-zero status
(e.g., return 2) when the maximum wait time is reached and deployment wasn’t
confirmed, and ensure the top-level case invocation propagates that exit code
(replace the plain call to monitor_deployment with a call that exits on failure,
e.g., call monitor_deployment and then exit with its status) so the script exits
non‑zero when monitoring times out.
- Around line 118-125: The system notification calls embed $message directly,
allowing quotes/backslashes to break the osascript command; escape message
content before passing it to AppleScript and use the safe argument form for
notify-send: create a sanitized variable (e.g. safe_message) by first escaping
backslashes then escaping double quotes (safe_message="${message//\\/\\\\}";
safe_message="${safe_message//\"/\\\"}"), then call osascript -e "display
notification \"$safe_message\" with title \"Maestro Deployment $status\"" to
avoid injection, and for Linux keep notify-send as notify-send -- "Maestro
Deployment $status" "$message" so the message is passed as a single safe
argument (reference symbols: $message, safe_message variable, osascript call,
notify-send call).
In @.claude/skills/SKILL.md:
- Around line 96-97: The script sets TEMP_DIR with TEMP_DIR=$(mktemp -d) and
immediately uses it for cloning; add error handling after the mktemp call to
verify it succeeded (check exit status and that TEMP_DIR is non-empty and a
directory) and if not, print a clear error message and exit non-zero; update the
cloning code that references TEMP_DIR to only run after this validation so you
don't attempt git clone into an invalid path.
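The mktemp validation described in the last item can be sketched as a small helper. The function name `make_temp_dir` is hypothetical and is not part of SKILL.md; it only illustrates checking the exit status, the non-empty value, and the directory test together.

```shell
# Sketch only: make_temp_dir is a hypothetical helper name.
make_temp_dir() {
    local dir
    # Fail if mktemp itself fails, prints nothing, or the path is not a directory.
    if ! dir=$(mktemp -d) || [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "ERROR: Failed to create temporary directory" >&2
        return 1
    fi
    printf '%s\n' "$dir"
}

# Intended use in the skill script (shown as a comment so the sketch has no side effects):
#   TEMP_DIR=$(make_temp_dir) || exit 1
#   git clone https://github.com/Azure/ARO-HCP "$TEMP_DIR/ARO-HCP"
```

Declaring `dir` separately from the assignment keeps the exit status of `mktemp` visible to the `if`, which would be masked if both were written on one `local` line.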
🧹 Nitpick comments (2)
.claude/skills/SKILL.md (2)
125-134: Consider adding timeout protection for the make command.
The `make personal-dev-env` command could potentially hang indefinitely if it encounters issues. For a long-running deployment process, consider adding a timeout mechanism or documenting expected runtime.
💡 Suggestion
You could wrap the make command with a timeout:
-if make personal-dev-env; then
+if timeout 30m make personal-dev-env; then
     echo ""
     echo "✓ Deployment completed successfully!"
     echo "ARO-HCP repository location: $TEMP_DIR/ARO-HCP"
 else
     echo ""
-    echo "ERROR: Deployment failed!"
+    echo "ERROR: Deployment failed or timed out!"
     popd > /dev/null
     exit 1
 fi
Note: Adjust the timeout value (30m) based on expected deployment duration.
96-141: The temporary directory is never cleaned up.
The documentation mentions "Optionally clean up temporary clone (or keep for debugging)" in line 57, but the script never performs cleanup. The temporary directory will accumulate over multiple runs, potentially filling up disk space.
Consider either:
- Adding explicit cleanup with a trap handler for production use
- Documenting that users should manually clean up the printed temp directory path
- Using a predictable location instead of mktemp for easier manual cleanup
💡 Option 1: Add trap-based cleanup
+# Cleanup trap (optional - uncomment to enable automatic cleanup)
+# cleanup() {
+#     if [ -n "$TEMP_DIR" ] && [ -d "$TEMP_DIR" ]; then
+#         echo "Cleaning up temporary directory: $TEMP_DIR"
+#         rm -rf "$TEMP_DIR"
+#     fi
+# }
+# trap cleanup EXIT
+
 # Step 3: Clone ARO-HCP repository
 TEMP_DIR=$(mktemp -d)
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (5)
- .claude/hooks/.gitignore
- .claude/hooks/config.sh.example
- .claude/hooks/deployment-monitor.sh
- .claude/skills/README.md
- .claude/skills/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (3)
- .claude/hooks/.gitignore
- .claude/hooks/config.sh.example
- .claude/skills/README.md
🧰 Additional context used
🪛 Shellcheck (0.11.0)
.claude/hooks/deployment-monitor.sh
[warning] 18-18: ShellCheck can't follow non-constant source. Use a directive to specify location.
(SC1090)
[warning] 33-33: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 73-73: Declare and assign separately to avoid masking return values.
(SC2155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Red Hat Konflux / maestro-e2e-on-pull-request
- GitHub Check: Red Hat Konflux / maestro-on-pull-request
- GitHub Check: e2e
- GitHub Check: e2e-grpc-broker
- GitHub Check: e2e-broadcast-subscription
- GitHub Check: e2e-with-istio
- GitHub Check: upgrade
🔇 Additional comments (2)
.claude/skills/SKILL.md (2)
143-149: Documentation looks good.
The notes section accurately summarizes the script behavior and provides helpful context for users.
113-113: No action needed: Go version is correct.
The `GOTOOLCHAIN=go1.24.4` setting matches the ARO-HCP repository's requirement. The repository's `go.work` file explicitly specifies `go 1.24.4`.
Actionable comments posted: 4
Fix all issues with AI Agents 🤖
In @.claude/hooks/deployment-monitor.sh:
- Around line 163-180: The JSON payload construction escapes message but not
status; create an escaped_status using the same fallback escaping logic used for
escaped_message (replace backslashes then double-quotes) and use escaped_status
in the payload title (instead of raw $status) so the title string is safe when
interpolated into payload; update the block that defines escaped_message and
payload to also define escaped_status and reference it in the "title" field.
- Around line 65-66: The hardcoded developer-specific path in the declarations
of output_file and exit_code_file will break other users; change them to build
paths dynamically (e.g., use TMPDIR or /tmp, or create a temp dir via mktemp -d)
and reference task_id when composing the filenames. Ensure the script creates
the target directory if missing and update any usages of output_file and
exit_code_file to rely on the new dynamic base (keep the variables output_file,
exit_code_file and task_id names so callers remain unchanged).
In @.claude/skills/SKILL.md:
- Around line 118-121: The env block currently overwrites USER (export
USER=oasis) which is risky; change to a different variable name (e.g.,
OASIS_USER or SKILL_USER) and update any references that consume USER to use
that new name instead (check where PERSIST, GITHUB_ACTIONS, and GOTOOLCHAIN are
used); ensure builds or the ARO-HCP Makefile do not require the literal USER—if
they do, conditionally set USER only when unset or map the new variable into the
Makefile invocation instead.
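For the hardcoded-path item above, one possible shape is to derive the base directory from `TMPDIR` with a `/tmp` fallback. This is a sketch under assumptions: the `deployment-monitor` subdirectory, the `.output` and `.exit_code` suffixes, and the helper name `task_base_dir` are illustrative and do not come from the PR; only the variable names `output_file`, `exit_code_file`, and `task_id` are taken from the comment.

```shell
# Sketch only: directory and file naming here are assumptions.
task_base_dir() {
    local base="${TMPDIR:-/tmp}"
    base="${base%/}/deployment-monitor"   # strip a trailing slash before appending
    mkdir -p "$base" || return 1          # create the directory if it is missing
    printf '%s\n' "$base"
}

task_id="example-task"                    # placeholder value for illustration
base_dir=$(task_base_dir)
output_file="$base_dir/${task_id}.output"
exit_code_file="$base_dir/${task_id}.exit_code"
```

Callers keep using `output_file` and `exit_code_file` unchanged; only the base path stops depending on one developer's home directory.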
🧹 Nitpick comments (1)
.claude/hooks/deployment-monitor.sh (1)
67-67: Address ShellCheck SC2155: Separate declaration and assignment.
Multiple lines combine `local` declaration with command substitution, which can mask return values. This is a minor robustness concern.
🔎 Proposed fix for representative lines
-    local start_time=$(date +%s)
+    local start_time
+    start_time=$(date +%s)

-    local exit_code=$(cat "$exit_code_file")
+    local exit_code
+    exit_code=$(cat "$exit_code_file")

-    local current_lines=$(wc -l < "$output_file" | tr -d ' ')
+    local current_lines
+    current_lines=$(wc -l < "$output_file" | tr -d ' ')

-    local end_time=$(date +%s)
+    local end_time
+    end_time=$(date +%s)
Also applies to: 78-78, 85-85, 104-104
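To make the SC2155 concern concrete, the two hypothetical functions below (names invented for this illustration) run the same failing command substitution. In the combined form the status seen afterwards is that of `local`, which succeeds; in the separated form the failure of the substitution is preserved.

```shell
# Sketch only: demo_combined and demo_separate are illustrative names.
demo_combined() {
    local status=0
    # 'local' itself returns 0, hiding the failure of $(false).
    local value=$(false) || status=$?
    echo "combined: $status"
}

demo_separate() {
    local value status=0
    # A plain assignment returns the status of the command substitution.
    value=$(false) || status=$?
    echo "separate: $status"
}

demo_combined   # prints "combined: 0"
demo_separate   # prints "separate: 1"
```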
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (5)
- .claude/hooks/.gitignore
- .claude/hooks/config.sh.example
- .claude/hooks/deployment-monitor.sh
- .claude/skills/README.md
- .claude/skills/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (3)
- .claude/skills/README.md
- .claude/hooks/.gitignore
- .claude/hooks/config.sh.example
🧰 Additional context used
🪛 Shellcheck (0.11.0)
.claude/hooks/deployment-monitor.sh
[warning] 67-67: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 78-78: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 85-85: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 104-104: Declare and assign separately to avoid masking return values.
(SC2155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Red Hat Konflux / maestro-e2e-on-pull-request
- GitHub Check: Red Hat Konflux / maestro-on-pull-request
- GitHub Check: upgrade
- GitHub Check: e2e-grpc-broker
- GitHub Check: e2e-broadcast-subscription
- GitHub Check: e2e-with-istio
- GitHub Check: e2e
🔇 Additional comments (5)
.claude/skills/SKILL.md (2)
96-102: LGTM! Error handling for mktemp is properly implemented.
The validation now correctly checks that `TEMP_DIR` is non-empty and is a valid directory before proceeding with the clone operation. This addresses the previous review concern.
48-48: No action needed: `go1.24.4` is a valid, released Go version.
`go1.24.4` was released on June 5, 2025, and is a legitimate patch version of Go 1.24. There is no issue with this toolchain specification.
Likely an incorrect or invalid review comment.
.claude/hooks/deployment-monitor.sh (3)
221-230: LGTM! System notification escaping is now properly implemented.
The AppleScript message is now correctly escaped for quotes and backslashes, and `notify-send` uses `--` to safely handle the message argument. This addresses the previous security concern.
152-162: LGTM! JSON construction with `jq` is properly implemented.
Using `jq -n` with `--arg` for string parameters ensures proper JSON escaping, preventing injection vulnerabilities. The fallback for systems without `jq` is also a good approach.
233-246: LGTM! Exit code propagation is now correct.
The `monitor` case properly propagates the exit code via `exit $?`, ensuring the script reflects the actual deployment outcome. This addresses the previous concern about masking timeout failures.
.claude/hooks/deployment-monitor.sh
Outdated
if [ -f "$CONFIG_FILE" ]; then
    # Validate config file security before sourcing
    if [ -O "$CONFIG_FILE" ] || [ "$(stat -f %Su "$CONFIG_FILE" 2>/dev/null || stat -c %U "$CONFIG_FILE" 2>/dev/null)" = "$USER" ]; then
        # Check if file is not world-writable
        if [ "$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | cut -c3)" != "2" ] && \
           [ "$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | cut -c3)" != "6" ]; then
            # shellcheck disable=SC1090
            source "$CONFIG_FILE"
        else
            echo "[$HOOK_NAME] WARNING: Skipping config.sh - file is world-writable (security risk)"
        fi
    else
        echo "[$HOOK_NAME] WARNING: Skipping config.sh - file not owned by current user (security risk)"
    fi
fi
Security checks are good, but permission validation is incomplete.
The config file security checks address the earlier concerns. However, the world-writable check on lines 26-27 only tests for permission bits 2 and 6, missing 3 and 7 which also have write permission (e.g., rwx = 7, wx = 3).
🔎 Proposed fix
if [ -O "$CONFIG_FILE" ] || [ "$(stat -f %Su "$CONFIG_FILE" 2>/dev/null || stat -c %U "$CONFIG_FILE" 2>/dev/null)" = "$USER" ]; then
# Check if file is not world-writable
- if [ "$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | cut -c3)" != "2" ] && \
- [ "$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | cut -c3)" != "6" ]; then
+ local other_perm
+ other_perm=$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | cut -c3)
+ # Bits 2, 3, 6, 7 all include write permission
+ if [[ ! "$other_perm" =~ ^[2367]$ ]]; then
# shellcheck disable=SC1090
source "$CONFIG_FILE"
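An alternative to enumerating digits is to test the other-write bit directly. The sketch below is an assumption-laden illustration, not the hook's code: `is_world_writable` is a hypothetical helper, and it tries GNU `stat -c %a` before BSD/macOS `stat -f %Lp`. It also sidesteps a shell-grammar detail in the quoted command: because `|` binds tighter than `||`, the `cut -c3` there applies only to the second `stat`.

```shell
# Sketch only: is_world_writable is a hypothetical helper.
# Returns 0 if "other" has write permission, 1 if not, 2 if the mode is unreadable.
is_world_writable() {
    local file="$1"
    local mode
    # GNU coreutils form first, then the BSD/macOS form.
    mode=$(stat -c %a "$file" 2>/dev/null) \
        || mode=$(stat -f %Lp "$file" 2>/dev/null) \
        || return 2
    # Prefix 0 so the shell reads the mode as octal; bit value 2 is other-write.
    [ "$(( 0$mode & 2 ))" -ne 0 ]
}
```

With this shape the script would source the config only when `is_world_writable "$CONFIG_FILE"` returns 1, and modes such as 643 or 647 are rejected without listing each digit.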
.claude/skills/SKILL.md
Outdated
export USER=oasis
export PERSIST=true
export GITHUB_ACTIONS=true
export GOTOOLCHAIN=go1.24.4
🛠️ Refactor suggestion | 🟠 Major
Overwriting USER environment variable is risky.
Setting export USER=oasis overwrites the system's USER variable, which many tools rely on for user identification, home directory resolution, and permission checks. This could cause unexpected behavior in subsequent commands or scripts.
Consider using a different variable name:
🔎 Proposed fix
echo "Setting environment variables..."
-export USER=oasis
+export DEPLOY_USER=oasis
export PERSIST=true
export GITHUB_ACTIONS=true
export GOTOOLCHAIN=go1.24.4
-echo "USER=$USER"
+echo "DEPLOY_USER=$DEPLOY_USER"
 echo "PERSIST=$PERSIST"
Note: Verify with the ARO-HCP Makefile whether it specifically expects USER or if it can accept an alternative variable name.
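If the Makefile does turn out to need the literal `USER` name (this is not confirmed by the review), one way to limit the blast radius is to set it only for that single command instead of exporting it into the whole script. A minimal sketch, with `run_with_deploy_user` as a hypothetical wrapper name:

```shell
# Sketch only: run_with_deploy_user is a hypothetical wrapper.
# It sets USER for one child process and leaves the calling shell untouched.
run_with_deploy_user() {
    local deploy_user="$1"
    shift
    env USER="$deploy_user" "$@"
}

# Intended use (shown as a comment; requires the ARO-HCP checkout):
#   run_with_deploy_user oasis make personal-dev-env
```

Later commands in the script then still see the real `USER`, which avoids the side effects the comment warns about.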
Actionable comments posted: 2
Fix all issues with AI Agents 🤖
In @.claude/hooks/deployment-monitor.sh:
- Around line 201-214: The curl call in the Slack send function can return exit
code 0 on HTTP 4xx/5xx, so update the command that posts "$payload" to
"$webhook_url" (the block that sets curl_exit_code) to use curl --fail (or
explicitly check the HTTP status via --write-out '%{http_code}') and treat
non-2xx responses as failures; ensure curl_exit_code is set non-zero when the
HTTP response is not 2xx and include HOOK_NAME and the HTTP status or error text
in the error echo so callers relying on curl_exit_code (and variables like
payload, webhook_url) know the notification was rejected.
- Around line 86-116: The loop watching for "$exit_code_file" can hang forever
because while true only breaks on that file; add a max wait/timeout using
start_time to compute elapsed and break with a non-zero exit (or log and exit)
when elapsed exceeds a configurable MAX_WAIT_SECONDS; update the loop logic in
deployment-monitor.sh around the while true block to (1) define and use
MAX_WAIT_SECONDS (or MAX_WAIT_MINUTES) near start_time, (2) check elapsed each
iteration and when exceeded echo a timed-out message including $HOOK_NAME and
exit with a distinct code, and (3) ensure any tail/grep progress still runs
until timeout so last_line_count, $output_file and progress prints remain
unchanged but the loop will not run indefinitely.
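The first item mentions checking the HTTP status via `--write-out '%{http_code}'` as an alternative to `--fail`. A minimal sketch of that variant follows; `post_json` and `is_http_success` are hypothetical names, and returning 22 for a rejected request is a choice made here to mirror the exit code curl uses with `--fail`, not something the PR specifies.

```shell
# Sketch only: helper names and the return value 22 are assumptions.
is_http_success() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

post_json() {
    local url="$1"
    local payload="$2"
    local http_code
    # Discard the response body and capture only the numeric status code.
    http_code=$(curl --silent --show-error --output /dev/null \
        --write-out '%{http_code}' \
        -X POST -H 'Content-type: application/json' \
        --data "$payload" "$url") || return $?
    if ! is_http_success "$http_code"; then
        echo "ERROR: webhook rejected the request (HTTP $http_code)" >&2
        return 22
    fi
}
```

Compared with `--fail`, this form lets the error message include the status code, at the cost of a few more lines.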
♻️ Duplicate comments (1)
.claude/hooks/deployment-monitor.sh (1)
22-36: Permission validation still incomplete for world-writable check.
The check rejects permission bits `2` and `6` but misses `3` (wx) and `7` (rwx), which also include write permission. A file with permissions `xx3` or `xx7` will bypass the security check and be sourced.
🔎 Proposed fix using regex match
 if [ -O "$CONFIG_FILE" ] || [ "$(stat -f %Su "$CONFIG_FILE" 2>/dev/null || stat -c %U "$CONFIG_FILE" 2>/dev/null)" = "$USER" ]; then
     # Check if file is not world-writable
-    if [ "$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | cut -c3)" != "2" ] && \
-       [ "$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | cut -c3)" != "6" ]; then
+    local other_perm
+    other_perm=$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | tail -c 2 | head -c 1)
+    # Bits 2, 3, 6, 7 all include write permission
+    if [[ ! "$other_perm" =~ ^[2367]$ ]]; then
         # shellcheck disable=SC1090
         source "$CONFIG_FILE"
🧹 Nitpick comments (1)
.claude/hooks/deployment-monitor.sh (1)
240-248: Consider escaping `$status` in osascript command for defense in depth.
While `$status` is typically controlled ("COMPLETE"/"FAILED"), it could come from external input via the CLI (`$0 notify <status> <message>`). For consistency with the message escaping, consider escaping status as well.
🔎 Proposed fix
 # Also send system notification if available
 if command -v osascript &> /dev/null; then
     # macOS notification - escape message for AppleScript
     local safe_message="${message//\\/\\\\}"
     safe_message="${safe_message//\"/\\\"}"
-    osascript -e "display notification \"$safe_message\" with title \"Maestro Deployment $status\""
+    local safe_status="${status//\\/\\\\}"
+    safe_status="${safe_status//\"/\\\"}"
+    osascript -e "display notification \"$safe_message\" with title \"Maestro Deployment $safe_status\""
 elif command -v notify-send &> /dev/null; then
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (5)
- .claude/hooks/.gitignore
- .claude/hooks/config.sh.example
- .claude/hooks/deployment-monitor.sh
- .claude/skills/README.md
- .claude/skills/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (4)
- .claude/hooks/.gitignore
- .claude/skills/SKILL.md
- .claude/skills/README.md
- .claude/hooks/config.sh.example
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Red Hat Konflux / maestro-e2e-on-pull-request
- GitHub Check: Red Hat Konflux / maestro-on-pull-request
- GitHub Check: e2e-broadcast-subscription
- GitHub Check: e2e-with-istio
- GitHub Check: upgrade
- GitHub Check: e2e
- GitHub Check: e2e-grpc-broker
🔇 Additional comments (3)
.claude/hooks/deployment-monitor.sh (3)
1-17: LGTM!
Clear header documentation with appropriate dependency listing. Using `set -e` for fail-fast behavior is appropriate for a monitoring script.
38-53: LGTM!
The `check_command` function correctly differentiates between required and optional dependencies with appropriate return codes.
251-264: LGTM!
The main execution block correctly handles both commands, propagates exit codes from `monitor_deployment`, and provides sensible defaults for the `notify` command.
    # Send to Slack and capture exit status
    local curl_exit_code
    if curl -X POST -H 'Content-type: application/json' \
        --data "$payload" \
        "$webhook_url" \
        --silent --show-error; then
        curl_exit_code=0
    else
        curl_exit_code=$?
        echo "[$HOOK_NAME] ERROR: Failed to send Slack notification (curl exit code: $curl_exit_code)"
    fi

    return $curl_exit_code
}
curl success doesn't guarantee notification was accepted.
Without --fail, curl returns exit code 0 even when the server responds with HTTP 4xx/5xx errors. A rejected webhook (bad URL, rate limited, etc.) would log "sent successfully" in the caller.
🔎 Proposed fix
# Send to Slack and capture exit status
local curl_exit_code
if curl -X POST -H 'Content-type: application/json' \
--data "$payload" \
"$webhook_url" \
- --silent --show-error; then
+ --silent --show-error --fail; then
curl_exit_code=0
     else
Actionable comments posted: 0
♻️ Duplicate comments (2)
.claude/hooks/deployment-monitor.sh (2)
70-100: Add timeout protection to prevent infinite hang.
The `while true` loop only exits when `exit_code_file` appears. If the task crashes without creating this file, uses an incorrect task ID, or the process dies unexpectedly, the monitor will hang forever. This was flagged in previous reviews but remains unaddressed.
🔎 Proposed fix with timeout
   # Wait for the deployment to complete
   local last_line_count=0
+  local max_wait_seconds=${MONITOR_TIMEOUT:-7200}  # Default 2 hours
   while true; do
+    # Check for timeout
+    local elapsed=$(($(date +%s) - start_time))
+    if [ "$elapsed" -ge "$max_wait_seconds" ]; then
+      echo "[$HOOK_NAME] ERROR: Maximum wait time (${max_wait_seconds}s) reached"
+      notify_completion "FAILED" "Deployment monitoring timed out after ${max_wait_seconds} seconds"
+      return 2
+    fi
+
     # Check if exit code file exists (task completed)
     if [ -f "$exit_code_file" ]; then
185-198: Add `--fail` flag to curl for proper error detection.
Without the `--fail` flag, curl returns exit code 0 even when the server responds with HTTP 4xx/5xx errors. A rejected webhook (bad URL, rate limited, etc.) would incorrectly log "sent successfully" in the caller. This was flagged in a previous review but remains unaddressed.
🔎 Proposed fix
   # Send to Slack and capture exit status
   local curl_exit_code
   if curl -X POST -H 'Content-type: application/json' \
       --data "$payload" \
       "$webhook_url" \
-      --silent --show-error; then
+      --silent --show-error --fail; then
     curl_exit_code=0
   else
     curl_exit_code=$?
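The timeout concern above can be sketched in isolation as a bounded polling loop. This is an illustrative sketch only: `wait_for_file` and its `poll_interval` parameter are hypothetical names, and the 7200-second default simply mirrors the value suggested in the proposed fix.

```shell
# Hypothetical helper: wait until a file exists or a deadline passes.
# Returns 0 if the file appeared, 2 on timeout.
wait_for_file() {
  local target="$1" max_wait_seconds="${2:-7200}" poll_interval="${3:-5}"
  local start_time elapsed
  start_time=$(date +%s)
  while [ ! -f "$target" ]; do
    # Compare elapsed wall-clock time against the allowed maximum.
    elapsed=$(( $(date +%s) - start_time ))
    if [ "$elapsed" -ge "$max_wait_seconds" ]; then
      echo "ERROR: timed out after ${max_wait_seconds}s waiting for $target" >&2
      return 2
    fi
    sleep "$poll_interval"
  done
  return 0
}
```

A caller could then treat return code 2 as a distinct "timed out" outcome, separate from a task that finished with a failing exit code.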
🧹 Nitpick comments (2)
.claude/hooks/deployment-monitor.sh (2)
15-15: Consider removing or relaxing `set -e` for graceful error handling.
The `set -e` flag causes the script to exit immediately on any command failure. For a monitoring script that should handle errors gracefully (e.g., failed notifications shouldn't abort monitoring), this may be too aggressive. Consider either removing it or handling specific commands with explicit error checks instead.
236-243: Consider showing usage when no arguments provided.
The script defaults to the `notify` command when no arguments are provided. Users might expect to see usage information instead. Consider changing the default case to show help rather than executing a notification.
🔎 Proposed change
 # Main execution
-case "${1:-notify}" in
+if [ $# -eq 0 ]; then
+  echo "Usage: $0 {monitor <task_id>|notify <status> <message>}"
+  exit 1
+fi
+
+case "$1" in
   monitor)
     monitor_deployment "$2"
     exit $?
     ;;
   notify)
-    notify_completion "${2:-COMPLETE}" "${3:-Deployment finished}"
+    notify_completion "$2" "$3"
     ;;
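For the `set -e` nitpick, one possible shape for "explicit error checks" is a small wrapper that logs a failure and carries on. This is a sketch under assumptions: `run_optional` is a hypothetical name, not something present in the script.

```shell
# Hypothetical wrapper: run a non-critical command, log a failure, keep going.
run_optional() {
  local rc=0
  # The "|| rc=$?" form keeps a failure from aborting the script under set -e.
  "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "WARNING: '$*' failed with exit code $rc; continuing" >&2
  fi
  return 0
}
```

With a wrapper like this, a notification step could be invoked as `run_optional send_slack_notification "$status" "$message"`, so a rejected webhook is reported without ending the monitoring run.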
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
- .claude/hooks/deployment-monitor.sh
- .claude/skills/README.md
- .claude/skills/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (2)
- .claude/skills/README.md
- .claude/skills/SKILL.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Red Hat Konflux / maestro-on-pull-request
- GitHub Check: Red Hat Konflux / maestro-e2e-on-pull-request
- GitHub Check: e2e-broadcast-subscription
- GitHub Check: e2e-grpc-broker
- GitHub Check: upgrade
- GitHub Check: e2e
- GitHub Check: e2e-with-istio
🔇 Additional comments (1)
.claude/hooks/deployment-monitor.sh (1)
224-232: Well done on the notification escaping!
The system notification handling properly escapes special characters for `osascript` and uses safe argument passing (`--`) for `notify-send`. This effectively prevents command injection vulnerabilities.
Signed-off-by: hchenxa <huichen@redhat.com>
Actionable comments posted: 0
🧹 Nitpick comments (1)
.claude/hooks/deployment-monitor.sh (1)
49-59: Consider hash-based path sanitization to prevent collisions.
The current path sanitization (`tr '/' '-'`) could theoretically cause collisions where different paths map to the same sanitized string (e.g., `/a/b/c` and `/a-b/c` both become `a-b-c`). While unlikely in practice, using a cryptographic hash would make such collisions practically impossible.
🔎 Optional improvement using hash
-  # Build task output paths dynamically based on current working directory
-  local cwd_sanitized
-  cwd_sanitized=$(pwd | tr '/' '-' | sed 's/^-//')
-  local task_dir="/tmp/claude/-${cwd_sanitized}/tasks"
+  # Build task output paths dynamically based on current working directory
+  local cwd_hash
+  cwd_hash=$(pwd | shasum -a 256 | cut -d' ' -f1)
+  local task_dir="/tmp/claude/${cwd_hash}/tasks"
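To make the collision concrete, the following sketch compares the two approaches on the example paths from the comment. The helper names `sanitize_path` and `hash_path` are hypothetical, and the `sha256sum` fallback is an assumption added for systems where `shasum` is not installed.

```shell
# Sanitize as in the current approach: slashes become dashes, leading dash dropped.
sanitize_path() {
  printf '%s' "$1" | tr '/' '-' | sed 's/^-//'
}

# Hash-based alternative; prefers sha256sum and falls back to shasum.
hash_path() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
  else
    printf '%s' "$1" | shasum -a 256 | cut -d' ' -f1
  fi
}

# Both inputs print "a-b-c" with the dash-based approach.
sanitize_path "/a/b/c"; echo
sanitize_path "/a-b/c"; echo
# The hashes of the same two inputs differ.
hash_path "/a/b/c"
hash_path "/a-b/c"
```

The trade-off is readability: a hashed directory name is no longer recognizable as a path, which may matter when inspecting `/tmp/claude` by hand.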
📜 Review details
📒 Files selected for processing (3)
- .claude/hooks/deployment-monitor.sh
- .claude/skills/README.md
- .claude/skills/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (1)
- .claude/skills/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Red Hat Konflux / maestro-e2e-on-pull-request
- GitHub Check: Red Hat Konflux / maestro-on-pull-request
- GitHub Check: e2e-with-istio
- GitHub Check: e2e-broadcast-subscription
- GitHub Check: e2e-grpc-broker
- GitHub Check: e2e
- GitHub Check: upgrade
🔇 Additional comments (2)
.claude/skills/SKILL.md (1)
62-151: LGTM! Previous concerns have been addressed.
The implementation script shows good practices:
- Proper error handling for `mktemp` with validation (lines 98-102)
- Safe `USER` variable handling using the `${USER:-oasis}` pattern (line 119), which applies the `oasis` default only when `USER` is unset or empty
- Timeout protection for `git clone` (line 106)
- Clear error messages and cleanup on failure
The bash script is well-structured and ready for use.
.claude/hooks/deployment-monitor.sh (1)
1-260: Excellent work addressing previous review feedback!
The script demonstrates significant improvements:
- Dynamic path construction replaces hardcoded paths (lines 49-59)
- Timeout protection prevents infinite loops (lines 72-82)
- JSON injection mitigated with jq and proper fallback escaping (lines 163-194)
- Command injection in notifications prevented with proper escaping (lines 238-243)
- HTTP error handling with `--fail` flag (line 202)
- Comprehensive error handling and progress reporting throughout
The monitoring hook is well-implemented and production-ready.
@@ -0,0 +1,158 @@
# Setup Maestro Long-Running Cluster
The current SKILL.md is missing the required YAML frontmatter. It needs to look like this:
---
name: setup-maestro-cluster
description: Sets up a long-running Maestro cluster using Azure ARO-HCP
---
# Setup Maestro Long-Running Cluster
[rest of your documentation]