add claude skills to create the maestro cluster #458

hchenxa · 2026-01-05T07:02:43Z

No description provided.

coderabbitai · 2026-01-05T07:03:02Z

Walkthrough

Adds a new Bash deployment monitoring hook that polls long-running tasks, reports progress, handles timeouts, and sends notifications (Slack and native OS). Also adds skill documentation describing the hook and a Maestro cluster setup skill for Azure ARO‑HCP.

Changes

Cohort / File(s)	Summary
Deployment Monitoring Hook `.claude/hooks/deployment-monitor.sh`	New executable Bash hook implementing: command checks, a `monitor_deployment(task_id)` polling loop with timeout and progress reporting, `send_slack_notification(status,message,webhook_url)` (curl + optional jq), `notify_completion(status,message)` (Slack + macOS/Linux notifications), and CLI entry points (`monitor`, `notify`).
Skill Documentation `.claude/skills/README.md`, `.claude/skills/SKILL.md`	New docs describing Maestro Claude Skills, usage and configuration for the deployment-monitor hook, Slack webhook setup, prerequisites, and a step‑by‑step SKILL.md for deploying a Maestro cluster on Azure ARO‑HCP (prereqs, cloning, env vars, make personal-dev-env, cleanup notes).

Sequence Diagram(s)

sequenceDiagram
    actor User as User/CLI
    participant Hook as deployment-monitor.sh
    participant FS as File System
    participant Slack as Slack API
    participant OS as System Notifier

    User->>Hook: monitor <task_id>
    Hook->>FS: validate task_id, create sanitized paths
    Hook->>Hook: start polling loop

    loop Polling
        Hook->>FS: check for exit code file
        alt not found
            FS-->>Hook: still running
            Hook->>FS: read output file, compute progress
            Hook->>User: emit progress update
        end
    end

    FS-->>Hook: exit code file present
    Hook->>Hook: determine status (COMPLETE/FAILED)
    Hook->>Hook: notify_completion(status,message)

    rect rgb(230,245,255)
        opt SLACK_WEBHOOK_URL set
            Hook->>Slack: send_slack_notification(status,message)
            Slack-->>Hook: response
        end
        alt macOS
            Hook->>OS: osascript display notification
        else Linux
            Hook->>OS: notify-send notification
        end
    end

    Hook->>User: return summary & exit

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	❓ Inconclusive	The author provided no description, so there is nothing to evaluate against the changeset.	Add a brief description explaining the purpose of these new Claude skills and how they enable Maestro cluster creation.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately reflects the main change: adding Claude skills (including setup-maestro-cluster and deployment-monitor) to create the Maestro cluster.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

coderabbitai

Actionable comments posted: 5

Fix all issues with AI Agents 🤖

In @.claude/hooks/deployment-monitor.sh:
- Around line 31-49: monitor_deployment() is a placeholder that only sleeps;
replace it with a real polling loop that invokes a concrete deployment-status
check (e.g., call to kubectl rollout status, systemctl status, an HTTP health
endpoint, or grep for success/failure markers in a deployment log) and returns
appropriate exit codes on success or failure. Implement parameters or
environment-driven variables for the chosen check command/name and timeout, run
the check at each interval inside the existing loop, log both progress and
detected terminal states, break and return 0 on success, return non-zero on
explicit failure (or timeout), and ensure errors from the status command are
captured and logged using the existing HOOK_NAME context.
- Around line 15-19: The script currently sources CONFIG_FILE (config.sh)
directly which can execute arbitrary code; update the block that sets
CONFIG_FILE and uses source to first validate the file (e.g., check ownership
and permissions, ensure it's not writable by others, and optionally verify
content only contains allowed KEY=VALUE lines or matches an expected pattern)
and only source it if those checks pass; also add a comment above the logic
documenting the security expectation that config.sh must be owned by a trusted
user and not world-writable.
- Around line 21-29: Remove the unused dead function check_process() from the
script: locate the function definition named check_process() (the block that
starts with "check_process() {" and ends with its closing brace) and delete it
entirely since it is never referenced; run a quick grep for "check_process" to
confirm there are no callers before committing.
- Around line 72-92: Build the JSON payload using a safe JSON encoder (e.g., jq)
instead of unescaped heredoc to avoid injection via $message or $status:
construct the object with jq or equivalent (referencing the payload variable
construction) so strings are properly quoted/escaped, and pass the resulting
JSON to curl; after the curl call that posts to "$webhook_url" capture its exit
status ($?), log or print an error on non-zero and return that status so
failures bubble up. Ensure you still set "color", "title", "text", "footer", and
"ts" fields when building the JSON.
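For the `config.sh` item above (lines 15-19), one way to "verify content only contains allowed KEY=VALUE lines" is sketched below. This is a minimal sketch, not the PR's code: the function name `config_is_plain_assignments` is hypothetical, and the accepted pattern (uppercase names, values with no whitespace or shell metacharacters) is an assumption that is deliberately stricter than most real configs need.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every line of the file is blank,
# a comment, or a plain KEY=VALUE assignment with no whitespace or shell
# metacharacters in the value.
config_is_plain_assignments() {
    local file="$1" line
    local comment_re='^[[:space:]]*(#.*)?$'
    local assign_re='^(export[[:space:]]+)?[A-Z_][A-Z0-9_]*=[^[:space:]`$;&|<>()]*$'

    [ -f "$file" ] || return 1
    while IFS= read -r line || [ -n "$line" ]; do
        # Blank lines and comments are always allowed
        [[ "$line" =~ $comment_re ]] && continue
        # Anything else must look like a simple assignment
        [[ "$line" =~ $assign_re ]] || return 1
    done < "$file"
    return 0
}
```

A caller could then source the file only when the check passes, for example `config_is_plain_assignments "$CONFIG_FILE" && source "$CONFIG_FILE"`, in addition to the ownership and permission checks.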

In @.claude/skills/README.md:
- Line 51: The README contains a bare URL in the line "Go to
https://api.slack.com/messaging/webhooks"; update that line to wrap the URL in
angle brackets or convert it to a markdown link (e.g., "Go to
<https://api.slack.com/messaging/webhooks>" or "Go to [Slack Incoming
Webhooks](https://api.slack.com/messaging/webhooks)") to satisfy MD034 and avoid
a bare URL.

🧹 Nitpick comments (2)

.claude/skills/setup-maestro-cluster.md (2)
95-104: Consider adding timeout and error cleanup for git clone.

The git clone operation could hang indefinitely without a timeout. Additionally, if cloning fails, the empty temporary directory will remain on the system.
🔎 Suggested improvements
 TEMP_DIR=$(mktemp -d)
 echo "Cloning ARO-HCP repository to: $TEMP_DIR"
 
-if ! git clone https://github.com/Azure/ARO-HCP "$TEMP_DIR/ARO-HCP"; then
+if ! timeout 300 git clone https://github.com/Azure/ARO-HCP "$TEMP_DIR/ARO-HCP"; then
     echo "ERROR: Failed to clone ARO-HCP repository"
+    rm -rf "$TEMP_DIR"
     exit 1
 fi
125-134: Consider adding a timeout for the make command.

The make personal-dev-env command could potentially run for an extended period without bounds. Consider adding a timeout to prevent indefinite hangs.
🔎 Suggested improvement
-if make personal-dev-env; then
+if timeout 3600 make personal-dev-env; then
     echo ""
     echo "✓ Deployment completed successfully!"
     echo "ARO-HCP repository location: $TEMP_DIR/ARO-HCP"
 else
     echo ""
-    echo "ERROR: Deployment failed!"
+    echo "ERROR: Deployment failed or timed out!"
     popd > /dev/null
     exit 1
 fi
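If the two cases in "failed or timed out" need to be told apart, GNU coreutils `timeout` exits with status 124 when the deadline is hit. The sketch below assumes GNU `timeout` is on `PATH` (it is not installed by default on macOS); the wrapper name `run_with_deadline` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command under GNU timeout and report whether
# it completed, failed on its own, or hit the deadline.
run_with_deadline() {
    local seconds="$1"; shift
    local rc=0
    timeout "$seconds" "$@" || rc=$?
    case "$rc" in
        0)   echo "completed" ;;
        124) echo "timed out after ${seconds}s" ;;   # status GNU timeout uses on expiry
        *)   echo "failed with exit code $rc" ;;
    esac
    return "$rc"
}
```

For example, `run_with_deadline 3600 make personal-dev-env` would return the underlying status so the caller can still `exit 1` on any non-zero result.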

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 9f44c62 and b893f9b.

📒 Files selected for processing (5)

.claude/hooks/.gitignore
.claude/hooks/config.sh.example
.claude/hooks/deployment-monitor.sh
.claude/skills/README.md
.claude/skills/setup-maestro-cluster.md

🧰 Additional context used

🪛 markdownlint-cli2 (0.18.1)

.claude/skills/README.md

51-51: Bare URL used

(MD034, no-bare-urls)

🪛 Shellcheck (0.11.0)

.claude/hooks/deployment-monitor.sh

[warning] 18-18: ShellCheck can't follow non-constant source. Use a directive to specify location.

(SC1090)

[warning] 33-33: Declare and assign separately to avoid masking return values.

(SC2155)

[warning] 73-73: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: Red Hat Konflux / maestro-e2e-on-pull-request
GitHub Check: Red Hat Konflux / maestro-on-pull-request
GitHub Check: e2e-grpc-broker
GitHub Check: upgrade
GitHub Check: e2e-broadcast-subscription
GitHub Check: e2e
GitHub Check: e2e-with-istio

🔇 Additional comments (11)

.claude/hooks/.gitignore (1)

1-1: Correctly ignores the configuration file containing secrets.

The pattern properly prevents config.sh (with sensitive webhook URLs) from being committed while allowing config.sh.example to serve as a safe, shareable template.

.claude/hooks/config.sh.example (1)

1-19: Well-structured configuration template with clear instructions.

The template provides helpful documentation, a clear copy-to-config pattern, and well-commented optional settings. The reference to Slack's webhook documentation makes setup straightforward for users.

.claude/skills/README.md (4)

1-33: Clear, comprehensive skill documentation with practical prerequisites.

The setup-maestro-cluster skill documentation is well-organized with explicit environment variables, clear prerequisites, and helpful Azure CLI guidance. The step-by-step explanation of what the skill does aids discoverability.

46-77: Comprehensive hook documentation with clear configuration and usage patterns.

The deployment-monitor.sh documentation effectively covers both file-based and environment variable configuration, with practical usage examples that cover monitoring, completion, and failure scenarios.

83-91: Clear explanation of skill mechanics helps users understand skill invocation.

The section effectively explains the skill execution flow and value proposition for automation.

93-105: Helpful guidance for extending the skills framework with new skills.

The section provides clear steps for skill creation and appropriately references the setup-maestro-cluster.md example. The emphasis on error handling and comments supports maintainability.

.claude/skills/setup-maestro-cluster.md (3)

62-72: LGTM!

The Azure CLI prerequisite check is well-implemented with clear error messaging and helpful installation instructions.

76-93: LGTM!

The Azure account verification logic correctly validates both login status and account name, with appropriate error messages to guide users.

48-48: No action needed. Go 1.24.4 is a valid, released version (June 2025) and the documentation is correct.

.claude/hooks/deployment-monitor.sh (2)

95-126: LGTM with dependency on fixing send_slack_notification.

The notification function properly handles multiple notification channels (console, Slack, system notifications) with appropriate fallbacks. However, ensure the send_slack_notification() function properly returns error status as noted in the previous comment.

128-140: CLI structure looks good, but depends on fixing monitor_deployment.

The CLI interface is well-designed with clear command separation. However, the monitor command should not be used until monitor_deployment() is properly implemented (as flagged in an earlier comment).

coderabbitai

Actionable comments posted: 3

Fix all issues with AI Agents 🤖

In @.claude/hooks/deployment-monitor.sh:
- Around line 128-132: The monitor_deployment function currently never returns
non-zero on timeout; update monitor_deployment to return a non-zero status
(e.g., return 2) when the maximum wait time is reached and deployment wasn’t
confirmed, and ensure the top-level case invocation propagates that exit code
(replace the plain call to monitor_deployment with a call that exits on failure,
e.g., call monitor_deployment and then exit with its status) so the script exits
non‑zero when monitoring times out.
- Around line 118-125: The system notification calls embed $message directly,
allowing quotes/backslashes to break the osascript command; escape message
content before passing it to AppleScript and use the safe argument form for
notify-send: create a sanitized variable (e.g. safe_message) by first escaping
backslashes then escaping double quotes (safe_message="${message//\\/\\\\}";
safe_message="${safe_message//\"/\\\"}"), then call osascript -e "display
notification \"$safe_message\" with title \"Maestro Deployment $status\"" to
avoid injection, and for Linux keep notify-send as notify-send -- "Maestro
Deployment $status" "$message" so the message is passed as a single safe
argument (reference symbols: $message, safe_message variable, osascript call,
notify-send call).

In @.claude/skills/SKILL.md:
- Around line 96-97: The script sets TEMP_DIR with TEMP_DIR=$(mktemp -d) and
immediately uses it for cloning; add error handling after the mktemp call to
verify it succeeded (check exit status and that TEMP_DIR is non-empty and a
directory) and if not, print a clear error message and exit non-zero; update the
cloning code that references TEMP_DIR to only run after this validation so you
don't attempt git clone into an invalid path.
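The `mktemp` validation described in the SKILL.md item above could look roughly like this. It is a sketch under assumptions, not the code in the PR: the helper name `make_temp_dir` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create a temp directory and fail loudly if mktemp
# did not produce a usable directory. Prints the path on success.
make_temp_dir() {
    local dir
    if ! dir=$(mktemp -d); then
        echo "ERROR: mktemp failed to create a temporary directory" >&2
        return 1
    fi
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "ERROR: mktemp returned an unusable path: '$dir'" >&2
        return 1
    fi
    printf '%s\n' "$dir"
}
```

The clone step would then run only after `TEMP_DIR=$(make_temp_dir) || exit 1` has succeeded.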

🧹 Nitpick comments (2)

.claude/skills/SKILL.md (2)
125-134: Consider adding timeout protection for the make command.

The make personal-dev-env command could potentially hang indefinitely if it encounters issues. For a long-running deployment process, consider adding a timeout mechanism or documenting expected runtime.
💡 Suggestion

You could wrap the make command with a timeout:
-if make personal-dev-env; then
+if timeout 30m make personal-dev-env; then
     echo ""
     echo "✓ Deployment completed successfully!"
     echo "ARO-HCP repository location: $TEMP_DIR/ARO-HCP"
 else
     echo ""
-    echo "ERROR: Deployment failed!"
+    echo "ERROR: Deployment failed or timed out!"
     popd > /dev/null
     exit 1
 fi
Note: Adjust the timeout value (30m) based on expected deployment duration.
96-141: The temporary directory is never cleaned up.

The documentation mentions "Optionally clean up temporary clone (or keep for debugging)" in line 57, but the script never performs cleanup. The temporary directory will accumulate over multiple runs, potentially filling up disk space.

Consider either:

Adding explicit cleanup with a trap handler for production use

Documenting that users should manually clean up the printed temp directory path

Using a predictable location instead of mktemp for easier manual cleanup
💡 Option 1: Add trap-based cleanup
+# Cleanup trap (optional - uncomment to enable automatic cleanup)
+# cleanup() {
+#     if [ -n "$TEMP_DIR" ] && [ -d "$TEMP_DIR" ]; then
+#         echo "Cleaning up temporary directory: $TEMP_DIR"
+#         rm -rf "$TEMP_DIR"
+#     fi
+# }
+# trap cleanup EXIT
+
 # Step 3: Clone ARO-HCP repository
 TEMP_DIR=$(mktemp -d)

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between b893f9b and daca179.

📒 Files selected for processing (5)

.claude/hooks/.gitignore
.claude/hooks/config.sh.example
.claude/hooks/deployment-monitor.sh
.claude/skills/README.md
.claude/skills/SKILL.md

🚧 Files skipped from review as they are similar to previous changes (3)

.claude/hooks/.gitignore
.claude/hooks/config.sh.example
.claude/skills/README.md

🧰 Additional context used

🪛 Shellcheck (0.11.0)

.claude/hooks/deployment-monitor.sh

[warning] 18-18: ShellCheck can't follow non-constant source. Use a directive to specify location.

(SC1090)

[warning] 33-33: Declare and assign separately to avoid masking return values.

(SC2155)

[warning] 73-73: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (2)

.claude/skills/SKILL.md (2)

143-149: Documentation looks good.

The notes section accurately summarizes the script behavior and provides helpful context for users.

113-113: No action needed — Go version is correct.

The GOTOOLCHAIN=go1.24.4 matches the ARO-HCP repository's requirement. The repository's go.work file explicitly specifies go 1.24.4.

coderabbitai

Actionable comments posted: 4

Fix all issues with AI Agents 🤖

In @.claude/hooks/deployment-monitor.sh:
- Around line 163-180: The JSON payload construction escapes message but not
status; create an escaped_status using the same fallback escaping logic used for
escaped_message (replace backslashes then double-quotes) and use escaped_status
in the payload title (instead of raw $status) so the title string is safe when
interpolated into payload; update the block that defines escaped_message and
payload to also define escaped_status and reference it in the "title" field.
- Around line 65-66: The hardcoded developer-specific path in the declarations
of output_file and exit_code_file will break other users; change them to build
paths dynamically (e.g., use TMPDIR or /tmp, or create a temp dir via mktemp -d)
and reference task_id when composing the filenames. Ensure the script creates
the target directory if missing and update any usages of output_file and
exit_code_file to rely on the new dynamic base (keep the variables output_file,
exit_code_file and task_id names so callers remain unchanged).

In @.claude/skills/SKILL.md:
- Around line 118-121: The env block currently overwrites USER (export
USER=oasis) which is risky; change to a different variable name (e.g.,
OASIS_USER or SKILL_USER) and update any references that consume USER to use
that new name instead (check where PERSIST, GITHUB_ACTIONS, and GOTOOLCHAIN are
used); ensure builds or the ARO-HCP Makefile do not require the literal USER—if
they do, conditionally set USER only when unset or map the new variable into the
Makefile invocation instead.
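For the `escaped_status` item above (deployment-monitor.sh, lines 163-180), the jq-less fallback can apply the same escaping to both fields through one helper. This is a minimal sketch: `json_escape` and `build_payload` are hypothetical names, only the `title` and `text` fields are shown, and the escaping covers backslashes, double quotes, newlines, and tabs rather than every JSON control character.

```bash
#!/usr/bin/env bash
# Hypothetical helper: escape a string for embedding in a JSON string literal.
# Covers backslash, double quote, newline, and tab only.
json_escape() {
    local s="$1"
    s="${s//\\/\\\\}"      # backslash first, so later escapes are not doubled
    s="${s//\"/\\\"}"      # double quote
    s="${s//$'\n'/\\n}"    # newline
    s="${s//$'\t'/\\t}"    # tab
    printf '%s' "$s"
}

# Build a reduced payload with both interpolated values escaped.
build_payload() {
    local status="$1" message="$2"
    printf '{"title":"Maestro Deployment %s","text":"%s"}' \
        "$(json_escape "$status")" "$(json_escape "$message")"
}
```

Routing `status` and `message` through the same function keeps the two fields from drifting apart the way they did in the reviewed revision.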

🧹 Nitpick comments (1)

.claude/hooks/deployment-monitor.sh (1)
67-67: Address ShellCheck SC2155: Separate declaration and assignment.

Multiple lines combine local declaration with command substitution, which can mask return values. This is a minor robustness concern.
🔎 Proposed fix for representative lines
-    local start_time=$(date +%s)
+    local start_time
+    start_time=$(date +%s)

-            local exit_code=$(cat "$exit_code_file")
+            local exit_code
+            exit_code=$(cat "$exit_code_file")

-            local current_lines=$(wc -l < "$output_file" | tr -d ' ')
+            local current_lines
+            current_lines=$(wc -l < "$output_file" | tr -d ' ')

-    local end_time=$(date +%s)
+    local end_time
+    end_time=$(date +%s)
Also applies to: 78-78, 85-85, 104-104
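The masking that SC2155 warns about is easy to reproduce. In the small demonstration below (function names are illustrative only), the combined form reports the exit status of `local` itself, while the split form reports the status of the command substitution.

```bash
#!/usr/bin/env bash
# Combined form: $? reflects `local`, so the failure of `false` is hidden.
masked() {
    local value=$(false)
    echo "$?"
}

# Split form: $? reflects the command substitution.
unmasked() {
    local value
    value=$(false)
    echo "$?"
}
```

Calling `masked` prints `0` and calling `unmasked` prints `1`, which is why the split form matters wherever the script later checks whether `cat "$exit_code_file"` or `wc -l` succeeded.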

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between daca179 and 1862edb.

📒 Files selected for processing (5)

.claude/hooks/.gitignore
.claude/hooks/config.sh.example
.claude/hooks/deployment-monitor.sh
.claude/skills/README.md
.claude/skills/SKILL.md

🚧 Files skipped from review as they are similar to previous changes (3)

.claude/skills/README.md
.claude/hooks/.gitignore
.claude/hooks/config.sh.example

🧰 Additional context used

🪛 Shellcheck (0.11.0)

.claude/hooks/deployment-monitor.sh

[warning] 67-67: Declare and assign separately to avoid masking return values.

(SC2155)

[warning] 78-78: Declare and assign separately to avoid masking return values.

(SC2155)

[warning] 85-85: Declare and assign separately to avoid masking return values.

(SC2155)

[warning] 104-104: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (5)

.claude/skills/SKILL.md (2)

96-102: LGTM! Error handling for mktemp is properly implemented.

The validation now correctly checks that TEMP_DIR is non-empty and is a valid directory before proceeding with the clone operation. This addresses the previous review concern.

48-48: No action needed — go1.24.4 is a valid, released Go version.

go1.24.4 was released on June 5, 2025, and is a legitimate patch version of Go 1.24. There is no issue with this toolchain specification.

Likely an incorrect or invalid review comment.

.claude/hooks/deployment-monitor.sh (3)

221-230: LGTM! System notification escaping is now properly implemented.

The AppleScript message is now correctly escaped for quotes and backslashes, and notify-send uses -- to safely handle the message argument. This addresses the previous security concern.

152-162: LGTM! JSON construction with jq is properly implemented.

Using jq -n with --arg for string parameters ensures proper JSON escaping, preventing injection vulnerabilities. The fallback for systems without jq is also a good approach.

233-246: LGTM! Exit code propagation is now correct.

The monitor case properly propagates the exit code via exit $?, ensuring the script reflects the actual deployment outcome. This addresses the previous concern about masking timeout failures.

coderabbitai · 2026-01-05T07:35:26Z

.claude/hooks/deployment-monitor.sh

+if [ -f "$CONFIG_FILE" ]; then
+    # Validate config file security before sourcing
+    if [ -O "$CONFIG_FILE" ] || [ "$(stat -f %Su "$CONFIG_FILE" 2>/dev/null || stat -c %U "$CONFIG_FILE" 2>/dev/null)" = "$USER" ]; then
+        # Check if file is not world-writable
+        if [ "$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | cut -c3)" != "2" ] && \
+           [ "$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | cut -c3)" != "6" ]; then
+            # shellcheck disable=SC1090
+            source "$CONFIG_FILE"
+        else
+            echo "[$HOOK_NAME] WARNING: Skipping config.sh - file is world-writable (security risk)"
+        fi
+    else
+        echo "[$HOOK_NAME] WARNING: Skipping config.sh - file not owned by current user (security risk)"
+    fi
+fi


⚠️ Potential issue | 🟡 Minor

Security checks are good, but permission validation is incomplete.

The config file security checks address the earlier concerns. However, the world-writable check on lines 26-27 only rejects the octal digits 2 and 6 for the "other" permission class. It misses 3 and 7, which also include the write bit (wx = 3, rwx = 7).

🔎 Proposed fix

     if [ -O "$CONFIG_FILE" ] || [ "$(stat -f %Su "$CONFIG_FILE" 2>/dev/null || stat -c %U "$CONFIG_FILE" 2>/dev/null)" = "$USER" ]; then
         # Check if file is not world-writable
-        if [ "$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | cut -c3)" != "2" ] && \
-           [ "$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | cut -c3)" != "6" ]; then
+        local other_perm
+        other_perm=$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | cut -c3)
+        # Bits 2, 3, 6, 7 all include write permission
+        if [[ ! "$other_perm" =~ ^[2367]$ ]]; then
             # shellcheck disable=SC1090
             source "$CONFIG_FILE"

coderabbitai · 2026-01-05T07:35:27Z

.claude/skills/SKILL.md

+export USER=oasis
+export PERSIST=true
+export GITHUB_ACTIONS=true
+export GOTOOLCHAIN=go1.24.4


🛠️ Refactor suggestion | 🟠 Major

Overwriting USER environment variable is risky.

Setting export USER=oasis overwrites the system's USER variable, which many tools rely on for user identification, home directory resolution, and permission checks. This could cause unexpected behavior in subsequent commands or scripts.

Consider using a different variable name:

🔎 Proposed fix

 echo "Setting environment variables..."
-export USER=oasis
+export DEPLOY_USER=oasis
 export PERSIST=true
 export GITHUB_ACTIONS=true
 export GOTOOLCHAIN=go1.24.4
-echo "USER=$USER"
+echo "DEPLOY_USER=$DEPLOY_USER"
 echo "PERSIST=$PERSIST"

Note: Verify with the ARO-HCP Makefile whether it specifically expects USER or if it can accept an alternative variable name.
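If the Makefile does turn out to read the literal `USER`, the override can be scoped to the single command instead of being exported for the rest of the script. The sketch below is an assumption-laden illustration: `run_as_deploy_user` and `DEPLOY_USER` are hypothetical names, and whether ARO-HCP's Makefile consumes `USER` is not confirmed here.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run one command with USER overridden, leaving the
# calling shell's USER untouched. Falls back to "oasis" if DEPLOY_USER is unset.
run_as_deploy_user() {
    local deploy_user="${DEPLOY_USER:-oasis}"
    # env sets USER for the child process only
    env USER="$deploy_user" "$@"
}
```

Usage would be `run_as_deploy_user make personal-dev-env`; later commands in the skill still see the real `USER`.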

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-export USER=oasis
+export DEPLOY_USER=oasis
 export PERSIST=true
 export GITHUB_ACTIONS=true
 export GOTOOLCHAIN=go1.24.4

🤖 Prompt for AI Agents

In @.claude/skills/SKILL.md around lines 118-121, The env block currently overwrites USER (export USER=oasis) which is risky; change to a different variable name (e.g., OASIS_USER or SKILL_USER) and update any references that consume USER to use that new name instead (check where PERSIST, GITHUB_ACTIONS, and GOTOOLCHAIN are used); ensure builds or the ARO-HCP Makefile do not require the literal USER—if they do, conditionally set USER only when unset or map the new variable into the Makefile invocation instead.

coderabbitai

Actionable comments posted: 2

Fix all issues with AI Agents 🤖

In @.claude/hooks/deployment-monitor.sh:
- Around line 201-214: The curl call in the Slack send function can return exit
code 0 on HTTP 4xx/5xx, so update the command that posts "$payload" to
"$webhook_url" (the block that sets curl_exit_code) to use curl --fail (or
explicitly check the HTTP status via --write-out '%{http_code}') and treat
non-2xx responses as failures; ensure curl_exit_code is set non-zero when the
HTTP response is not 2xx and include HOOK_NAME and the HTTP status or error text
in the error echo so callers relying on curl_exit_code (and variables like
payload, webhook_url) know the notification was rejected.
- Around line 86-116: The loop watching for "$exit_code_file" can hang forever
because while true only breaks on that file; add a max wait/timeout using
start_time to compute elapsed and break with a non-zero exit (or log and exit)
when elapsed exceeds a configurable MAX_WAIT_SECONDS; update the loop logic in
deployment-monitor.sh around the while true block to (1) define and use
MAX_WAIT_SECONDS (or MAX_WAIT_MINUTES) near start_time, (2) check elapsed each
iteration and when exceeded echo a timed-out message including $HOOK_NAME and
exit with a distinct code, and (3) ensure any tail/grep progress still runs
until timeout so last_line_count, $output_file and progress prints remain
unchanged but the loop will not run indefinitely.
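The bounded wait described in the second item above can be isolated into a small helper. This is a sketch rather than the hook's actual loop: `wait_for_file` is a hypothetical name, progress reporting is left out, and the return code 2 for timeout and the 7200-second default are assumptions borrowed from suggestions elsewhere in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll for a file until it appears or a deadline passes.
# Returns 0 if the file appeared, 2 on timeout.
wait_for_file() {
    local file="$1"
    local max_wait_seconds="${2:-7200}"   # assumed default of 2 hours
    local interval="${3:-5}"
    local start_time elapsed
    start_time=$(date +%s)

    while [ ! -f "$file" ]; do
        elapsed=$(( $(date +%s) - start_time ))
        if [ "$elapsed" -ge "$max_wait_seconds" ]; then
            echo "ERROR: timed out after ${elapsed}s waiting for $file" >&2
            return 2
        fi
        sleep "$interval"
    done
    return 0
}
```

A distinct timeout code lets the `monitor` entry point tell "deployment failed" apart from "monitor gave up".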

♻️ Duplicate comments (1)

.claude/hooks/deployment-monitor.sh (1)

22-36: Permission validation still incomplete for world-writable check.

The check rejects permission bits 2 and 6 but misses 3 (wx) and 7 (rwx), which also include write permission. A file with permissions xx3 or xx7 will bypass the security check and be sourced.

🔎 Proposed fix using regex match

     if [ -O "$CONFIG_FILE" ] || [ "$(stat -f %Su "$CONFIG_FILE" 2>/dev/null || stat -c %U "$CONFIG_FILE" 2>/dev/null)" = "$USER" ]; then
         # Check if file is not world-writable
-        if [ "$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | cut -c3)" != "2" ] && \
-           [ "$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | cut -c3)" != "6" ]; then
+        local other_perm
+        other_perm=$(stat -f %OLp "$CONFIG_FILE" 2>/dev/null || stat -c %a "$CONFIG_FILE" 2>/dev/null | tail -c 2 | head -c 1)
+        # Bits 2, 3, 6, 7 all include write permission
+        if [[ ! "$other_perm" =~ ^[2367]$ ]]; then
             # shellcheck disable=SC1090
             source "$CONFIG_FILE"
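An alternative to matching individual digits is to test the "other" write bit (octal 002) arithmetically. The sketch below takes the full octal mode string, so it does not depend on extracting a single character first; `mode_is_world_writable` is a hypothetical name and obtaining the mode from `stat` is left to the caller.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given an octal mode string such as "644" or "0666",
# succeed if the "other" write bit (octal 002) is set.
# Returns 2 if the input is not a 3- or 4-digit octal number.
mode_is_world_writable() {
    local mode="$1"
    [[ "$mode" =~ ^[0-7]{3,4}$ ]] || return 2
    (( (8#$mode & 8#002) != 0 ))
}
```

With this form, modes ending in 2, 3, 6, or 7 are all rejected without listing them.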

🧹 Nitpick comments (1)

.claude/hooks/deployment-monitor.sh (1)
240-248: Consider escaping $status in osascript command for defense in depth.

While $status is typically controlled ("COMPLETE"/"FAILED"), it could come from external input via the CLI ($0 notify <status> <message>). For consistency with the message escaping, consider escaping status as well.
🔎 Proposed fix
     # Also send system notification if available
     if command -v osascript &> /dev/null; then
         # macOS notification - escape message for AppleScript
         local safe_message="${message//\\/\\\\}"
         safe_message="${safe_message//\"/\\\"}"
-        osascript -e "display notification \"$safe_message\" with title \"Maestro Deployment $status\""
+        local safe_status="${status//\\/\\\\}"
+        safe_status="${safe_status//\"/\\\"}"
+        osascript -e "display notification \"$safe_message\" with title \"Maestro Deployment $safe_status\""
     elif command -v notify-send &> /dev/null; then
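Since the same two substitutions are now applied to both values, they could be factored into one helper. A minimal sketch, with hypothetical names `applescript_escape` and `build_notification_script`; it builds the `osascript -e` argument as a string so it can be inspected without a macOS host.

```bash
#!/usr/bin/env bash
# Hypothetical helper: escape a value for use inside an AppleScript
# double-quoted string literal (backslashes first, then double quotes).
applescript_escape() {
    local s="$1"
    s="${s//\\/\\\\}"
    s="${s//\"/\\\"}"
    printf '%s' "$s"
}

# Build the AppleScript source without running it.
build_notification_script() {
    local status="$1" message="$2"
    printf 'display notification "%s" with title "Maestro Deployment %s"' \
        "$(applescript_escape "$message")" "$(applescript_escape "$status")"
}
```

The hook would then call `osascript -e "$(build_notification_script "$status" "$message")"`.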

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 1862edb and 4b0a858.

📒 Files selected for processing (5)

.claude/hooks/.gitignore
.claude/hooks/config.sh.example
.claude/hooks/deployment-monitor.sh
.claude/skills/README.md
.claude/skills/SKILL.md

🚧 Files skipped from review as they are similar to previous changes (4)

.claude/hooks/.gitignore
.claude/skills/SKILL.md
.claude/skills/README.md
.claude/hooks/config.sh.example

🔇 Additional comments (3)

.claude/hooks/deployment-monitor.sh (3)

1-17: LGTM!

Clear header documentation with appropriate dependency listing. Using set -e for fail-fast behavior is appropriate for a monitoring script.

38-53: LGTM!

The check_command function correctly differentiates between required and optional dependencies with appropriate return codes.

251-264: LGTM!

The main execution block correctly handles both commands, propagates exit codes from monitor_deployment, and provides sensible defaults for the notify command.

coderabbitai · 2026-01-05T07:52:41Z

.claude/hooks/deployment-monitor.sh

+    # Send to Slack and capture exit status
+    local curl_exit_code
+    if curl -X POST -H 'Content-type: application/json' \
+        --data "$payload" \
+        "$webhook_url" \
+        --silent --show-error; then
+        curl_exit_code=0
+    else
+        curl_exit_code=$?
+        echo "[$HOOK_NAME] ERROR: Failed to send Slack notification (curl exit code: $curl_exit_code)"
+    fi
+
+    return $curl_exit_code
+}


⚠️ Potential issue | 🟡 Minor

curl success doesn't guarantee notification was accepted.

Without --fail, curl returns exit code 0 even when the server responds with HTTP 4xx/5xx errors. A rejected webhook (bad URL, rate limited, etc.) would log "sent successfully" in the caller.

🔎 Proposed fix

     # Send to Slack and capture exit status
     local curl_exit_code
     if curl -X POST -H 'Content-type: application/json' \
         --data "$payload" \
         "$webhook_url" \
-        --silent --show-error; then
+        --silent --show-error --fail; then
         curl_exit_code=0
     else

🤖 Prompt for AI Agents

In @.claude/hooks/deployment-monitor.sh around lines 201-214, The curl call in the Slack send function can return exit code 0 on HTTP 4xx/5xx, so update the command that posts "$payload" to "$webhook_url" (the block that sets curl_exit_code) to use curl --fail (or explicitly check the HTTP status via --write-out '%{http_code}') and treat non-2xx responses as failures; ensure curl_exit_code is set non-zero when the HTTP response is not 2xx and include HOOK_NAME and the HTTP status or error text in the error echo so callers relying on curl_exit_code (and variables like payload, webhook_url) know the notification was rejected.
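The alternative of checking the HTTP status explicitly via `--write-out '%{http_code}'` could be sketched roughly as follows. The helper names `is_http_success` and `post_json` are hypothetical and do not come from the PR's script; this is one possible shape under those assumptions, not the author's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not from deployment-monitor.sh.

# Return 0 only for 2xx HTTP status codes.
is_http_success() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# POST a JSON payload; fail on transport errors and on non-2xx responses.
post_json() {
    local url=$1 payload=$2 status
    # The response body is discarded; only the status code is captured.
    status=$(curl --silent --show-error --output /dev/null \
        --write-out '%{http_code}' \
        -X POST -H 'Content-type: application/json' \
        --data "$payload" "$url") || return $?
    if ! is_http_success "$status"; then
        echo "ERROR: webhook rejected the request (HTTP $status)" >&2
        return 1
    fi
}
```

Compared with `--fail`, this variant keeps the status code available for the error message, at the cost of a few more lines.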

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (2)

.claude/hooks/deployment-monitor.sh (2)
70-100: Add timeout protection to prevent infinite hang.

The while true loop only exits when exit_code_file appears. If the task crashes without creating this file, the monitor is given an incorrect task ID, or the process dies unexpectedly, the monitor will hang forever. This was flagged in previous reviews but remains unaddressed.
🔎 Proposed fix with timeout
     # Wait for the deployment to complete
     local last_line_count=0
+    local max_wait_seconds=${MONITOR_TIMEOUT:-7200}  # Default 2 hours
     while true; do
+        # Check for timeout
+        local elapsed=$(($(date +%s) - start_time))
+        if [ "$elapsed" -ge "$max_wait_seconds" ]; then
+            echo "[$HOOK_NAME] ERROR: Maximum wait time (${max_wait_seconds}s) reached"
+            notify_completion "FAILED" "Deployment monitoring timed out after ${max_wait_seconds} seconds"
+            return 2
+        fi
+
         # Check if exit code file exists (task completed)
         if [ -f "$exit_code_file" ]; then
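The same timeout idea can be isolated into a small standalone helper, which makes it easy to exercise outside the hook. The function name `wait_for_file` and its parameters are assumptions made for illustration; the 7200-second default mirrors the value in the proposed fix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a file exists or a deadline passes.
# Returns 0 if the file appeared, 2 on timeout.
wait_for_file() {
    local file=$1 max_wait=${2:-7200} interval=${3:-1}
    local start_time elapsed
    start_time=$(date +%s)
    while [ ! -f "$file" ]; do
        elapsed=$(( $(date +%s) - start_time ))
        if [ "$elapsed" -ge "$max_wait" ]; then
            return 2
        fi
        sleep "$interval"
    done
    return 0
}
```

A caller could then distinguish the timeout case from a normal completion by inspecting the return code, for example `wait_for_file "$exit_code_file" "${MONITOR_TIMEOUT:-7200}" 5 || rc=$?`.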
185-198: Add --fail flag to curl for proper error detection.

Without the --fail flag, curl returns exit code 0 even when the server responds with HTTP 4xx/5xx errors. A rejected webhook (bad URL, rate limited, etc.) would incorrectly log "sent successfully" in the caller. This was flagged in a previous review but remains unaddressed.
🔎 Proposed fix
     # Send to Slack and capture exit status
     local curl_exit_code
     if curl -X POST -H 'Content-type: application/json' \
         --data "$payload" \
         "$webhook_url" \
-        --silent --show-error; then
+        --silent --show-error --fail; then
         curl_exit_code=0
     else
         curl_exit_code=$?

🧹 Nitpick comments (2)

.claude/hooks/deployment-monitor.sh (2)
15-15: Consider removing or relaxing set -e for graceful error handling.

The set -e flag causes the script to exit immediately on any command failure. For a monitoring script that should handle errors gracefully (e.g., failed notifications shouldn't abort monitoring), this may be too aggressive. Consider either removing it or handling specific commands with explicit error checks instead.
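If `set -e` is kept, individual commands can still be allowed to fail by guarding them in a condition. The sketch below uses a hypothetical `send_notification` stand-in that always fails; the function body runs in a subshell so that `set -e` stays scoped to the example.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a notification step that fails.
send_notification() {
    return 1
}

# The parentheses run the body in a subshell, so set -e applies only here.
run_with_optional_notification() (
    set -e
    # A command tested by `if` does not trigger the errexit behavior.
    if ! send_notification; then
        echo "WARNING: notification failed, continuing"
    fi
    echo "monitoring continues"
)
```

With this pattern the failure is logged and the rest of the function still runs, whereas an unguarded `send_notification` call would end the subshell immediately.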

236-243: Consider showing usage when no arguments provided.

The script defaults to the notify command when no arguments are provided. Users might expect to see usage information instead. Consider changing the default case to show help rather than executing a notification.
🔎 Proposed change
 # Main execution
-case "${1:-notify}" in
+if [ $# -eq 0 ]; then
+    echo "Usage: $0 {monitor <task_id>|notify <status> <message>}"
+    exit 1
+fi
+
+case "$1" in
     monitor)
         monitor_deployment "$2"
         exit $?
         ;;
     notify)
-        notify_completion "${2:-COMPLETE}" "${3:-Deployment finished}"
+        notify_completion "$2" "$3"
         ;;

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 4b0a858 and 02794db.

📒 Files selected for processing (3)

.claude/hooks/deployment-monitor.sh
.claude/skills/README.md
.claude/skills/SKILL.md

🚧 Files skipped from review as they are similar to previous changes (2)

.claude/skills/README.md
.claude/skills/SKILL.md

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: Red Hat Konflux / maestro-on-pull-request
GitHub Check: Red Hat Konflux / maestro-e2e-on-pull-request
GitHub Check: e2e-broadcast-subscription
GitHub Check: e2e-grpc-broker
GitHub Check: upgrade
GitHub Check: e2e
GitHub Check: e2e-with-istio

🔇 Additional comments (1)

.claude/hooks/deployment-monitor.sh (1)

224-232: Well done on the notification escaping!

The system notification handling properly escapes special characters for osascript and uses safe argument passing (--) for notify-send. This effectively prevents command injection vulnerabilities.
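The script's actual escaping code is not shown in this comment. As a rough illustration of the technique, one common way to escape a string before embedding it in a double-quoted AppleScript literal is sketched below; the function name `escape_applescript` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape a string for use inside a double-quoted
# AppleScript string literal.
escape_applescript() {
    local s=$1
    s=${s//\\/\\\\}   # escape backslashes first
    s=${s//\"/\\\"}   # then escape double quotes
    printf '%s' "$s"
}
```

The order matters: escaping quotes first would add backslashes that the second substitution would then double.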

Signed-off-by: hchenxa <huichen@redhat.com>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

.claude/hooks/deployment-monitor.sh (1)
49-59: Consider hash-based path sanitization to prevent collisions.

The current path sanitization (tr '/' '-') can map different paths to the same sanitized string (e.g., /a/b/c and /a-b/c both become a-b-c). While unlikely in practice, using a cryptographic hash such as SHA-256 would make such collisions practically impossible.
🔎 Optional improvement using hash
-    # Build task output paths dynamically based on current working directory
-    local cwd_sanitized
-    cwd_sanitized=$(pwd | tr '/' '-' | sed 's/^-//')
-    local task_dir="/tmp/claude/-${cwd_sanitized}/tasks"
+    # Build task output paths dynamically based on current working directory
+    local cwd_hash
+    cwd_hash=$(pwd | shasum -a 256 | cut -d' ' -f1)
+    local task_dir="/tmp/claude/${cwd_hash}/tasks"
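Since `shasum` is not installed on every Linux system, a portable variant might try `sha256sum` first and fall back to `shasum`. The helper name `hash_path` is an assumption for illustration. It hashes the path without a trailing newline, so its output differs from `pwd | shasum -a 256`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hash a path with whichever SHA-256 tool is installed.
# Prints the 64-character hex digest, or returns 1 if no tool is found.
hash_path() {
    if command -v sha256sum >/dev/null 2>&1; then
        printf '%s' "$1" | sha256sum | cut -d' ' -f1
    elif command -v shasum >/dev/null 2>&1; then
        printf '%s' "$1" | shasum -a 256 | cut -d' ' -f1
    else
        return 1
    fi
}
```

With this helper, the two paths from the collision example above produce different digests.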

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 02794db and a17d74b.

📒 Files selected for processing (3)

.claude/hooks/deployment-monitor.sh
.claude/skills/README.md
.claude/skills/SKILL.md

🚧 Files skipped from review as they are similar to previous changes (1)

.claude/skills/README.md

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: Red Hat Konflux / maestro-e2e-on-pull-request
GitHub Check: Red Hat Konflux / maestro-on-pull-request
GitHub Check: e2e-with-istio
GitHub Check: e2e-broadcast-subscription
GitHub Check: e2e-grpc-broker
GitHub Check: e2e
GitHub Check: upgrade

🔇 Additional comments (2)

.claude/skills/SKILL.md (1)

62-151: LGTM! Previous concerns have been addressed.

The implementation script shows good practices:

Proper error handling for mktemp with validation (lines 98-102)

Safe USER variable handling using the ${USER:-oasis} pattern (line 119), which falls back to oasis only when USER is unset or empty

Timeout protection for git clone (line 106)

Clear error messages and cleanup on failure

The bash script is well-structured and ready for use.

.claude/hooks/deployment-monitor.sh (1)

1-260: Excellent work addressing previous review feedback!

The script demonstrates significant improvements:

Dynamic path construction replaces hardcoded paths (lines 49-59)

Timeout protection prevents infinite loops (lines 72-82)

JSON injection mitigated with jq and proper fallback escaping (lines 163-194)

Command injection in notifications prevented with proper escaping (lines 238-243)

HTTP error handling with --fail flag (line 202)

Comprehensive error handling and progress reporting throughout

The monitoring hook is well-implemented and production-ready.

morvencao · 2026-01-06T07:46:10Z

.claude/skills/SKILL.md

@@ -0,0 +1,158 @@
+# Setup Maestro Long-Running Cluster


The current SKILL.md is missing the required YAML frontmatter. It needs to look like this:

---
name: setup-maestro-cluster
description: Sets up a long-running Maestro cluster using Azure ARO-HCP
---
# Setup Maestro Long-Running Cluster
[rest of your documentation]
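A quick local check for this kind of omission could look like the sketch below. The function name `has_skill_frontmatter` is hypothetical, and treating `name` and `description` as the required keys is an assumption based only on the example in this comment.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the file opens with a frontmatter block
# (delimited by --- lines) that contains both a name and a description key.
has_skill_frontmatter() {
    awk '
        NR == 1 { if ($0 != "---") exit 1; next }
        $0 == "---" { closed = 1; exit }
        /^name:/ { name = 1 }
        /^description:/ { desc = 1 }
        END { exit !(closed && name && desc) }
    ' "$1"
}
```

Running `has_skill_frontmatter .claude/skills/SKILL.md` before pushing would then flag a file that starts directly with the heading.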

hchenxa force-pushed the hchenxa-2026 branch 2 times, most recently from 71b2ed6 to daca179 Compare January 5, 2026 07:06

coderabbitai bot reviewed Jan 5, 2026

hchenxa force-pushed the hchenxa-2026 branch from daca179 to 1862edb Compare January 5, 2026 07:31

coderabbitai bot reviewed Jan 5, 2026

hchenxa force-pushed the hchenxa-2026 branch from 1862edb to 4b0a858 Compare January 5, 2026 07:48

coderabbitai bot reviewed Jan 5, 2026

hchenxa force-pushed the hchenxa-2026 branch from 4b0a858 to 02794db Compare January 5, 2026 07:54

coderabbitai bot reviewed Jan 5, 2026

add claude skills to create the maestro cluster

a17d74b

Signed-off-by: hchenxa <huichen@redhat.com>

hchenxa force-pushed the hchenxa-2026 branch from 02794db to a17d74b Compare January 5, 2026 08:04

coderabbitai bot reviewed Jan 5, 2026

morvencao reviewed Jan 6, 2026
