Skip to content

18660: chore: Add script to protect RC branches during the release#19

Open
martin-augment wants to merge 2 commits intomainfrom
pr-18660-2025-11-13-06-56-33
Open

18660: chore: Add script to protect RC branches during the release#19
martin-augment wants to merge 2 commits intomainfrom
pr-18660-2025-11-13-06-56-33

Conversation

@martin-augment
Copy link
Copy Markdown
Owner

18660: To review by AI

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 13, 2025

Walkthrough

This change updates the Apache DataFusion release infrastructure to support a new release series (50.x.y). The .asf.yaml file is simplified by removing commented-out branch protections and enabling auto-merge functionality. The release documentation is updated with references to branch-50 replacing branch-37, including updated commands, examples, and SVN artifact paths. A new executable Bash script (add-branch-protection.sh) is introduced to automate adding branch protection entries to .asf.yaml for future release branches, and the release process documentation is restructured to reflect this automation workflow.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-18660-2025-11-13-06-56-33

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude bot commented Nov 13, 2025

Pull Request Review

Overview

This PR adds automation for branch protection during the release process by:

  1. Creating a new script add-branch-protection.sh to automate adding branch protection rules
  2. Updating .asf.yaml to remove commented-out placeholder branch protections (52-70)
  3. Updating documentation in dev/release/README.md to reflect the new workflow

Code Quality ✅

Strengths:

  • Well-structured bash script with proper error handling (set -e)
  • Comprehensive input validation (checks for arguments, positive integers, file existence)
  • Good user feedback with clear error messages
  • Proper Apache license headers on all files
  • Executable permissions correctly set on the script

Shell Script Best Practices:

  • Uses #!/usr/bin/env bash (good for portability)
  • Proper variable quoting throughout
  • Creates temporary files safely with mktemp
  • Cleans up on errors (rm -f "$TEMP_FILE")

Potential Issues 🔍

1. AWK Script Complexity (Medium Priority)

Location: dev/release/add-branch-protection.sh:91-135

The AWK script has complex logic for inserting branch protection blocks. Concerns:

  • Lines 126-129: The double loop could cause issues - it iterates i inside a loop already iterating i, which may cause unexpected behavior
  • The logic for finding the insertion point relies on indentation detection (/^[[:space:]]{6,}/) which is fragile
  • If .asf.yaml formatting changes, this could break

Suggestion: Consider simplifying by:

  • Using a more straightforward approach: find last branch-XX: and its associated required_approving_review_count: 1, then insert after
  • Or use a YAML parser (yq/python) for more robust YAML manipulation

Example of the problematic section:

for (j = i + 1; j <= NR; j++) {
    i = j  # This modifies outer loop variable!
    if (j <= NR) print lines[j]
}

2. GitHub API Rate Limiting (Low Priority)

Location: dev/release/add-branch-protection.sh:68-77

The unauthenticated GitHub API call may hit rate limits (60 requests/hour):

HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$GITHUB_API_URL")

Suggestion: Consider adding authentication or making this check optional with a flag:

# Use GITHUB_TOKEN if available for higher rate limits
if [ -n "$GITHUB_TOKEN" ]; then
    HTTP_STATUS=$(curl -s -H "Authorization: token $GITHUB_TOKEN" -o /dev/null -w "%{http_code}" "$GITHUB_API_URL")
else
    HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$GITHUB_API_URL")
fi

3. Inconsistent Shebang (Minor)

Location: dev/release/add-branch-protection.sh:1 vs dev/release/create-tarball.sh:1

New script uses #!/usr/bin/env bash while existing script uses #!/bin/bash. While #!/usr/bin/env bash is more portable, consistency across the codebase is valuable.

Recommendation: Keep #!/usr/bin/env bash as it's the better practice, but consider updating other scripts later for consistency.

Documentation 📝

Strengths:

  • Excellent inline documentation in the script
  • Clear usage examples
  • README properly updated to reflect new workflow order (branch protection moved from step 3 to step 2)
  • Updated all example version numbers consistently (37→50, 38→50, etc.)

Minor Issue:
The comment in .asf.yaml:57-58 has a typo/grammatical issue:

# these branches protection blocks autogenerated during release process

Should be:

# these branch protection blocks are autogenerated during release process

Security Concerns 🔒

No major security issues identified. The script:

  • ✅ Doesn't accept arbitrary input for shell execution
  • ✅ Properly validates input (numeric only)
  • ✅ Uses safe temporary file creation
  • ✅ Checks for file existence before operations
  • ✅ Verifies changes were applied before committing

Performance Considerations ⚡

No performance concerns. This is a one-time manual operation run during release preparation. The script:

  • Reads .asf.yaml once into memory (small file)
  • Makes a single GitHub API call
  • Operations complete in milliseconds

Test Coverage 🧪

Missing: The script has no automated tests.

Recommendations:

  1. Add a test script that:

    • Tests input validation (invalid numbers, missing args)
    • Tests duplicate detection
    • Tests insertion logic with a mock .asf.yaml
    • Tests error cases (missing file, no existing branches)
  2. Consider adding integration test:

    • Run script on a test .asf.yaml
    • Verify output structure with YAML validator
    • Check idempotency (running twice should fail appropriately)

Example test structure:

# dev/release/test-add-branch-protection.sh
#!/usr/bin/env bash
set -e

# Test 1: Invalid input
if ./dev/release/add-branch-protection.sh "abc" 2>/dev/null; then
    echo "FAIL: Should reject non-numeric input"
    exit 1
fi

# Test 2: Duplicate detection
# Create temp .asf.yaml with branch-99
# Run script for branch-99, expect failure
# etc.

Breaking Changes ❌

None. This is purely additive functionality that:

  • Adds a new optional script
  • Removes unused commented code
  • Updates documentation

Existing workflows continue to work.

Recommendations Summary

Must Fix:

  • Fix the AWK double-loop issue (lines 126-129) - this could cause incorrect output

Should Fix:

  • Fix grammar in .asf.yaml comment (line 57)
  • Add GitHub token support for API calls to avoid rate limiting

Nice to Have:

  • Add automated tests for the script
  • Simplify AWK logic or use a YAML parser
  • Add dry-run mode (--dry-run flag) to preview changes

Conclusion

This is a valuable improvement to the release process that reduces manual work and potential for errors. The core functionality is sound, but the AWK script has a subtle bug that should be fixed before merging. The lack of tests is concerning for a script that modifies critical infrastructure configuration.

Overall Assessment: Approve with minor changes requested

The double-loop issue in the AWK script should be addressed, and the grammar fix would be nice. Consider adding tests in a follow-up PR.

Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 1 suggestions posted.

Comment augment review to trigger a new review at any time.

for (i = last_branch_line + 1; i <= NR; i++) {
print lines[i]
# After printing the required_approving_review_count line, insert new branch
if (lines[i] ~ /required_approving_review_count:/) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This insertion relies on encountering required_approving_review_count: and peeking only one line ahead; if additional keys are added under the last branch block after this line, the insertion never triggers and the script exits with “Failed to add branch protection block” (also applies to future YAML layout changes).

🤖 Was this useful? React with 👍 or 👎

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
dev/release/add-branch-protection.sh (2)

91-135: AWK script logic is complex and fragile; consider simplification.

The AWK script for finding the insertion point and modifying .asf.yaml is clever but overly complex, with multiple nested loops and indentation-based detection. While it appears functionally correct, it has several fragility concerns:

  1. Hardcoded indentation levels (4, 6, 8 spaces): YAML best practices recommend using two spaces per indentation level, though the current file uses 4 spaces. Any future YAML reformatting could break the script.

  2. Nested loop logic (lines 126-129): The inner for loop that iterates to print remaining lines and modifies i is confusing and hard to follow.

  3. Indentation-based parsing: Relying on specific indentation patterns to detect structure boundaries is inherently fragile.

Consider simplifying by using a more straightforward approach:

  • Use line-number-based insertion (append before pull_requests: line)
  • Or validate the YAML structure with yamllint or yq before/after modification
  • Or use a proper YAML parser tool instead of AWK

This is a recommended refactor for maintainability.


143-148: Verification step is good, but doesn't catch malformed YAML.

The script verifies that the new branch name was added to the file, but doesn't validate that the resulting YAML is still valid. If the insertion introduced formatting issues, the verification would pass but the file would be corrupted. Consider adding a validation step after insertion.

# After line 148, add:
if ! command -v yamllint &> /dev/null; then
    # Skip validation if yamllint not available, but warn user
    echo "Warning: yamllint not found. Consider validating the modified .asf.yaml manually."
else
    if ! yamllint "$TEMP_FILE" > /dev/null 2>&1; then
        echo "Error: Modified .asf.yaml failed YAML validation"
        rm -f "$TEMP_FILE"
        exit 1
    fi
fi
dev/release/README.md (1)

1-203: Documentation provides good context but could clarify maintenance burden.

The PR introduces useful automation to reduce manual YAML editing for each release. However, the documentation could briefly note:

  • Who is responsible for running the script (release managers/PMC members)
  • When to run it (during initial release branch creation setup, step 1)
  • That the script requires write access to the repository

This would reduce confusion for new release managers following these instructions.

Consider adding a brief note at the start of section "2. Add a protection..." clarifying that this script must be run from the repository root by someone with commit access, and that the changes will be reviewed in a PR before merging.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c27d3f3 and a50c860.

📒 Files selected for processing (3)
  • .asf.yaml (1 hunks)
  • dev/release/README.md (5 hunks)
  • dev/release/add-branch-protection.sh (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: claude-review
  • GitHub Check: Analyze (rust)
🔇 Additional comments (7)
.asf.yaml (2)

54-68: Configuration changes look good.

The simplification of .asf.yaml by removing commented-out branch protections and adding auto-merge support aligns well with the automation introduced by the new add-branch-protection.sh script. Branch protections for branch-50 and branch-51 are properly retained.


54-58: Anchor reference resolves correctly—no issue.

GitHub uses the github-slugger algorithm to convert heading text to anchor IDs by lowercasing, removing punctuation, and replacing spaces with hyphens while preserving numbers. For the heading "### 2. Add a protection to release candidate branch," this produces exactly #2-add-a-protection-to-release-candidate-branch, matching the referenced anchor. The comment link will work as intended.

dev/release/add-branch-protection.sh (2)

52-84: Input validation and branch verification are thorough.

The script properly validates the release number format, checks for .asf.yaml existence, and verifies the branch exists on GitHub before proceeding. Error handling with clear messages and proper exit codes is well done.


67-78: GitHub API call uses unauthenticated requests with rate limit considerations.

The script makes an unauthenticated GitHub API call, which is subject to a rate limit of 60 requests per hour. For a release process that runs infrequently (at most a few times per month per maintainer), this is acceptable. If release processes become more frequent, consider allowing an optional GH_TOKEN environment variable to authenticate and benefit from higher rate limits.

dev/release/README.md (3)

29-29: Release lineage updates from branch-37 to branch-50 are comprehensive and consistent.

All references to the previous release series have been appropriately updated with concrete examples for the 50.x.y lineage. The backport instructions at lines 60-68 are clear and actionable.

Also applies to: 33-33, 58-58, 60-68


139-157: New branch protection workflow section is well-structured and clear.

The addition of section "2. Add a protection to release candidate branch" clearly explains the automation workflow. The example command, expected output, and follow-up steps (create PR, merge to main) are well-documented. This guides maintainers to properly code-review the .asf.yaml changes before merging.


427-427: SVN artifact deletion examples updated appropriately.

The examples for cleaning up old release candidates and releases have been updated from 38.0.0/37.0.0 to 50.0.0/50.3.0, reflecting the new release series and maintaining consistency with the branch-50 workflow.

Also applies to: 443-443

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants