18660: chore: Add script to protect RC branches during the release#19
18660: chore: Add script to protect RC branches during the release#19martin-augment wants to merge 2 commits intomainfrom
Conversation
WalkthroughThis change updates the Apache DataFusion release infrastructure to support a new release series (50.x.y). The .asf.yaml file is simplified by removing commented-out branch protections and enabling auto-merge functionality. The release documentation is updated with references to branch-50 replacing branch-37, including updated commands, examples, and SVN artifact paths. A new executable Bash script ( ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Pull Request ReviewOverviewThis PR adds automation for branch protection during the release process by:
Code Quality ✅Strengths:
Shell Script Best Practices:
Potential Issues 🔍1. AWK Script Complexity (Medium Priority)Location: The AWK script has complex logic for inserting branch protection blocks. Concerns:
Suggestion: Consider simplifying by:
Example of the problematic section: for (j = i + 1; j <= NR; j++) {
i = j # This modifies outer loop variable!
if (j <= NR) print lines[j]
}2. GitHub API Rate Limiting (Low Priority)Location: The unauthenticated GitHub API call may hit rate limits (60 requests/hour): HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$GITHUB_API_URL")Suggestion: Consider adding authentication or making this check optional with a flag: # Use GITHUB_TOKEN if available for higher rate limits
if [ -n "$GITHUB_TOKEN" ]; then
HTTP_STATUS=$(curl -s -H "Authorization: token $GITHUB_TOKEN" -o /dev/null -w "%{http_code}" "$GITHUB_API_URL")
else
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$GITHUB_API_URL")
fi3. Inconsistent Shebang (Minor)Location: New script uses Recommendation: Keep Documentation 📝Strengths:
Minor Issue: # these branches protection blocks autogenerated during release processShould be: # these branch protection blocks are autogenerated during release processSecurity Concerns 🔒No major security issues identified. The script:
Performance Considerations ⚡No performance concerns. This is a one-time manual operation run during release preparation. The script:
Test Coverage 🧪Missing: The script has no automated tests. Recommendations:
Example test structure: # dev/release/test-add-branch-protection.sh
#!/usr/bin/env bash
set -e
# Test 1: Invalid input
if ./dev/release/add-branch-protection.sh "abc" 2>/dev/null; then
echo "FAIL: Should reject non-numeric input"
exit 1
fi
# Test 2: Duplicate detection
# Create temp .asf.yaml with branch-99
# Run script for branch-99, expect failure
# etc.Breaking Changes ❌None. This is purely additive functionality that:
Existing workflows continue to work. Recommendations SummaryMust Fix:
Should Fix:
Nice to Have:
ConclusionThis is a valuable improvement to the release process that reduces manual work and potential for errors. The core functionality is sound, but the AWK script has a subtle bug that should be fixed before merging. The lack of tests is concerning for a script that modifies critical infrastructure configuration. Overall Assessment: Approve with minor changes requested ✅ The double-loop issue in the AWK script should be addressed, and the grammar fix would be nice. Consider adding tests in a follow-up PR. |
| for (i = last_branch_line + 1; i <= NR; i++) { | ||
| print lines[i] | ||
| # After printing the required_approving_review_count line, insert new branch | ||
| if (lines[i] ~ /required_approving_review_count:/) { |
There was a problem hiding this comment.
This insertion relies on encountering required_approving_review_count: and peeking only one line ahead; if additional keys are added under the last branch block after this line, the insertion never triggers and the script exits with “Failed to add branch protection block” (also applies to future YAML layout changes).
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
dev/release/add-branch-protection.sh (2)
91-135: AWK script logic is complex and fragile; consider simplification.The AWK script for finding the insertion point and modifying .asf.yaml is clever but overly complex, with multiple nested loops and indentation-based detection. While it appears functionally correct, it has several fragility concerns:
Hardcoded indentation levels (4, 6, 8 spaces): YAML best practices recommend using two spaces per indentation level, though the current file uses 4 spaces. Any future YAML reformatting could break the script.
Nested loop logic (lines 126-129): The inner
forloop that iterates to print remaining lines and modifiesiis confusing and hard to follow.Indentation-based parsing: Relying on specific indentation patterns to detect structure boundaries is inherently fragile.
Consider simplifying by using a more straightforward approach:
- Use line-number-based insertion (append before
pull_requests:line)- Or validate the YAML structure with
yamllintoryqbefore/after modification- Or use a proper YAML parser tool instead of AWK
This is a recommended refactor for maintainability.
143-148: Verification step is good, but doesn't catch malformed YAML.The script verifies that the new branch name was added to the file, but doesn't validate that the resulting YAML is still valid. If the insertion introduced formatting issues, the verification would pass but the file would be corrupted. Consider adding a validation step after insertion.
# After line 148, add: if ! command -v yamllint &> /dev/null; then # Skip validation if yamllint not available, but warn user echo "Warning: yamllint not found. Consider validating the modified .asf.yaml manually." else if ! yamllint "$TEMP_FILE" > /dev/null 2>&1; then echo "Error: Modified .asf.yaml failed YAML validation" rm -f "$TEMP_FILE" exit 1 fi fidev/release/README.md (1)
1-203: Documentation provides good context but could clarify maintenance burden.The PR introduces useful automation to reduce manual YAML editing for each release. However, the documentation could briefly note:
- Who is responsible for running the script (release managers/PMC members)
- When to run it (during initial release branch creation setup, step 1)
- That the script requires write access to the repository
This would reduce confusion for new release managers following these instructions.
Consider adding a brief note at the start of section "2. Add a protection..." clarifying that this script must be run from the repository root by someone with commit access, and that the changes will be reviewed in a PR before merging.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.asf.yaml(1 hunks)dev/release/README.md(5 hunks)dev/release/add-branch-protection.sh(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: claude-review
- GitHub Check: Analyze (rust)
🔇 Additional comments (7)
.asf.yaml (2)
54-68: Configuration changes look good.The simplification of .asf.yaml by removing commented-out branch protections and adding auto-merge support aligns well with the automation introduced by the new
add-branch-protection.shscript. Branch protections for branch-50 and branch-51 are properly retained.
54-58: Anchor reference resolves correctly—no issue.GitHub uses the github-slugger algorithm to convert heading text to anchor IDs by lowercasing, removing punctuation, and replacing spaces with hyphens while preserving numbers. For the heading "### 2. Add a protection to release candidate branch," this produces exactly
#2-add-a-protection-to-release-candidate-branch, matching the referenced anchor. The comment link will work as intended.dev/release/add-branch-protection.sh (2)
52-84: Input validation and branch verification are thorough.The script properly validates the release number format, checks for .asf.yaml existence, and verifies the branch exists on GitHub before proceeding. Error handling with clear messages and proper exit codes is well done.
67-78: GitHub API call uses unauthenticated requests with rate limit considerations.The script makes an unauthenticated GitHub API call, which is subject to a rate limit of 60 requests per hour. For a release process that runs infrequently (at most a few times per month per maintainer), this is acceptable. If release processes become more frequent, consider allowing an optional
GH_TOKENenvironment variable to authenticate and benefit from higher rate limits.dev/release/README.md (3)
29-29: Release lineage updates from branch-37 to branch-50 are comprehensive and consistent.All references to the previous release series have been appropriately updated with concrete examples for the 50.x.y lineage. The backport instructions at lines 60-68 are clear and actionable.
Also applies to: 33-33, 58-58, 60-68
139-157: New branch protection workflow section is well-structured and clear.The addition of section "2. Add a protection to release candidate branch" clearly explains the automation workflow. The example command, expected output, and follow-up steps (create PR, merge to main) are well-documented. This guides maintainers to properly code-review the .asf.yaml changes before merging.
427-427: SVN artifact deletion examples updated appropriately.The examples for cleaning up old release candidates and releases have been updated from 38.0.0/37.0.0 to 50.0.0/50.3.0, reflecting the new release series and maintaining consistency with the branch-50 workflow.
Also applies to: 443-443
18660: To review by AI