
ci: gh action release automation #2203

Draft
SimonRastikian wants to merge 26 commits into main from simon/gh-action-release-automation

Conversation

@SimonRastikian
Contributor

@SimonRastikian SimonRastikian commented Feb 20, 2026

Closes #1971 .

This should be combined with PR #2200 for full automation with minimal manual setup.

This PR triggers a GH action on main:

  • If the merged PR comes from a branch named release/vX.X.X (with proper semver), it launches the task
  • The task first creates a tag with the proper version
  • It then builds the three necessary Docker images
  • It collects the ID and digest of each
  • It builds the contract and renames it
  • It collects the hash value of the contract
  • It makes a tar file of it
  • It creates a draft release with the changelog pulled from CHANGELOG.md
  • It adds all the IDs, digests, and hash values to the draft.
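The branch check in the first step could be sketched roughly like this (the branch value here is an example; in the workflow it would come from the merged PR's head ref):

```shell
# Sketch: detect a release branch and extract the semver part.
BRANCH="release/v1.4.2"   # example value, not read from a real event
if printf '%s\n' "$BRANCH" | grep -Eq '^release/v[0-9]+\.[0-9]+\.[0-9]+$'; then
  VERSION="${BRANCH#release/v}"
  echo "releasing $VERSION"
else
  echo "not a release branch, skipping"
fi
```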

The only real problem is that I do not see a proper way of testing this without merging into main or merging a dummy release/v0.0.0 PR.
Any suggestions?

@claude

claude bot commented Feb 20, 2026

PR title type suggestion: This PR adds a GitHub Actions workflow for release automation, which is a CI/CD pipeline change. The type prefix should probably be ci: instead of build:.

Suggested title: ci: add release automation workflow

@claude

claude bot commented Feb 20, 2026

Code Review

Critical Issues

  1. Race condition: Fixed 60-second sleep is unreliable for waiting on dispatched workflows (.github/workflows/release.yml:130-131)

    The retag-docker-images job dispatches three workflow runs via gh workflow run and then does a fixed sleep 60 before inspecting the images with skopeo. These dispatched workflows need to be queued, get a runner, and pull/push Docker images. If any of these steps takes longer than 60 seconds (which is plausible under load or with large images), the skopeo inspect step will either fail or return stale/missing data, silently producing incorrect digests in the release notes.

    Suggested fix: Use gh run watch or poll gh run list --workflow=<name> until all three dispatched runs reach completed status before proceeding. Example approach:

    # After dispatching, find the run ID and wait
    sleep 5  # brief pause for the run to appear
    RUN_ID=$(gh run list --workflow=docker_node_release.yml --limit=1 --json databaseId -q '.[0].databaseId')
    gh run watch "$RUN_ID" --exit-status
  2. skopeo inspect --raw may not return config.digest as Image ID (.github/workflows/release.yml:143)

    For multi-arch manifest lists (OCI index), skopeo inspect --raw returns the manifest list, not a single image manifest. The .config.digest field exists on image manifests, not on manifest lists. If the images are multi-arch, jq -r '.config.digest' will return null. Even for single-arch images, --raw returns the raw manifest (with layers and config), so .config.digest should work — but this assumption should be validated. If images could be multi-arch, you'll need to resolve the platform-specific manifest first.

  3. No version format validation (.github/workflows/release.yml:52)

    The version is extracted with a simple ${BRANCH#release/v} substitution. A branch like release/vfoo-bar would produce version foo-bar, tag vfoo-bar, and the workflow would proceed to create a tag, trigger Docker retags, and create a draft release with an invalid version. Consider adding a regex check:

    if [[ ! "$VERSION" =~ ^[0-9]+\.[0-9]+\.[0-9]+(-[a-zA-Z0-9.]+)?$ ]]; then
      echo "::error::Invalid semver: $VERSION"
      exit 1
    fi

Minor Issues

  • Dispatched workflows not checked for success (.github/workflows/release.yml:104-126): Even after waiting, the workflow doesn't verify whether the dispatched retag workflows succeeded. If a retag fails (e.g., source image doesn't exist), the release job proceeds with empty/null digest values. The gh run watch --exit-status approach above would also address this.

  • release-notes-raw.md could be empty (.github/workflows/release.yml:237): If the version header isn't found in CHANGELOG.md (e.g., it wasn't updated as part of the release PR), awk produces an empty file. The release would be created with only Docker/contract info and no changelog. A check would prevent accidental empty releases.
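The empty-changelog guard could look like this (a sketch: the `## [X.Y.Z]` header format and the stand-in file contents are assumptions, not necessarily what the workflow's CHANGELOG.md uses):

```shell
# Stand-in CHANGELOG so the sketch runs offline (assumed header format).
printf '%s\n' '## [1.4.2]' '- fix a' '- fix b' '## [1.4.1]' '- old entry' > CHANGELOG.md

VERSION="1.4.2"
# Print everything between this version's header and the next header.
awk -v v="$VERSION" '
  $0 ~ ("^## \\[" v "\\]") { found = 1; next }
  /^## \[/                 { found = 0 }
  found
' CHANGELOG.md > release-notes-raw.md

# Fail hard instead of shipping a release with an empty body.
if ! [ -s release-notes-raw.md ]; then
  echo "::error::No changelog section found for $VERSION"
  exit 1
fi
cat release-notes-raw.md
```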

⚠️ Issues found — the hardcoded sleep race condition (#1) is the most significant, as it can produce incorrect release artifacts.

@claude

claude bot commented Feb 20, 2026

PR title type suggestion: This PR modifies only a GitHub Actions workflow file, so the type prefix should be ci: instead of build:.

Suggested title: ci: gh action release automation

@claude

claude bot commented Feb 20, 2026

PR title type suggestion: This PR changes only CI/CD workflow files, so the type prefix should probably be ci: instead of build:.
Suggested title: ci: gh action release automation

@SimonRastikian SimonRastikian changed the title from "build: gh action release automation" to "ci: gh action release automation" Feb 20, 2026
@SimonRastikian
Contributor Author

@claude review this PR

@claude

claude bot commented Feb 20, 2026

Code Review

The previous review's critical issues (#1 race condition with sleep, #2 skopeo multi-arch, #3 version validation) have all been addressed in the current version — semver validation is present, gh run watch --exit-status replaces the fixed sleep, --override-os/--override-arch handles multi-arch manifests, and the empty changelog check is now a hard failure. Good improvements.

New Critical Issue

docker_node_release.yml concurrency group will cancel the first dispatch when the second is triggered (lines 115-140)

The docker_node_release.yml workflow has:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

The release workflow dispatches this same workflow twice in sequence:

  1. First for mpc-node-gcp (line 115)
  2. Then for mpc-node (line 128)

Both workflow_dispatch calls target the same workflow on the same default branch ref. This means they share the same concurrency group. When the second dispatch starts, cancel-in-progress: true will cancel the first run (the mpc-node-gcp retag).

The result: mpc-node-gcp may not get retagged, the captured run_id for node-gcp may point to a cancelled run, and gh run watch would report failure — or worse, the mpc-node-gcp retag silently doesn't happen and the digest step returns stale data.

Fix options:

  • (Preferred) Change the concurrency group in docker_node_release.yml to include the repository input so the two dispatches don't collide:
    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event.inputs.repository }}
      cancel-in-progress: true
  • Or dispatch them sequentially with gh run watch between each, so the first completes before the second starts. But this is slower and the concurrency fix is more robust.

Minor Issues

  • Run ID capture race (lines 107, 122, 136): gh run list --limit=1 after sleep 5 assumes the most recent run for that workflow is the one we just dispatched. If another actor dispatches the same workflow in that 5-second window, you'd capture the wrong run ID. This is unlikely but could be made more robust by filtering on the headBranch or by using --json databaseId,createdAt and matching against the dispatch time. Low risk but worth noting.

  • skopeo inspect without Docker Hub auth (line 164): The retag-docker-images job doesn't log into Docker Hub before running skopeo inspect. If any of the nearone/* images are in a private registry or if Docker Hub rate limiting kicks in, the inspect will fail. The dispatched retag workflows do log in, but this parent job doesn't inherit that. Consider adding a Docker login step before the digest collection.
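A minimal sketch of the authenticated lookup follows. The registry call itself is shown as a comment since it needs network access and credentials; the secret names and image tag are assumptions, and a stand-in manifest makes the digest extraction runnable offline:

```shell
# In the workflow, the inspect would authenticate against Docker Hub, e.g.:
#   skopeo inspect --creds "$DOCKERHUB_USERNAME:$DOCKERHUB_TOKEN" \
#     docker://nearone/mpc-node:1.4.2 > manifest.json
# Stand-in manifest (shape assumed) so the parsing below runs offline.
cat > manifest.json <<'EOF'
{"config": {"digest": "sha256:abc123"}}
EOF

IMAGE_ID=$(jq -r '.config.digest' manifest.json)
echo "image id: $IMAGE_ID"
```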

⚠️ The concurrency cancellation issue is a merge blocker — it will cause one of the two docker_node_release.yml dispatches to be cancelled on every release.

@SimonRastikian
Contributor Author

@claude review this PR

@claude

claude bot commented Feb 20, 2026

Code Review

Previous critical issues (sleep race condition, concurrency cancellation, version validation, empty changelog handling, multi-arch skopeo) have all been addressed. The concurrency group fix in docker_node_release.yml correctly differentiates the two dispatches by including the repository input.

Remaining Issues

  1. Missing Docker Hub authentication for skopeo inspect (release.yml, Get image digests step)

    The retag-docker-images job runs skopeo inspect against docker://nearone/* on Docker Hub without logging in. The child retag workflows authenticate, but this parent job does not. Docker Hub enforces anonymous pull rate limits (100 per 6 hours per IP for unauthenticated requests). Since this step makes 6 registry calls (2 per image x 3 images), a shared runner IP that has other Docker Hub traffic could hit this limit, causing the release to fail intermittently.

    Fix: Add a Docker login step before the digest collection:

    - name: Login to Docker Hub
      uses: docker/login-action@5e57cd118135c172c3672efd75eb46360885c0ef # v3.6.0
      with:
        username: ${{ secrets.DOCKERHUB_USERNAME }}
        password: ${{ secrets.DOCKERHUB_TOKEN }}
  2. Run ID capture could pick up wrong workflow run (release.yml, lines ~107, ~122, ~136)

    After gh workflow run + sleep 5, gh run list --limit=1 grabs the most recent run for that workflow. If another dispatch (manual or automated) happens in that window, the wrong run ID is captured. Low probability, but the consequence is waiting on (and reporting digests from) the wrong build. A more robust approach would be to filter by headBranch or event, or to compare timestamps:

    DISPATCH_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    gh workflow run ...
    sleep 5
    RUN_ID=$(gh run list --workflow=docker_node_release.yml --repo "$REPO" \
      --json databaseId,createdAt -q \
      "[.[] | select(.createdAt >= \"$DISPATCH_TIME\")] | .[0].databaseId")

Neither of these is a hard merge blocker, but #1 is worth addressing to avoid flaky releases.

✅ Approved — the critical issues from prior reviews have been fixed. The two items above are recommendations for reliability.

@netrome
Collaborator

netrome commented Feb 23, 2026

Only real problem is that I do not see a proper way of testing this without merging into main and without merging a dummy release/v0.0.0 PR.
Any suggestions?

Unfortunately I don't have any suggestions. This is a big problem with GitHub Actions imo.

@netrome
Collaborator

netrome commented Feb 23, 2026

If the merged PR comes from a branch named release/vX.X.X (with proper semver) it launches the task
The task first creates a tag with the proper version.

This feels unnecessary to me. I'd rather have the developer create the tag, and the workflow can be triggered once the tag is pushed.

Automatically triggering the release on merge can be quite risky since we're not fully controlling the merge order of PRs due to the merge queue. There's a risk other changes get merged first, at which point the release changelog would be wrong and the release could include unintended behavior.
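The manual flow described here boils down to a tag push. A sketch, demonstrated in a throwaway repository so it runs anywhere (the version and identity values are examples; in practice the tag goes on the release commit on main):

```shell
set -e
# Throwaway repo for demonstration; real usage targets the project repo.
WORKDIR=$(mktemp -d)
cd "$WORKDIR"
git init -q
git -c user.email=dev@example.com -c user.name=dev \
  commit -q --allow-empty -m "release prep"

VERSION="1.4.2"   # example version
git -c user.email=dev@example.com -c user.name=dev \
  tag -a "v$VERSION" -m "Release v$VERSION"
git tag -l 'v*'
# Pushing the tag would then trigger a workflow filtered on tags like v*.*.*:
#   git push origin "v$VERSION"
```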

Collaborator

@netrome netrome left a comment


Explicitly requested the change we discussed. Will review this deeper once that's updated.

@SimonRastikian SimonRastikian requested review from gilcu3 and netrome March 2, 2026 09:18
@SimonRastikian
Contributor Author

@netrome @gilcu3

Collaborator

@netrome netrome left a comment


The tag filter should be updated

Contributor

@gilcu3 gilcu3 left a comment


We need to install cargo-near

Comment on lines +230 to +231
- name: Build contract
  run: cargo near build reproducible-wasm --manifest-path crates/contract/Cargo.toml
Contributor


how will this work without installing cargo-near first?

@SimonRastikian SimonRastikian self-assigned this Mar 9, 2026
@SimonRastikian SimonRastikian marked this pull request as draft March 19, 2026 09:36

Development

Successfully merging this pull request may close these issues.

Create an automated bash script for the Release

3 participants