Skip to content

fix(framework): Move Token deletion from PushAppOutputs into UpdateRunStatus#6924

Open
msheller wants to merge 3 commits intomainfrom
fix-token-deletion-for-reliable-run-finalization
Open

fix(framework): Move Token deletion from PushAppOutputs into UpdateRunStatus#6924
msheller wants to merge 3 commits intomainfrom
fix-token-deletion-for-reliable-run-finalization

Conversation

@msheller
Copy link
Copy Markdown
Member

@msheller msheller commented Apr 1, 2026

Issue

Fixes a run-finalization sequencing bug in ServerAppIo by keeping app tokens alive until terminal status is successfully persisted.

Prior to this fix, PushAppOutputs deletes the token before the final UpdateRunStatus(FINISHED, ...) call. If that terminal status update fails during shutdown, the token is already gone, and heartbeat-expiry fallback can not finalize the run.

Proposal

Changes:

  • ServerAppIo.PushAppOutputs no longer deletes token.
  • ServerAppIo.UpdateRunStatus now:
    • Checks state.update_run_status(...) result.
    • Returns FAILED_PRECONDITION if transition is rejected.
    • Deletes token only after successful FINISHED persistence (after existing object-store cleanup).
  • Added concise comments clarifying the sequencing rationale.

Updated serverappio_servicer_test.py with focused regressions:

  • Token is retained after successful PushAppOutputs.
  • Invalid UpdateRunStatus transition returns FAILED_PRECONDITION and keeps token.
  • Successful RUNNING -> FINISHED(COMPLETED) deletes token.

Checklist

  • Implement proposed change
  • Write tests
  • Update documentation
  • Address LLM-reviewer comments, if applicable (e.g., GitHub Copilot)
  • Make CI checks pass
  • Ping maintainers on Slack (channel #contributions)

@msheller msheller marked this pull request as ready for review April 1, 2026 21:16
Copilot AI review requested due to automatic review settings April 1, 2026 21:16
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 93f4a9570e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread framework/py/flwr/server/superlink/serverappio/serverappio_servicer.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a run-finalization sequencing issue by retaining ServerApp tokens until a terminal run status (FINISHED) has been successfully persisted, enabling heartbeat-expiry fallback finalization if shutdown status updates fail.

Changes:

  • Stop deleting the run token in ServerAppIoServicer.PushAppOutputs.
  • In ServerAppIoServicer.UpdateRunStatus, check the result of state.update_run_status(...), return FAILED_PRECONDITION on invalid transitions, and delete the token after a successful FINISHED update.
  • Add regression tests to validate token retention after PushAppOutputs, token retention on invalid status transitions, and token deletion on successful FINISHED transitions.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
framework/py/flwr/server/superlink/serverappio/serverappio_servicer.py Moves token deletion responsibility to the FINISHED status update path and adds transition failure handling.
framework/py/flwr/server/superlink/serverappio/serverappio_servicer_test.py Adds/updates tests to enforce the new token lifecycle and invalid-transition behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread framework/py/flwr/server/superlink/serverappio/serverappio_servicer.py Outdated
Comment thread framework/py/flwr/server/superlink/serverappio/serverappio_servicer.py Outdated
@github-actions github-actions bot added the Maintainer Used to determine what PRs (mainly) come from Flower maintainers. label Apr 1, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread framework/py/flwr/server/superlink/serverappio/serverappio_servicer.py Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 581c77b7f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +390 to 392
# Keep token until terminal status is committed. If shutdown finalization fails,
# heartbeat expiry still needs the token to trigger run finalization fallback.
return PushAppOutputsResponse()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Delete token before entering shutdown cleanup

Keeping the token alive in PushAppOutputs creates a new failure window during normal shutdown: on_exit in framework/py/flwr/server/serverapp/app.py stops the heartbeat thread before it calls UpdateRunStatus, so if log upload/join or network teardown delays finalization for ~60s, token cleanup can mark the still-successful run as FINISHED/FAILED before UpdateRunStatus(FINISHED/COMPLETED) arrives. At that point abort_if(...[Status.FINISHED]...) rejects the completion update, so a successful run is persisted as failed under slow shutdown conditions.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Maintainer Used to determine what PRs (mainly) come from Flower maintainers.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants