fix(framework): Move Token deletion from PushAppOutputs into UpdateRunStatus#6924
fix(framework): Move Token deletion from PushAppOutputs into UpdateRunStatus#6924
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 93f4a9570e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
Fixes a run-finalization sequencing issue by retaining ServerApp tokens until a terminal run status (FINISHED) has been successfully persisted, enabling heartbeat-expiry fallback finalization if shutdown status updates fail.
Changes:
- Stop deleting the run token in
ServerAppIoServicer.PushAppOutputs. - In
ServerAppIoServicer.UpdateRunStatus, check the result ofstate.update_run_status(...), returnFAILED_PRECONDITIONon invalid transitions, and delete the token after a successful FINISHED update. - Add regression tests to validate token retention after
PushAppOutputs, token retention on invalid status transitions, and token deletion on successful FINISHED transitions.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| framework/py/flwr/server/superlink/serverappio/serverappio_servicer.py | Moves token deletion responsibility to the FINISHED status update path and adds transition failure handling. |
| framework/py/flwr/server/superlink/serverappio/serverappio_servicer_test.py | Adds/updates tests to enforce the new token lifecycle and invalid-transition behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 581c77b7f3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # Keep token until terminal status is committed. If shutdown finalization fails, | ||
| # heartbeat expiry still needs the token to trigger run finalization fallback. | ||
| return PushAppOutputsResponse() |
There was a problem hiding this comment.
Delete token before entering shutdown cleanup
Keeping the token alive in PushAppOutputs creates a new failure window during normal shutdown: on_exit in framework/py/flwr/server/serverapp/app.py stops the heartbeat thread before it calls UpdateRunStatus, so if log upload/join or network teardown delays finalization for ~60s, token cleanup can mark the still-successful run as FINISHED/FAILED before UpdateRunStatus(FINISHED/COMPLETED) arrives. At that point abort_if(...[Status.FINISHED]...) rejects the completion update, so a successful run is persisted as failed under slow shutdown conditions.
Useful? React with 👍 / 👎.
Issue
Fixes a run-finalization sequencing bug in ServerAppIo by keeping app tokens alive until terminal status is successfully persisted.
Prior to this fix, PushAppOutputs deletes the token before the final UpdateRunStatus(FINISHED, ...) call. If that terminal status update fails during shutdown, the token is already gone, and heartbeat-expiry fallback can not finalize the run.
Proposal
Changes:
Updated serverappio_servicer_test.py with focused regressions:
Checklist
#contributions)