UCP/EP: fix discarding from pending on failed lane #6933
844990b to 4e65d55
    if (ucp_worker_is_uct_ep_discarding(worker, uct_ep)) {
        ucs_debug("UCT EP %p is being discarded on UCP Worker %p",
                  uct_ep, worker);
        uct_ep_pending_purge(uct_ep, ucp_ep_err_pending_purge,
Why is this needed?
The UCT EP should be purged as part of the discarding procedure.
discard itself can be on pending
> discard itself can be on pending
but why do we need to remove the discard from the pending queue?
it should be removed by discarding
Discarding does not remove it. ucp_ep_err_pending_purge calls ucp_request_send_state_ff, which posts the flush cancel again; that's the fix.
pls add comment to describe why it's needed
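To illustrate the mechanism discussed in this thread, here is a minimal toy model in C. It is not UCX code: `toy_queue_t`, `toy_pending_purge`, `toy_err_purge_cb` and `toy_demo` are hypothetical names, and dropping non-discard requests is only an assumption standing in for whatever the real purge callback does with them. The sketch shows that a flush cancel stuck on the pending queue reaches the lane only if the queue is purged with a callback that posts it again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these types or functions are UCX APIs. */
enum toy_op { TOY_OP_SEND, TOY_OP_FLUSH_CANCEL };

typedef struct {
    enum toy_op ops[8];
    size_t      count;
} toy_queue_t;

typedef void (*toy_purge_cb_t)(enum toy_op op, void *arg);

static void toy_queue_push(toy_queue_t *q, enum toy_op op)
{
    assert(q->count < sizeof(q->ops) / sizeof(q->ops[0]));
    q->ops[q->count++] = op;
}

/* Empty the pending queue and hand every removed element to the callback. */
static void toy_pending_purge(toy_queue_t *pending, toy_purge_cb_t cb,
                              void *arg)
{
    toy_queue_t snapshot;
    size_t i;

    memcpy(&snapshot, pending, sizeof(snapshot));
    pending->count = 0;
    for (i = 0; i < snapshot.count; ++i) {
        cb(snapshot.ops[i], arg);
    }
}

/* Purge callback: a pending flush cancel is posted again to the lane (arg);
 * any other request is dropped here (assumption, stands in for completing
 * it with an error). */
static void toy_err_purge_cb(enum toy_op op, void *arg)
{
    toy_queue_t *posted = arg;

    if (op == TOY_OP_FLUSH_CANCEL) {
        toy_queue_push(posted, op);
    }
}

/* Returns how many flush-cancel operations reached the lane. */
static size_t toy_demo(int do_purge)
{
    toy_queue_t pending = {0}, posted = {0};

    toy_queue_push(&pending, TOY_OP_SEND);
    toy_queue_push(&pending, TOY_OP_FLUSH_CANCEL); /* discard is pending */
    if (do_purge) {
        toy_pending_purge(&pending, toy_err_purge_cb, &posted);
    }
    return posted.count;
}
```

Calling `toy_demo(0)` models the behaviour before the fix, where the discard stays on the pending queue, and `toy_demo(1)` models the purge added by this change.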
src/ucp/core/ucp_worker.c
Outdated
     * UCS_ERR_NO_RESOURCES, so need to purge the queue to resubmit the
     * operation */
I think it's "abort the operation", not "resubmit", right?
In the discard case, ucp_request_send_state_ff has to re-submit the operation to avoid reordering: the flush cancel must complete last, otherwise we can get an error WQE when the lanes are destroyed.
Add to comment:
/* We need to resubmit the FLUSH_CANCEL operation on the same failed lane, in order to make sure all previous outstanding operations are completed before destroying the failed endpoint */
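The ordering argument behind the suggested comment can be sketched with a toy lane that completes operations strictly in posting order. This is an illustration under that FIFO assumption, not UCX code; `lane_t`, `lane_post`, `lane_progress`, `flush_cancel_completes_last` and `OP_FLUSH_CANCEL` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not UCX code: a lane completes operations in posting order. */
enum { LANE_CAP = 8, OP_FLUSH_CANCEL = -1 };

typedef struct {
    int    ops[LANE_CAP];
    size_t head; /* next operation to complete */
    size_t tail; /* next free slot */
} lane_t;

static void lane_post(lane_t *lane, int op)
{
    assert(lane->tail < LANE_CAP);
    lane->ops[lane->tail++] = op;
}

/* Complete the oldest outstanding operation.
 * Returns 1 and stores it in *op, or 0 if nothing is outstanding. */
static int lane_progress(lane_t *lane, int *op)
{
    if (lane->head == lane->tail) {
        return 0;
    }
    *op = lane->ops[lane->head++];
    return 1;
}

/* Drain the lane. Returns 1 only if the flush cancel was the final
 * completion, i.e. nothing was still outstanding behind it when the
 * lane would be destroyed. */
static int flush_cancel_completes_last(lane_t *lane)
{
    int op, last = 0, seen = 0;

    while (lane_progress(lane, &op)) {
        last = op;
        seen = 1;
    }
    return seen && (last == OP_FLUSH_CANCEL);
}
```

In this model, a flush cancel posted on the failed lane after all earlier operations is the last completion, while one that ends up ahead of another operation is not, which corresponds to the reordering the reviewers want to avoid.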
+ reduce TL timeouts and related refactoring
8ba56db to 8bfb149
What
fix discarding from pending on failed lane
Why ?
bugfix
How ?