
gc00 (Collaborator) commented Mar 27, 2023

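This tests the use of MPI_Allreduce_reproducible(). This Allreduce is reproducible (even in the lowest bit): it does a Gather to a single rank and then calls MPI_Reduce_local(tmpbuf, recvbuf, count, datatype, op); so that the op is applied reproducibly on that one local process. Because floating-point addition is not associative, a regular MPI_Allreduce may combine partial results in a different order from one run to the next; applying the whole op on one process fixes that order.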
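A minimal sketch of the local step, written in plain C so it runs without MPI. It assumes the root has already gathered every rank's count doubles into one contiguous buffer in rank order (the layout MPI_Gather produces) and that the op is MPI_SUM. The name reduce_in_rank_order is hypothetical, not the code in this PR. With MPI, each iteration of the outer loop would correspond to one MPI_Reduce_local call, and a broadcast (or equivalent) would still be needed so that every rank receives the result.

```c
#include <stddef.h>

/* Hypothetical helper (not MANA's code): sum nranks gathered contributions
 * of count doubles each, stored contiguously in rank order as MPI_Gather
 * would lay them out at the root. The ranks are always folded in ascending
 * order, so the same inputs give bit-for-bit identical output. */
void reduce_in_rank_order(const double *gathered, int nranks, int count,
                          double *out)
{
    /* Start from rank 0's contribution. */
    for (int i = 0; i < count; i++)
        out[i] = gathered[i];

    /* Fold in ranks 1..nranks-1 one at a time, in a fixed order.
     * With MPI, each pass would be one MPI_Reduce_local(contrib, out, ...)
     * call, which computes out = contrib op out. */
    for (int r = 1; r < nranks; r++) {
        const double *contrib = gathered + (size_t)r * count;
        for (int i = 0; i < count; i++)
            out[i] = contrib[i] + out[i];
    }
}
```

The fixed order is what makes the result reproducible: for the doubles 1e16, 1.0 and -1e16, summing in that order gives 0.0, while grouping them as (1e16 + -1e16) + 1.0 gives 1.0.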

@tarunsmalviya, please cherry-pick the top commit of my allreduce-reproducible branch onto a copy of your testing branch and test it, to see whether the resume and restart cases agree with each other:

git remote add gc00 https://github.com/gc00/mana
git fetch gc00 allreduce-reproducible
git checkout COPY_OF_YOUR_TESTING_BRANCH
git cherry-pick remotes/gc00/allreduce-reproducible

I'm hoping that the resume and the restart will match each other after this.
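To check whether resume and restart match in the sense used here (down to the lowest bit), the outputs can be compared byte for byte rather than within a tolerance. A minimal sketch, assuming both runs produce arrays of doubles of the same length; the name results_bitwise_equal is hypothetical and not part of MANA.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check (not part of MANA): true only if the two result
 * buffers are identical bit for bit. A tolerance-based comparison would
 * hide exactly the lowest-bit differences a reproducible Allreduce is
 * meant to remove. */
bool results_bitwise_equal(const double *resume_out,
                           const double *restart_out, size_t count)
{
    return memcmp(resume_out, restart_out, count * sizeof(double)) == 0;
}
```

Note that memcmp treats 0.0 and -0.0 as different, which is appropriate when the goal is bit-identical results.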

gc00 added the question (Further information is requested) label on Mar 27, 2023.
gc00 requested a review from tarunsmalviya on March 27, 2023 at 07:29.
gc00 force-pushed the allreduce-reproducible branch from 526b576 to bc43191 on March 27, 2023 at 07:37.
gc00 (Collaborator, Author) commented Sep 13, 2023

It seems that an improved version of this PR was already pushed as PR #313:

commit 2b32564
Author: Tarun Malviya malviya.t@northeastern.com
Date: Sat Apr 29 10:44:30 2023 -0700
Allreduce reproducible function added.

Review this later, and probably close this PR without merging it.
