
Conversation

@valeriaRaffuzzi (Member) commented Nov 15, 2024

Finally! I think this is ready to go.

This PR includes MPI support. The main changes are in the tallies and in the dungeon, which now handles population normalisation between cycles and load balancing (following Paul Romano's PhD thesis).

In the tallies, one key change is that a new report, closeCycle, was added alongside reportCycleEnd. With MPI, there are two options: mpiSync 1 means that the tallies are synchronised across ranks every cycle; mpiSync 0 means that the tallies from each rank are only collected at the end of the calculation. All calculations give identical results with mpiSync 1, and agree within statistics with mpiSync 0. NOTE that this option applies to all tallies included in a tally admin rather than to individual clerks. Splitting reportCycleEnd into two procedures (i.e., adding closeCycle) makes reproducibility easier for most tallyClerks.
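To make the split concrete, here is a minimal sketch of a synchronised closeCycle, assuming a hypothetical tallyAdmin with a bins array and a logical mpiSync component (the MPI names match the diff below, but this is not the actual SCONE implementation):

```fortran
subroutine closeCycle(self, end)
#ifdef MPI
  use mpi_func, only : MPI_REDUCE, MPI_DOUBLE, MPI_SUM, MASTER_RANK, &
                       MPI_COMM_WORLD, isMPIMaster
#endif
  class(tallyAdmin), intent(inout)   :: self
  class(particleDungeon), intent(in) :: end
#ifdef MPI
  integer :: error
  real(defReal), dimension(size(self % bins)) :: binsTotal

  if (self % mpiSync) then
    ! mpiSync 1: combine this cycle's scores from all ranks on the master
    call MPI_REDUCE(self % bins, binsTotal, size(self % bins), MPI_DOUBLE, &
                    MPI_SUM, MASTER_RANK, MPI_COMM_WORLD, error)
    if (isMPIMaster()) self % bins = binsTotal
  end if
  ! mpiSync 0: per-rank scores stay local and a single reduction is done
  ! once, at the end of the calculation
#endif
end subroutine closeCycle
```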

In the dungeon, population normalisation was implemented using a new data structure, a heapQueue. Note that, to ensure reproducibility, the particles have to be sorted before and after sampling. Load balancing is then performed by transferring particles between neighbouring processes.
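A rough sketch of the neighbour exchange, with hypothetical names (not the actual dungeon code): every rank learns the population on all ranks, computes the net particle flow across its boundary with the next rank, and only ever communicates with its nearest neighbours:

```fortran
#ifdef MPI
subroutine loadBalance(self, rank, nRanks)
  use mpi_func, only : MPI_ALLGATHER, MPI_INT, MPI_COMM_WORLD
  class(particleDungeon), intent(inout) :: self
  integer, intent(in)        :: rank, nRanks  ! 0-based MPI rank, number of ranks
  integer, dimension(nRanks) :: pop
  integer :: nLocal, flow, error

  ! Every rank learns the population on all ranks
  nLocal = self % popSize()
  call MPI_ALLGATHER(nLocal, 1, MPI_INT, pop, 1, MPI_INT, MPI_COMM_WORLD, error)

  ! Net flow across the boundary between this rank and rank+1, assuming an
  ! equal target share on each rank (remainder handling omitted for brevity).
  ! Positive -> this rank sends its surplus up; negative -> it receives.
  flow = sum(pop(1 : rank + 1)) - (rank + 1) * (sum(pop) / nRanks)

  if (rank /= nRanks - 1) then
    if (flow > 0) then
      ! pack `flow` particles and MPI_SEND them to rank + 1
    else if (flow < 0) then
      ! post the matching MPI_RECV for `-flow` particles from rank + 1
    end if
  end if
  ! The boundary with rank-1 is handled symmetrically (omitted)
end subroutine loadBalance
#endif
```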

Results seem to be reproducible for all tallies, and all the tests pass successfully.

@Mikolaj-A-Kowalski However the github tests seem to fail during compilation because the MPI library is not available. Do you know if there's an easy solution to this? Otherwise I'll have a look.

Mikolaj-A-Kowalski and others added 26 commits September 30, 2024 12:11
Determine the maximum value of the key without implicit assumptions.
We were reusing the first few random numbers following the source generation
(the state of the RNG was the same at the beginning of sampling the source
particle and at its first flight). This commit moves the RNG back before
the source generation is performed, thus preventing the reuse (a sketch follows the commit list).
No communication takes place at this stage.
Will be reproducible if the fixes to source_init are merged.
Allows limiting the console I/O to the master process only. Applied
only to fixed source calculations at the moment.
It will be used for sampling without replacement.
Results from non-master processes are not combined, hence they are lost
at the moment.
Fixes a bug in the synchronised scoreMemory: the buffer value in
parallelBin was not properly reset to 0 after transfer.
It is not reproducible at the moment.
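The RNG-reuse commit above is essentially a stream-positioning fix. Schematically (hypothetical names, not the actual SCONE calls), the per-history sequence now looks like this:

```fortran
call pRNG % stride(i)                  ! rewind to the start of history i's sub-stream
call source % sampleParticle(p, pRNG)  ! source sampling advances the stream
call self % transport(p, pRNG)         ! the first flight starts from a fresh state
! Previously the rewind happened after sampling, so transport restarted from the
! same state that sampling began with, reusing its first few random numbers
```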
use mpi_func, only : isMPIMaster, getWorkshare, getOffset, getMPIRank
#ifdef MPI
use mpi_func, only : MASTER_RANK, MPI_Bcast, MPI_INT, MPI_COMM_WORLD, &
MPI_DOUBLE, mpi_reduce, MPI_SUM
Collaborator:

Suggested change
MPI_DOUBLE, mpi_reduce, MPI_SUM
MPI_DOUBLE, MPI_REDUCE, MPI_SUM


#ifdef MPI
! Print the total population across all processes to screen
call mpi_reduce(nStart, nTemp, 1, MPI_INT, MPI_SUM, MASTER_RANK, MPI_COMM_WORLD, error)
Collaborator:

Suggested change
call mpi_reduce(nStart, nTemp, 1, MPI_INT, MPI_SUM, MASTER_RANK, MPI_COMM_WORLD, error)
call MPI_REDUCE(nStart, nTemp, 1, MPI_INT, MPI_SUM, MASTER_RANK, MPI_COMM_WORLD, error)

@ChasingNeutrons (Collaborator) commented:

To get the tests passing, I think it's a case of including an MPI installation in the Docker image? mikolajkowalski/scone-test

@valeriaRaffuzzi valeriaRaffuzzi force-pushed the mpi branch 2 times, most recently from a978577 to 3fb146e on December 17, 2024 14:49
@valeriaRaffuzzi (Member, Author) commented:

After a long time spent fighting with the compiler... it is still winning. The problem is related to the Docker container, though, rather than to SCONE. OMPI doesn't like it when we try to run the tests as the root user in the container (this refers to the unitTests). I managed to force it with gfortran9 and gfortran10 (which, as you can see, run ok). However, I couldn't find a way to get gfortran8 to agree to run!
The latest is that I commented out gfortran8 to make sure that everything else was working correctly, which it is. If we want to reinstate gfortran8, I welcome ideas!

@valeriaRaffuzzi (Member, Author) commented:

Another thing to note is that in the particleDungeon, the previous sampling-without-replacement algorithm now lives in a new procedure called samplingWithoutReplacement, and it isn't used.
I didn't want to delete it completely because it's nice code, and it might be useful in other contexts in the future. If you both agree, however, I can delete it, since it is now redundant.

@Mikolaj-A-Kowalski (Collaborator) commented:

> After a long time spent fighting with the compiler... it is still winning. The problem is related to the Docker container, though, rather than to SCONE. OMPI doesn't like it when we try to run the tests as the root user in the container (this refers to the unitTests). I managed to force it with gfortran9 and gfortran10 (which, as you can see, run ok). However, I couldn't find a way to get gfortran8 to agree to run! The latest is that I commented out gfortran8 to make sure that everything else was working correctly, which it is. If we want to reinstate gfortran8, I welcome ideas!

The reason here is most probably that the gfortran-8 image is based on the older Debian (buster), which has a version of OpenMPI (3.1) from before the environment-variables option was added in 4.0.

In general we should probably push up the compiler versions for CI and maybe drop gfortran-8 :-/
I would prefer that done in a separate PR for clarity, and the newer compilers should be added in the same PR.

If we want to keep gfortran-8 we could just try MPICH?

@Mikolaj-A-Kowalski (Collaborator) commented:

> Another thing to note is that in the particleDungeon, the previous sampling-without-replacement algorithm now lives in a new procedure called samplingWithoutReplacement, and it isn't used. I didn't want to delete it completely because it's nice code, and it might be useful in other contexts in the future. If you both agree, however, I can delete it, since it is now redundant.

No reason to keep dead code. It will be preserved in the Git history if anyone ever wants to inspect it.

@valeriaRaffuzzi (Member, Author) commented:

> The reason here is most probably that the gfortran-8 image is based on the older Debian (buster), which has a version of OpenMPI (3.1) from before the environment-variables option was added in 4.0.
>
> In general we should probably push up the compiler versions for CI and maybe drop gfortran-8 :-/ I would prefer that done in a separate PR for clarity, and the newer compilers should be added in the same PR.
>
> If we want to keep gfortran-8 we could just try MPICH?

Yes, that makes sense about the older OMPI version (annoying!). I agree about dropping gfortran8 and adding newer versions (all of this in a separate PR). But in that case, the problem remains that I can't get the tests to pass in this PR... I will surely manage with more work, but I wonder if it's worth spending time on this.
We could try MPICH, but only to then drop it in the next PR?

@Mikolaj-A-Kowalski (Collaborator) commented:

> > The reason here is most probably that the gfortran-8 image is based on the older Debian (buster), which has a version of OpenMPI (3.1) from before the environment-variables option was added in 4.0.
> > In general we should probably push up the compiler versions for CI and maybe drop gfortran-8 :-/ I would prefer that done in a separate PR for clarity, and the newer compilers should be added in the same PR.
> > If we want to keep gfortran-8 we could just try MPICH?
>
> Yes, that makes sense about the older OMPI version (annoying!). I agree about dropping gfortran8 and adding newer versions (all of this in a separate PR). But in that case, the problem remains that I can't get the tests to pass in this PR... I will surely manage with more work, but I wonder if it's worth spending time on this. We could try MPICH, but only to then drop it in the next PR?

We can just make the new PR quick (yes... I know) and then rebase this one on main with the new compilers once it is merged. This one can be left without gfortran-8 for now.

@valeriaRaffuzzi valeriaRaffuzzi marked this pull request as draft December 23, 2024 14:03
@valeriaRaffuzzi valeriaRaffuzzi marked this pull request as ready for review December 23, 2024 15:07
@ChasingNeutrons (Collaborator) left a review:

Only a few very small things, then I'm happy.

!!
!! Perform nearest neighbor load balancing
!!
#ifdef MPI
Collaborator:

Wouldn't it cut out lots of ifdefs if you only put the ifdef MPI inside the load balancing procedure, rather than wrapping it around each time it occurs?

Member Author:

You mean at runtime? I'm not sure what you mean here. It could be that there are better ways to place those statements, but I am not sure it makes a big difference at all... I am tempted to leave it as is for now!

Collaborator:

I mean that the contents of the procedure (somewhere after the 'popSizes' definition, up to right before 'end subroutine') could be surrounded by an ifdef, but the procedure itself needn't be. This would allow you to remove the ifdefs around its call sites and around its declaration. I think this is desirable because (understandably) there are ifdefs everywhere, so it would be nice for readability to be able to remove a few.
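As a sketch of the pattern being proposed (hypothetical names and body, not the actual code): the procedure always compiles, its body collapses to a no-op without MPI, and the call sites need no guards of their own:

```fortran
subroutine loadBalance(self)
  class(particleDungeon), intent(inout) :: self
#ifdef MPI
  integer, dimension(:), allocatable :: popSizes
  ! ... gather populations and exchange particles with neighbour ranks ...
#endif
end subroutine loadBalance
```

A bare `call dungeon % loadBalance()` then works in both builds, with no #ifdef at the call site.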

Member Author:

I changed it! I see arguments to do it either way but I don't have a strong preference.

@valeriaRaffuzzi valeriaRaffuzzi merged commit fdf1085 into CambridgeNuclear:main Dec 18, 2025
6 checks passed
@valeriaRaffuzzi valeriaRaffuzzi deleted the mpi branch December 18, 2025 12:55