MPI #141
Determine the maximum value of the key without implicit assumptions.
We were reusing the first few random numbers following source generation (the RNG state was the same at the beginning of sampling the source particle and at its first flight). This commit moves the RNG back to before the source generation is performed, thus preventing the reuse.
No communication takes place at this stage.
Will be reproducible if the fixes to source_init are merged.
Allows limiting the console I/O to the master process only. Applied only to the fixed source calculation at the moment.
It will be used for sampling without replacement.
Results from non-master processes are not combined, hence they are lost at the moment.
Fixes a bug for the synchronised scoreMemory: the buffer value in parallelBin was not properly set to 0 again after the transfer (a sketch of the fix pattern follows this list of commits).
It is not reproducible at the moment.
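For the scoreMemory fix above, a minimal illustrative sketch (the names below are hypothetical, not SCONE's actual scoreMemory interface): once a buffered score has been transferred into its main bin, the buffer must be zeroed, otherwise the same value is added again on the next transfer.

```fortran
! Sketch only (hypothetical names): transfer a buffered score into the main
! bin and reset the buffer so it is not counted twice on the next transfer.
subroutine transferBin(bins, parallelBins, idx)
  implicit none
  real(8), intent(inout) :: bins(:)          ! accumulated scores
  real(8), intent(inout) :: parallelBins(:)  ! per-cycle buffer
  integer, intent(in)    :: idx

  bins(idx) = bins(idx) + parallelBins(idx)
  parallelBins(idx) = 0.0_8   ! the reset that was missing before the fix
end subroutine transferBin
```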
```fortran
  use mpi_func, only : isMPIMaster, getWorkshare, getOffset, getMPIRank
#ifdef MPI
  use mpi_func, only : MASTER_RANK, MPI_Bcast, MPI_INT, MPI_COMM_WORLD, &
                       MPI_DOUBLE, mpi_reduce, MPI_SUM
```
Suggested change:
```diff
-                       MPI_DOUBLE, mpi_reduce, MPI_SUM
+                       MPI_DOUBLE, MPI_REDUCE, MPI_SUM
```
```fortran
#ifdef MPI
    ! Print the population numbers referred to all processes to screen
    call mpi_reduce(nStart, nTemp, 1, MPI_INT, MPI_SUM, MASTER_RANK, MPI_COMM_WORLD, error)
```
Suggested change:
```diff
-    call mpi_reduce(nStart, nTemp, 1, MPI_INT, MPI_SUM, MASTER_RANK, MPI_COMM_WORLD, error)
+    call MPI_REDUCE(nStart, nTemp, 1, MPI_INT, MPI_SUM, MASTER_RANK, MPI_COMM_WORLD, error)
```
To get the tests passing, I think it's a case of including an MPI installation in the Docker image? mikolajkowalski/scone-test
Force-pushed from a978577 to 3fb146e.
After a long time spent fighting with the compiler... it is still winning. The problem is related to the Docker container though, rather than to SCONE. OMPI doesn't like it when we try to run the tests as the root user in the container (this refers to the unitTests). I managed to force it with gfortran-9 and gfortran-10 (which, as you can see, run ok). However, I couldn't find a way to get gfortran-8 to agree to run!
Another thing to note is that in the particleDungeon, the previous sampling-without-replacement algorithm now lives in a new procedure called samplingWithoutReplacement, and it isn't used.
The reason here is most probably that the OMPI version in the gfortran-8 image is older. In general we should probably push up the compiler versions for CI and maybe drop gfortran-8 :-/ If we want to keep gfortran-8 we could just try MPICH?
No reason to keep dead code. It will be preserved in the Git history if anyone ever wants to inspect it.
Yes, that makes sense about the older OMPI version (annoying!). I agree about dropping gfortran-8 and adding newer versions (all this in a separate PR). But in this case, the problem remains that I can't get the tests to pass in this PR... I will surely manage with more work, but I wonder if it's worth spending time on this.
We can just make the new PR quick (yes... I know) and then rebase this one on the
ChasingNeutrons left a comment:
Only a few very small things, then I'm happy.
```fortran
  !!
  !! Perform nearest neighbor load balancing
  !!
#ifdef MPI
```
Wouldn't it cut out lots of ifdefs if you only put the #ifdef MPI inside the load balancing procedure, rather than wrapping it around each place it occurs?
You mean at runtime? I'm not sure what you mean here. It could be that there are better ways to place those statements, but I am not sure it makes a big difference at all. I am tempted to leave it as is for now!
I mean that the contents of the function (somewhere after the 'popSizes' definition up to right before 'end subroutine') could be surrounded by an ifdef, but the function itself needn't be. This would allow you to remove the ifdefs around its call sites and around its declaration as a procedure. I think this is desirable because (understandably) there are ifdefs everywhere, so it would be nice for readability to be able to remove a few.
I changed it! I see arguments to do it either way but I don't have a strong preference.
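To illustrate the pattern settled on here, a minimal sketch (placeholder names, not SCONE's actual particleDungeon code): the guard sits inside the body, so the procedure and its call sites compile unconditionally and the routine reduces to a no-op in a serial build.

```fortran
module loadBalance_demo
  implicit none
contains

  !! Sketch only: the #ifdef lives inside the body, so the call sites and the
  !! procedure declaration need no guards of their own.
  subroutine loadBalance(popSizes)
#ifdef MPI
    use mpi
#endif
    integer, intent(inout) :: popSizes(:)
#ifdef MPI
    integer :: rank, nProcs, ierr

    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nProcs, ierr)
    ! ... decide how many particles to pass to rank-1 / rank+1 based on
    !     popSizes and exchange them; in a serial build this block vanishes ...
#endif
  end subroutine loadBalance

end module loadBalance_demo
```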
Finally! I think this is ready to go.
This PR adds MPI support. The main differences are in the tallies and in the dungeon, for population normalisation between cycles and for load balancing (following Paul Romano's PhD thesis).
In the tallies, one key change is that a new report, closeCycle, was added in addition to reportCycleEnd. With MPI, there are two options: mpiSync 1 means that the tallies are synchronised every cycle; mpiSync 0 means that the tallies from each rank are collected only at the end of the calculation. All calculations give identical results with mpiSync 1; with mpiSync 0 they agree within statistics. NOTE that this option applies to all tallies included in a tally admin rather than to individual clerks. Splitting reportCycleEnd into two procedures (i.e., adding closeCycle) makes reproducibility easier for most tallyClerks.
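As a rough illustration of what per-cycle synchronisation amounts to (this is not the actual tallyAdmin interface, just a sketch with made-up names): every rank contributes its local scores and the summed result ends up on the master.

```fortran
! Illustration only: combine per-rank tally scores onto the master process.
! With mpiSync 1 something like this happens every cycle; with mpiSync 0 the
! per-rank results are only combined once, at the end of the calculation.
subroutine syncScores(localScores, totalScores, masterRank)
  use mpi
  implicit none
  real(8), intent(in)  :: localScores(:)
  real(8), intent(out) :: totalScores(:)
  integer, intent(in)  :: masterRank
  integer :: ierr

  call MPI_REDUCE(localScores, totalScores, size(localScores), &
                  MPI_DOUBLE_PRECISION, MPI_SUM, masterRank, MPI_COMM_WORLD, ierr)
end subroutine syncScores
```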
In the dungeon, population normalisation was implemented using a new data structure, heapQueue. Note that, to ensure reproducibility, particles have to be sorted before and after sampling. Then, load balancing is performed by transferring particles between processes.
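For readers unfamiliar with the technique, here is a rough self-contained sketch of the idea behind a bounded heap used for sampling without replacement (placeholder names, not the SCONE heapQueue API): keep the N particles with the smallest random keys by maintaining a max-heap of size N and replacing the current maximum whenever a smaller key arrives.

```fortran
module heap_demo
  implicit none
contains

  !! Insert a key into a bounded max-heap that retains only the maxSize
  !! smallest keys seen so far (heap(1) is the largest of the kept keys).
  subroutine pushBounded(heap, n, maxSize, key)
    real(8), intent(inout) :: heap(:)
    integer, intent(inout) :: n
    integer, intent(in)    :: maxSize
    real(8), intent(in)    :: key

    if (n < maxSize) then
      n = n + 1
      heap(n) = key
      call siftUp(heap, n)
    else if (key < heap(1)) then
      heap(1) = key          ! replace the current maximum
      call siftDown(heap, n)
    end if
  end subroutine pushBounded

  !! Restore the max-heap property after appending an element at position i0
  subroutine siftUp(heap, i0)
    real(8), intent(inout) :: heap(:)
    integer, intent(in)    :: i0
    integer :: i, parent
    real(8) :: tmp

    i = i0
    do while (i > 1)
      parent = i / 2
      if (heap(parent) >= heap(i)) exit
      tmp = heap(parent); heap(parent) = heap(i); heap(i) = tmp
      i = parent
    end do
  end subroutine siftUp

  !! Restore the max-heap property after replacing the root
  subroutine siftDown(heap, n)
    real(8), intent(inout) :: heap(:)
    integer, intent(in)    :: n
    integer :: i, child
    real(8) :: tmp

    i = 1
    do
      child = 2 * i
      if (child > n) exit
      if (child < n) then
        if (heap(child + 1) > heap(child)) child = child + 1
      end if
      if (heap(i) >= heap(child)) exit
      tmp = heap(i); heap(i) = heap(child); heap(child) = tmp
      i = child
    end do
  end subroutine siftDown

end module heap_demo
```

A caller would size heap(maxSize), call pushBounded once per candidate particle with that particle's random key, and afterwards keep exactly the particles whose keys remain in the heap; the selection depends only on the keys, which is what makes it reproducible provided the keys themselves are.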
Results seem to be reproducible for all tallies, and all the tests pass successfully.
@Mikolaj-A-Kowalski However, the GitHub tests seem to fail during compilation because the MPI library is not available. Do you know if there's an easy solution to this? Otherwise I'll have a look.