Global min reduction #966
base: main
Conversation
set `arch = None` default in `make_field_descriptor`
Fix support for AMD legacy
Unpin scikit-build-core
Fix RPATH for python extension module
Add bump-my-version
PyPI deploy action
Fix typo in workflow_dispatch
rpaths for bundled oomph libs
…s.py Co-authored-by: Magdalena <luzm@ethz.ch>
…s.py Co-authored-by: Magdalena <luzm@ethz.ch>
…osition.py Co-authored-by: Magdalena <luzm@ethz.ch>
…osition.py Co-authored-by: Magdalena <luzm@ethz.ch>
…osition.py Co-authored-by: Magdalena <luzm@ethz.ch>
halungge left a comment
Can you add a max and a mean to Reductions in a similar way? For the mean we will need to compute mean_cell_area and edge_lengths in the geometry, and a max is needed at least somewhere in the new driver statistics.
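A rough sketch of what the requested max and mean could look like, written here as free functions so the example is self-contained; the comm/buffer/array_ns parameters mirror the min method reviewed below, and everything else (names, halo handling) is an assumption rather than code from this PR:

from types import ModuleType

import mpi4py.MPI
import numpy as np


def global_max(comm: mpi4py.MPI.Comm, buffer, array_ns: ModuleType = np):
    """Global maximum over all ranks, analogous to the min reduction in this PR."""
    # atleast_1d turns the local scalar into a 1-element buffer that MPI can read.
    local_max = array_ns.atleast_1d(array_ns.max(buffer))
    recv_buffer = array_ns.empty_like(local_max)
    comm.Allreduce(local_max, recv_buffer, mpi4py.MPI.MAX)
    return recv_buffer[0]


def global_mean(comm: mpi4py.MPI.Comm, buffer, array_ns: ModuleType = np):
    """Global mean as the global sum divided by the global element count."""
    # NOTE: on a decomposed grid only owned (non-halo) entries should enter the
    # sum and the count, otherwise halo points are counted more than once.
    local_sum = array_ns.atleast_1d(array_ns.sum(buffer))
    recv_sum = array_ns.empty_like(local_sum)
    comm.Allreduce(local_sum, recv_sum, mpi4py.MPI.SUM)
    total_count = comm.allreduce(int(buffer.size), op=mpi4py.MPI.SUM)
    return recv_sum[0] / total_count

For example, global_max(mpi4py.MPI.COMM_WORLD, np.arange(10.0)) returns 9.0 on every rank when each rank holds the same array.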
halungge left a comment
It's good so far except for the GPU sync question, where I don't know whether that is mandatory.
def min(self, buffer: data_alloc.NDArray, array_ns: ModuleType = np) -> state_utils.ScalarType:
    local_min = array_ns.min(buffer)
    recv_buffer = array_ns.empty(1, dtype=buffer.dtype)
    self._props.comm.Allreduce(local_min, recv_buffer, mpi4py.MPI.MIN)
We should probably consider the stream synchronisation issue here for cupy arrays. See the example in the tutorial: https://mpi4py.readthedocs.io/en/stable/tutorial.html#gpu-aware-mpi-python-gpu-arrays
I did not test it on GPU, did you? I would at least leave a TODO here that this might become a problem. Also ask the HPC/MPI pros like @msimberg ...
I would add the sync with a link to the tutorial. Probably something like
if hasattr(array_ns, "cuda"):
    array_ns.cuda.runtime.deviceSynchronize()
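A sketch of how the suggested synchronization might be folded into the reduction, again written as a self-contained free function rather than the PR's method; the hasattr(array_ns, "cuda") guard is the suggestion above, and the full device sync is one way to follow the linked mpi4py tutorial. This is untested on GPU and is an illustration only:

from types import ModuleType

import mpi4py.MPI
import numpy as np


def global_min(comm: mpi4py.MPI.Comm, buffer, array_ns: ModuleType = np):
    """Global minimum with a device sync before the GPU-aware Allreduce."""
    local_min = array_ns.atleast_1d(array_ns.min(buffer))
    recv_buffer = array_ns.empty_like(local_min)
    # CuPy exposes a `cuda` submodule, NumPy does not: synchronize only for GPU
    # buffers so MPI never reads device memory that a kernel is still writing.
    if hasattr(array_ns, "cuda"):
        array_ns.cuda.runtime.deviceSynchronize()
    comm.Allreduce(local_min, recv_buffer, mpi4py.MPI.MIN)
    return recv_buffer[0]

The tutorial itself synchronizes the current stream (cupy.cuda.get_current_stream().synchronize()), which would be a finer-grained alternative to synchronizing the whole device.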
cscs-ci run default
Mandatory Tests: Please make sure you run these tests via comment before you merge!
Optional Tests: To run benchmarks you can use:
To run tests and benchmarks with the DaCe backend you can use:
To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:
For more detailed information please look at CI in the EXCLAIM universe.
Addition of global_min reduction for nflat_gradp: