Conversation

@nfarabullini
Contributor

@nfarabullini nfarabullini commented Dec 2, 2025

Addition of a `global_min` reduction for `nflat_gradp`:

    def global_min(self, buffer: data_alloc.NDArray) -> state_utils.ScalarType:
        # Wrap the rank-local minimum in a length-1 array so it can serve as
        # an MPI send buffer, then reduce with MIN across all ranks.
        min_buffer_arr = self._xp.array([self._xp.min(buffer)])
        recv_buffer = self._xp.empty(1, dtype=buffer.dtype)
        self.props.comm.Allreduce(min_buffer_arr, recv_buffer, mpi4py.MPI.MIN)
        return recv_buffer.item()
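
For reference, the same pattern stripped of the project's helper classes is just an MPI `Allreduce` over the rank-local minimum. A standalone sketch with plain mpi4py and numpy (not the PR's actual classes):

    # Standalone sketch of the reduction pattern; run with e.g.
    # `mpiexec -n 4 python global_min_sketch.py`.
    import mpi4py.MPI
    import numpy as np

    comm = mpi4py.MPI.COMM_WORLD

    # Each rank contributes a different rank-local buffer.
    local_buffer = np.arange(comm.Get_rank(), comm.Get_rank() + 5, dtype=np.int64)

    send_buffer = np.array([local_buffer.min()])         # rank-local minimum, shape (1,)
    recv_buffer = np.empty(1, dtype=local_buffer.dtype)  # will hold the global minimum
    comm.Allreduce(send_buffer, recv_buffer, op=mpi4py.MPI.MIN)

    print(f"rank {comm.Get_rank()}: global min = {recv_buffer.item()}")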

havogt and others added 30 commits December 13, 2024 15:44
set `arch = None` default in `make_field_descriptor`
Fix RPATH for python extension module
Contributor

@halungge halungge left a comment


Can you add a max and a mean to Reductions in a similar way? We will need the mean to compute mean_cell_area and edge_lengths in the geometry, and a max is needed at least somewhere in the new driver statistics.
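
For reference, following the pattern of `global_min` above, the requested reductions could look roughly like the sketch below (names and attributes mirror the snippet in the PR description, not necessarily the final implementation; the mean additionally needs the global element count):

    def global_max(self, buffer: data_alloc.NDArray) -> state_utils.ScalarType:
        # Same pattern as global_min, with MPI.MAX as the reduction operator.
        max_buffer_arr = self._xp.array([self._xp.max(buffer)])
        recv_buffer = self._xp.empty(1, dtype=buffer.dtype)
        self.props.comm.Allreduce(max_buffer_arr, recv_buffer, mpi4py.MPI.MAX)
        return recv_buffer.item()

    def global_mean(self, buffer: data_alloc.NDArray) -> state_utils.ScalarType:
        # Reduce the rank-local sums, and separately the rank-local element
        # counts, then divide to obtain the global mean.
        local_sum = self._xp.array([self._xp.sum(buffer)], dtype=buffer.dtype)
        recv_sum = self._xp.empty(1, dtype=buffer.dtype)
        self.props.comm.Allreduce(local_sum, recv_sum, mpi4py.MPI.SUM)
        global_count = self.props.comm.allreduce(buffer.size, op=mpi4py.MPI.SUM)
        return recv_sum.item() / global_count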

@nfarabullini nfarabullini changed the base branch from parallel_impl to main December 15, 2025 08:37
Contributor

@halungge halungge left a comment


It's good so far except for the GPU sync question, where I don't know whether that is mandatory.

    def min(self, buffer: data_alloc.NDArray, array_ns: ModuleType = np) -> state_utils.ScalarType:
        local_min = array_ns.min(buffer)
        recv_buffer = array_ns.empty(1, dtype=buffer.dtype)
        self._props.comm.Allreduce(local_min, recv_buffer, mpi4py.MPI.MIN)
Contributor


We should probably consider the stream synchronisation issue here for cupy arrays. See the example in the tutorial: https://mpi4py.readthedocs.io/en/stable/tutorial.html#gpu-aware-mpi-python-gpu-arrays
I did not test it on GPU, did you? I would at least leave a TODO here that this might become a problem. Also ask the HPC/MPI pros like @msimberg ...

Contributor


I would add the sync with a link to the tutorial. Probably something like

        if hasattr(array_ns, "cuda"):
            array_ns.cuda.runtime.deviceSynchronize()
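
Putting the two points together, the method might end up roughly like the sketch below (assumptions: `array_ns` is cupy when running on GPU, and the local minimum is wrapped in a length-1 array as in `global_min`; this mirrors the snippets above, not the merged code):

    def min(self, buffer: data_alloc.NDArray, array_ns: ModuleType = np) -> state_utils.ScalarType:
        local_min = array_ns.array([array_ns.min(buffer)])
        recv_buffer = array_ns.empty(1, dtype=buffer.dtype)
        if hasattr(array_ns, "cuda"):
            # cupy only: make sure kernels producing `buffer`/`local_min` have
            # finished before MPI reads device memory, see
            # https://mpi4py.readthedocs.io/en/stable/tutorial.html#gpu-aware-mpi-python-gpu-arrays
            array_ns.cuda.runtime.deviceSynchronize()
        self._props.comm.Allreduce(local_min, recv_buffer, mpi4py.MPI.MIN)
        return recv_buffer.item()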

@nfarabullini
Contributor Author

cscs-ci run default

@github-actions

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple data tests for static field computations) you can use:

  • cscs-ci run extra

For more detailed information please look at CI in the EXCLAIM universe.
