Scheduled Halo Exchange #980
base: main
Conversation
cscs-ci run default
cscs-ci run extra
class ExchangeResult(Protocol):
-    def wait(self) -> None: ...
+    def wait(self, stream: Any | type[NoStream] | None = NoStream) -> None:
In case you are interested in early feedback: what does `None` mean?
Yes, I am.
It means the default stream, inspired by `nullptr` in C++.
To me it feels more intuitive for `None` to mean wait (i.e. the current `NoStream`), and then use `Stream.null` for the default stream, see https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.Stream.html
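For reference, a minimal sketch of what `cupy.cuda.Stream.null` refers to, assuming CuPy and a CUDA device are available; this only illustrates the CuPy API linked above, not the `ExchangeResult` integration:

```python
import cupy as cp

# cupy.cuda.Stream.null is CuPy's handle to CUDA's legacy default stream.
default_stream = cp.cuda.Stream.null

# Work submitted without an explicit stream runs on this default stream.
x = cp.arange(10) * 2.0

# Blocks the host until all work queued on the default stream has finished.
default_stream.synchronize()

# .ptr exposes the raw stream handle (0 for the legacy default stream).
print(default_stream.ptr)
```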
I agree that `None` for the default stream is not really a good idea and is more motivated by C++.
I will change that in GHEX and also implement the protocol; thanks for the suggestions.
class ExchangeResult(Protocol):
-    def wait(self) -> None: ...
+    def wait(self, stream: Any | type[NoStream] | None = NoStream) -> None:
i.e.
-    def wait(self, stream: Any | type[NoStream] | None = NoStream) -> None:
+    def wait(self, stream: cuda.Stream | None = cuda.Stream.null) -> None:
And we have to figure out the right abstractions for the non-CUDA case...
I used `Any` more as a "please help me", so my question may sound naïve, but by `cuda.Stream` do you mean CuPy?
Should I import CuPy directly, or should I create a protocol somewhere?
@havogt
After some thinking I would do the following:
- Changing the meaning of `None` to "do not use the scheduling implementation"; I agree this is the better solution.
- Creating two protocols for the stream: one following the Nvidia protocol and one following the CuPy one for getting the CUDA stream.
- Creating a singleton object to represent the "default stream", i.e. turning `NoStream` into `DefaultStream`.

The good thing is that nothing GPU related needs to be present, so no strange import errors.
Furthermore, the user can use both CUDA streams and CuPy streams.
What do you think?
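For illustration only, a minimal sketch of what this proposal could look like. The names `CupyLikeStream`, `CudaStreamProtocolLike`, and `DefaultStream` come from this PR, but the member names (`ptr`, `__cuda_stream__`) are assumptions based on CuPy's and CUDA Python's conventions, not the actual GHEX/ICON4Py code:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class CupyLikeStream(Protocol):
    """A stream exposing its raw handle the way CuPy streams do (assumed: via `.ptr`)."""

    @property
    def ptr(self) -> int: ...


@runtime_checkable
class CudaStreamProtocolLike(Protocol):
    """A stream following the CUDA Python stream protocol (assumed: `__cuda_stream__`)."""

    def __cuda_stream__(self) -> Any: ...


class DefaultStream:
    """Singleton sentinel meaning 'synchronize with the default stream'."""

    _instance: "DefaultStream | None" = None

    def __new__(cls) -> "DefaultStream":
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
```

Because neither protocol imports CuPy or CUDA, the module would stay importable in a pure CPU setup, matching the "no strange import errors" goal above.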
I have now implemented my "interpretation of your suggestions".
However, for the default stream I do not use `cuda.Stream.null`, because this would mean that we have to import CuPy/CUDA all the time, which I do not like.
But maybe I am missing something.
cscs-ci run default
cscs-ci run dace
cscs-ci run extra
**NOTE:** This commit still follows the old nomenclature, where `None` means the default stream. Most likely this will change such that `None` means "not using the `schedule_*()` functions" and another singleton is used for the default stream.
cscs-ci run default
- There are now two protocols that describe how to extract the underlying address. They are probably at the wrong location.
- `stream=None` no longer means "default stream" but is now equivalent to "do not use the scheduled version".
- To indicate the default stream, the singleton `DefaultStream` is used. The `cupy.cuda.Stream.null` singleton was not used, because it would require that `cupy` is present.
- However, using the default stream is still the default behaviour.
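A hedged sketch of the `wait()` semantics described in this commit; the method names `blocking_wait` and `scheduled_wait`, and the `ptr` attribute, are hypothetical placeholders, and the real dispatch lives in the GHEX bindings:

```python
from typing import Any


class DefaultStream:
    """Sentinel meaning 'synchronize with the default stream' (see the sketch above)."""


def _raw_stream_handle(stream: Any) -> int:
    """Best-effort extraction of a raw stream handle from a CuPy-like stream (assumed `.ptr`)."""
    if hasattr(stream, "ptr"):
        return int(stream.ptr)
    raise TypeError(f"cannot extract a stream handle from {type(stream)!r}")


def wait(result: Any, stream: Any = DefaultStream()) -> None:
    """Illustrative dispatch for the three cases described in this commit."""
    if stream is None:
        result.blocking_wait()                  # hypothetical: block until unpacking is done
    elif isinstance(stream, DefaultStream):
        result.scheduled_wait(stream_handle=0)  # hypothetical: 0 is CUDA's legacy default stream
    else:
        result.scheduled_wait(stream_handle=_raw_stream_handle(stream))
```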
cscs-ci run default
cscs-ci run dace
cscs-ci run extra
cscs-ci run default
cscs-ci run dace
cscs-ci run extra
There is a failure; see this test PR: #982
cscs-ci run default
cscs-ci run dace
cscs-ci run default
cscs-ci run default
cscs-ci run dace
cscs-ci run extra
cscs-ci run default
cscs-ci run dace
cscs-ci run extra
cscs-ci run default
cscs-ci run dace
cscs-ci run extra
**Mandatory Tests**

Please make sure you run these tests via comment before you merge!
- `cscs-ci run default`

**Optional Tests**

To run benchmarks you can use:

To run tests and benchmarks with the DaCe backend you can use:
- `cscs-ci run dace`

To run test levels ignored by the default test suite (mostly simple datatests for static fields computations) you can use:
- `cscs-ci run extra`

For more detailed information please look at CI in the EXCLAIM universe.
cscs-ci run extra
cscs-ci run default
cscs-ci run dace
cscs-ci run default
cscs-ci run dace
cscs-ci run extra
This PR introduces the "scheduled" exchange feature from GHEX into ICON4Py.
Scheduled exchanges allow calling the exchange function (the function responsible for packing the data and sending it) before the work has concluded.
The packing will only start once all work that was previously submitted to a stream has finished.
In addition, the scheduled wait function (the function responsible for unpacking the data) does not wait until all data has been unpacked; instead it starts the unpacking and then synchronizes with the provided stream.
The default behaviour is to use the scheduled exchange function and synchronize with the default stream.
The feature extends ICON4Py's decomposition concepts and adds the keyword-only argument `stream` to the `exchange()` and `wait()` functions. This is the stream with which the exchange/wait should synchronize.
To deactivate the feature, i.e. send immediately and wait until unpacking has been done, one can pass `None`.
The PR introduces the `CupyLikeStream` and `CudaStreamProtocolLike` protocols, which allow extracting a C GPU stream from a Python object. In addition, the `DefaultStream` singleton is added, which indicates that the default stream should be used. The reason for this is to avoid importing CuPy/CUDA directly, so that ICON4Py also works in a pure CPU mode.
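A hedged usage sketch of the API described above; `exchange_runtime`, `dim`, and the field arguments are hypothetical placeholders for ICON4Py's actual decomposition objects, and only the `stream` keyword-only argument and the `None`/default-stream semantics are taken from this PR:

```python
from typing import Any

import cupy as cp  # only needed when GPU fields are exchanged


def halo_update(exchange_runtime: Any, dim: Any, *fields: Any) -> None:
    """Hypothetical helper illustrating the `stream` keyword-only argument."""
    comm_stream = cp.cuda.Stream(non_blocking=True)

    # Scheduled exchange: packing starts only after the work already queued on
    # `comm_stream` has finished.
    handle = exchange_runtime.exchange(dim, *fields, stream=comm_stream)

    # ... more kernels could be launched on other streams here ...

    # Scheduled wait: enqueues the unpacking and synchronizes with `comm_stream`.
    handle.wait(stream=comm_stream)

    # Opting out of scheduling instead: send immediately and block until the
    # unpacking has completed.
    # handle = exchange_runtime.exchange(dim, *fields, stream=None)
    # handle.wait(stream=None)
```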
Note:
If only CPU memory is exchanged, the behaviour is the same as before, i.e. `exchange()` starts sending immediately and `wait()` only returns after everything has been unpacked.
If CPU and GPU memory are exchanged, the behaviour is a kind of "intersection" between the two.
It is fine, but generally not recommended.
DONE:
TODO: