Skip to content

Conversation

@philip-paul-mueller
Copy link
Collaborator

@philip-paul-mueller philip-paul-mueller commented Dec 18, 2025

This PR introduces the "scheduled" exchange feature from GHEX into ICON4Py.

Scheduled exchanges allows to call the exchange function (the function responsible for packing the data and sending it) before the work has concluded.
The packing will only start when all work that was previously submitted to a stream has been finished.
In addition the scheduled wait function (the function responsible for unpacking the data) does not wait until all data has been unpacked, instead it will start it and then synchronize with the provided stream.
The default behaviour is to use scheduled exchange function and synchronize with the default stream.

The function extends ICON4Py's decomposition concepts and adds the keyword only argument stream to the exchange() and wait() functions.
This is the stream with which the exchange/wait should synchronize with.
To deactivate the feature, i.e. send immediately and wait until unpacking has been done, one can pass None.

The PR introduces the CupyLikeStream and CudaStreamProtocolLike protocols, that allows to extract a C GPU stream from a Python object.
In addition the DefaultStream singleton is also added, which indicates to use the default stream.
The reason for this is to avoid to import CuPy/CUDA directly such that ICON4Py also works in a pure CPU mode.

Note:
If only CPU memory is exchanged then the behavior is the same as before, i.e. exchange() starts sending immediately and wait() only returns after everything has been unpacked.
If CPU and GPU memory is exchanged the behaviour is kind of the "intersection" between the two.
It is fine, but generally not recommended.

DONE:

  • Updating the interface.
  • Applied Hannes' suggestions (or at least my interpretations of them), however, I think we can improve on the current state.

TODO:

  • Integrating the stream into diffusion and dycore.

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run extra


class ExchangeResult(Protocol):
def wait(self) -> None: ...
def wait(self, stream: Any | type[NoStream] | None = NoStream) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you are interested in early feedback. What does None mean?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I am.
It means default stream inspired from nullptr in C++.

Copy link
Contributor

@havogt havogt Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it feels more intuitive to do None == wait (i.e. the current NoStream), and then use Stream.null for the default stream, see https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.Stream.html

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the None for `default stream is not really a good idea and is more motivated from C++.
I will change that in GHEX and also implement the protocol, thanks for the suggestions.


class ExchangeResult(Protocol):
def wait(self) -> None: ...
def wait(self, stream: Any | type[NoStream] | None = NoStream) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e.

Suggested change
def wait(self, stream: Any | type[NoStream] | None = NoStream) -> None:
def wait(self, stream: cuda.Stream | None = cuda.Stream.null) -> None:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we have to figure out the right abstractions for non-cuda case...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used Any more as a "please help me", so my question may now sound naïve, but with cuda.Stream you mean CuPy?
So should I import cupy directly or should I create a protocol somewhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@havogt
After some thinking I would do the following:

  • Changing the meaning of None to "do not use the scheduling implementation; I agree this is the better solution.
  • `create two protocols for the stream one following the Nvidia protocol and one following the CuPy for getting the cuda stream.
  • Creating a singelton object to represent use the "default stream", i.e. turning NoStream into DefaultStream.

The good thing is that nothing GPU related needs to be present, so no strange import errors.
Furthermore, the user can use both CUDA streams and CuPy streams.
What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now implemented my "interpretation of your suggestions".
However, for the default stream I do not use cuda.Stream.null because this would mean that we have to import CuPy/Cuda all the time, which I do not like.
But maybe I am missing something.

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run extra

**NOTE:**
This commit still follows the old nomoclature, where `None` means default stream.
Most likely this will change such that `None` means "not using `schedule_*()` functions and another sigelton is used for it.
@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run default

- There are now two protocols that describes how to extract the underlying address.
	They are probably at the wrong location.
- `stream=None` no longer means "default stream" but is not equivalent to "do not use scheduled version".
- To indicate the default stream the singelton `DefaultStream` is used.
	The `cupy.cuda.Stream.null` singelton was not used, because it would require that `cupy` is present.
- However, use the default stream is still the default behaviour.
@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run extra

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run extra

@philip-paul-mueller
Copy link
Collaborator Author

philip-paul-mueller commented Dec 19, 2025

There is a failing in extra, however, this error is also present on main.

See this test PR: #982

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run dace

@philip-paul-mueller philip-paul-mueller changed the title [DO NOT MERGE]: Scheduled Halo Exchange Scheduled Halo Exchange Dec 19, 2025
@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run extra

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run extra

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run extra

@github-actions
Copy link

github-actions bot commented Jan 2, 2026

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:

  • cscs-ci run extra

For more detailed information please look at CI in the EXCLAIM universe.

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run extra

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Copy link
Collaborator Author

cscs-ci run extra

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants