@philip-paul-mueller
Contributor

@philip-paul-mueller philip-paul-mueller commented Jun 10, 2025

If the DaCe GPU code generator encounters a Memlet that cannot be expressed as a cudaMemcpy*() call, it converts it into a Map.
However, these Maps have the wrong iteration order, i.e. a bad memory access pattern, and the resulting kernel might not even launch because it uses too many blocks in the y direction of the compute grid (CUDA caps gridDim.y at 65535).
For this reason GT4Py has to handle these Memlets explicitly.
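The launch limit mentioned above can be illustrated with plain arithmetic. This is a minimal sketch, not DaCe or GT4Py code: the helpers `grid_2d`, `grid_1d` and `can_launch`, the block sizes and the array shape are all hypothetical. It assumes a copy of a C-ordered 2D array where the contiguous dimension goes on `x` (so consecutive threads touch consecutive addresses) and the rows go on `y`. CUDA caps `gridDim.y` at 65535 blocks, while `gridDim.x` may go up to 2**31 - 1.

```python
import math

# CUDA grid limits (compute capability 3.0 and newer):
# gridDim.x may be up to 2**31 - 1 blocks, gridDim.y and gridDim.z only 65535.
MAX_GRID_X = 2**31 - 1
MAX_GRID_YZ = 65535


def grid_2d(rows, cols, block=(32, 8)):
    """Hypothetical 2D grid: contiguous columns on x, rows on y.

    Putting the contiguous dimension on x gives coalesced accesses, because
    threads with consecutive threadIdx.x touch consecutive addresses.
    """
    return (math.ceil(cols / block[0]), math.ceil(rows / block[1]))


def grid_1d(rows, cols, block=256):
    """Hypothetical alternative: the iteration space linearized onto x."""
    return (math.ceil(rows * cols / block), 1)


def can_launch(grid):
    """Check a (grid_x, grid_y) pair against the CUDA per-dimension limits."""
    grid_x, grid_y = grid
    return grid_x <= MAX_GRID_X and grid_y <= MAX_GRID_YZ


# A tall C-ordered array: 1_000_000 rows of 64 contiguous elements.
rows, cols = 1_000_000, 64
print(grid_2d(rows, cols), can_launch(grid_2d(rows, cols)))  # (2, 125000) False
print(grid_1d(rows, cols), can_launch(grid_1d(rows, cols)))  # (250000, 1) True
```

Linearizing onto `x` is only one common workaround, shown here for contrast; it is an assumption, not a description of how GT4Py handles these Memlets.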

However, DaCe PR#1976 changed this behavior slightly, and GT4Py had to follow.
Note that some of these changes were already introduced by GT4Py PR#2004; however, they targeted the original version of the DaCe PR (the GT4Py PR had to be merged before the DaCe PR was).

Furthermore, this PR fixes a different issue, also related to the expansion of Memlets, whose fix is in DaCe PR#2033 (not yet merged; currently at commit 19b6bba).
That DaCe PR fixes a bug in how Memlets are expanded.
It also adds the option to raise an error instead of silently converting Memlets into Maps, and this PR enables that option.
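The choice between a cudaMemcpy*() call, a fallback Map, or an error can be sketched as a small policy function. This is a simplified model, not DaCe's actual code generator logic: it assumes a copy fits `cudaMemcpy` when both sides are densely packed and `cudaMemcpy2D` when the region is 2D with unit inner stride on both sides, and it ignores any other cases DaCe may handle. `Copy`, `lower_copy` and `ImplicitMapError` are hypothetical names; the keyword argument only mirrors the `allow_implicit_memlet_to_map` option discussed in the review.

```python
from dataclasses import dataclass


class ImplicitMapError(RuntimeError):
    """Raised instead of silently lowering a copy to a Map."""


@dataclass(frozen=True)
class Copy:
    shape: tuple        # extent of the copied region per dimension
    src_strides: tuple  # source strides, in elements
    dst_strides: tuple  # destination strides, in elements


def dense_strides(shape):
    """Row-major (C-order) strides of a densely packed array of `shape`."""
    strides, step = [], 1
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def lower_copy(copy, allow_implicit_memlet_to_map=True):
    """Pick 'cudaMemcpy', 'cudaMemcpy2D' or 'map' for a strided copy."""
    # Dimensions of extent 1 do not change which elements are touched.
    keep = [i for i, n in enumerate(copy.shape) if n != 1]
    shape = tuple(copy.shape[i] for i in keep)
    src = tuple(copy.src_strides[i] for i in keep)
    dst = tuple(copy.dst_strides[i] for i in keep)

    if src == dense_strides(shape) and dst == dense_strides(shape):
        # Both sides form one contiguous chunk of memory.
        return "cudaMemcpy"
    if len(shape) == 2 and src[1] == 1 and dst[1] == 1:
        # Contiguous rows with one pitch per side.
        return "cudaMemcpy2D"
    if allow_implicit_memlet_to_map:
        # Fallback: an element-wise copy kernel (a Map).
        return "map"
    raise ImplicitMapError(f"copy of shape {copy.shape} would need a Map")


print(lower_copy(Copy((4, 8), (8, 1), (8, 1))))                # cudaMemcpy
print(lower_copy(Copy((4, 8), (16, 1), (8, 1))))               # cudaMemcpy2D
print(lower_copy(Copy((2, 3, 4), (100, 10, 1), (12, 4, 1))))   # map
```

With `allow_implicit_memlet_to_map=False`, the last copy raises `ImplicitMapError` instead of returning `"map"`, which is analogous to the hard error this PR enables.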

philip-paul-mueller and others added 7 commits June 5, 2025 09:54
There is one PR missing that has not yet been opened in DaCe.
However, GT4Py does not need that PR, so technically we can already merge.
Generating Maps in the GPU code generator is now a hard error.
This makes it imperative that we only merge this PR once CI is back and we have made sure that all tests, including those in ICON4Py, have passed!
philip-paul-mueller added a commit that referenced this pull request Jun 10, 2025
Updates the DaCe dependency.

This update does not contain the changes to the GPU codegen.
They are handled in a separate [PR#2070](#2070).
Contributor

@edopao edopao left a comment

As you suggest, I think we should set `allow_implicit_memlet_to_map=False` in `gtx.program_processors.runners.dace.workflow.common.set_dace_config()`.

@philip-paul-mueller
Contributor Author

Thanks for the hint, I forgot to ask you where you set these values.

@edopao
Copy link
Contributor

edopao commented Jun 11, 2025

> Thanks for the hint, I forgot to ask you where you set these values.

It is actually the place you suggested during the review of my PR 😄

@philip-paul-mueller philip-paul-mueller merged commit 8bcfab8 into GridTools:main Jun 11, 2025
30 of 31 checks passed
stubbiali pushed a commit to stubbiali/gt4py that referenced this pull request Aug 19, 2025
stubbiali pushed a commit to stubbiali/gt4py that referenced this pull request Aug 19, 2025
…#2070)

Co-authored-by: edopao <edoardo.paone@cscs.ch>
@philip-paul-mueller philip-paul-mueller deleted the fixed_gpu_code_gen branch October 9, 2025 05:58