-
Notifications
You must be signed in to change notification settings - Fork 54
fix[dace][next]: Updated Handling of Not Supported Memlets #2070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix[dace][next]: Updated Handling of Not Supported Memlets #2070
Conversation
There is one PR missing, that is not yet opened in DaCe. However, that PR is not needed by GT4Py, thus we can technically merge it.
Now the generation of Maps in the code generator in GPU mode has become a hard error. This makes it imperative that we only merge the PR once we have CI back and have made sure that all tests, also the one in ICON4Py have passed!
…lemented properly inside DaCe.
Actually resynced with DaCe what is supported and what not.
Updates the DaCe dependency. This updates does not contains the updates to the GPU codegen. They are handled in a separate [PR#2070](#2070).
edopao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you suggest, I think we should set allow_implicit_memlet_to_map=False in gtx.program_processors.runners.dace.workflow.common.set_dace_config()
|
Thanks for the hint, I forgot to ask you where you set these values. |
It is actually the place you suggested during review of my PR 😄 |
Updates the DaCe dependency. This updates does not contains the updates to the GPU codegen. They are handled in a separate [PR#2070](GridTools#2070).
…#2070) If the DaCe GPU code generator encounters a Memlet that can not be expressed as a `cudaMemcpy*()` call, then it converts it to a Map. However, the issue is that these Maps have the wrong iteration order, i.e. wrong memory access pattern and it might not even launch, because of too many blocks in the `y` direction of the compute grid. For this reason GT4Py has to handle these Memlets explicitly. However, [DaCe PR#1976](spcl/dace#1976) changed this slightly and thus GT4Py had to follow. Note, that some of these changes were already introduced by [GT4Py PR#2004](GridTools#2004), however, they were made for the original version of the DaCe PR (and the GT4Py PR had to be merged before the DaCe PR was merged). Furthermore, this PR fixes a different issue, also related to the expansion of Memlets, which can be found in [DaCe PR#2033](spcl/dace#2033) (it is not yet merged and currently at commit `19b6bba`). It fixes a bug in how the Memlets are expanded. The DaCe PR also adds the possibility to generate an error instead of slightly converting Memlets into Maps and this PR enables this feature. --------- Co-authored-by: edopao <edoardo.paone@cscs.ch>
If the DaCe GPU code generator encounters a Memlet that can not be expressed as a
cudaMemcpy*()call, then it converts it to a Map.However, the issue is that these Maps have the wrong iteration order, i.e. wrong memory access pattern and it might not even launch, because of too many blocks in the
ydirection of the compute grid.For this reason GT4Py has to handle these Memlets explicitly.
However, DaCe PR#1976 changed this slightly and thus GT4Py had to follow.
Note, that some of these changes were already introduced by GT4Py PR#2004, however, they were made for the original version of the DaCe PR (and the GT4Py PR had to be merged before the DaCe PR was merged).
Furthermore, this PR fixes a different issue, also related to the expansion of Memlets, which can be found in DaCe PR#2033 (it is not yet merged and currently at commit
19b6bba).It fixes a bug in how the Memlets are expanded.
The DaCe PR also adds the possibility to generate an error instead of slightly converting Memlets into Maps and this PR enables this feature.