Skip to content

Conversation

@benielT
Copy link
Collaborator

@benielT benielT commented Sep 24, 2025

No description provided.

Copy link
Collaborator

@reguly reguly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. However, please try and run the full test suite on an MI300A to make sure all the code paths are covered well

@gihanmudalige
Copy link
Collaborator

@Ashutosh-Londhe do you have access to an MI300A to run our usual test scripts for this ?

@Ashutosh-Londhe
Copy link
Collaborator

Ashutosh-Londhe commented Sep 25, 2025

@Ashutosh-Londhe do you have access to an MI300A to run our usual test scripts for this ?

Sorry i dont have access to MI300A. But seems like benny is trying on COSMA machine so i can request an access to that project which has MPI300A. I will check wtih Beniel about the details.

@benielT
Copy link
Collaborator Author

benielT commented Sep 25, 2025

This looks good to me. However, please try and run the full test suite on an MI300A to make sure all the code paths are covered well

I have access to MI300A and am testing it right now before finalizing the pull request. There are some minor bugs in the current development related to ROCM-SMI power profiling. I will add that change with this pull request as well.

@benielT benielT marked this pull request as ready for review September 25, 2025 19:51
@benielT
Copy link
Collaborator Author

benielT commented Sep 25, 2025

Added all the changes. With these changes, 10% perf improvement on the gnu_ompi combination. The MI300A platform we tested on Archer2 does have ROCM toolchain + OpenMPI, where we saw significant improvement. Unfortunately, I couldn't
replicate similar results in COSMA.

@benielT
Copy link
Collaborator Author

benielT commented Sep 25, 2025

An additional fix commit has been added to resolve the linking error related to power profiling with the rocmSMI APIs

@Ashutosh-Londhe
Copy link
Collaborator

Hi @gihanmudalige I have got the access to MI300A to perform the testing. For Laplace2d and Poisson, HIP version is failing to produce correct result, informed this to Beniel, he is looking into the issue.

@reguly will similar changes will also require for SYCL backend if underline GPU support unified memory? or SYCL doesnt support this feature?

@reguly
Copy link
Collaborator

reguly commented Oct 7, 2025

Yes, ideally with all "offload"-style backends should support this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants