(Closes #3157) Initial implementation of maximal parallel region trans. #3205
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3205   +/-   ##
=======================================
  Coverage   99.95%   99.95%
=======================================
  Files         376      378    +2
  Lines       53485    53564   +79
=======================================
+ Hits        53463    53542   +79
  Misses         22       22
|
|
@MetBenjaminWent Chris said you had some cases worth trying this with as functionality tests? |
|
The examples I've been given to look at are:
|
Made a bit more progress with this now - there was definitely some missing logic for bdy_impl3 to work. One thing that is apparent is that applying things in this way results in barriers that live outside parallel regions, which should be purged. @sergisiso @arporter Am I OK to make that change as part of this PR?
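To make the barrier issue concrete, here is a minimal Python sketch on a toy tree. `Node`, the `kind` strings and `purge_orphan_barriers` are hypothetical names used only for illustration; they are not PSyclone's PSyIR classes or API. The sketch drops any barrier that has no enclosing parallel region and keeps the ones that do.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy tree node; `kind` is e.g. 'parallel', 'barrier', 'loop', 'if'."""
    kind: str
    children: List["Node"] = field(default_factory=list)


def purge_orphan_barriers(node: Node, in_parallel: bool = False) -> Node:
    """Return a copy of the tree without barriers outside parallel regions."""
    inside = in_parallel or node.kind == "parallel"
    kept = []
    for child in node.children:
        if child.kind == "barrier" and not inside:
            continue  # barrier with no enclosing parallel region: drop it
        kept.append(purge_orphan_barriers(child, inside))
    return Node(node.kind, kept)


def kinds(node: Node) -> list:
    """Flatten the tree to a pre-order list of kinds, for easy inspection."""
    out = [node.kind]
    for child in node.children:
        out.extend(kinds(child))
    return out


# A routine with one parallel region followed by an if-block that was not
# parallelised; each contains a barrier (made-up example structure).
routine = Node("routine", [
    Node("parallel", [Node("loop"), Node("barrier"), Node("loop")]),
    Node("if", [Node("loop"), Node("barrier")]),
])
cleaned = purge_orphan_barriers(routine)
```

Only the barrier inside the `if` block is removed here; the one between the two loops in the parallel region is left alone.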
…y on applying inside if/loop nodes
|
I fixed the previous issue, and this has raised some other challenges with the cases I received from MO.
I assume we just need to use |
|
Yeah, I'd just been looking at the PR earlier today after I looked at this - I guess it's still a while until we'd have it ready, but if it could handle cases like this it would definitely help (assuming the rest is otherwise ok).
|
I copied this chunk of code into a minimally compiling module:

module example_module
  implicit none
  type :: Dims
    integer :: j_start, j_end, i_start, i_end
  end type
contains
  subroutine sub(tdims, r_rho_levels, dtrdz_charney_grid, fqw, dqw, dqw_nt, gamma2, blm1, omp_block)
    type(Dims), intent(inout) :: tdims
    integer, intent(inout) :: r_rho_levels(:,:,:)
    integer, intent(inout) :: dtrdz_charney_grid(:,:,:)
    integer, intent(inout) :: fqw(:,:,:)
    integer, intent(inout) :: dqw(:,:,:)
    integer, intent(inout) :: dqw_nt(:,:,:)
    integer, intent(inout) :: gamma2(:,:)
    integer, intent(inout) :: blm1, omp_block
    integer :: jj, k, r_sq, rr_sq, l, i, j
    do jj = tdims%j_start, tdims%j_end, omp_block
      do k = blm1, 2, -1
        l = 0
        do j = jj, min(jj+omp_block-1, tdims%j_end)
          do i = tdims%i_start, tdims%i_end
            r_sq = r_rho_levels(i,j,k)*r_rho_levels(i,j,k)
            rr_sq = r_rho_levels(i,j,k+1)*r_rho_levels(i,j,k+1)
            dqw(i,j,k) = (-dtrdz_charney_grid(i,j,k) * (rr_sq * fqw(i,j,k + 1) - r_sq * fqw(i,j,k)) + dqw_nt(i,j,k)) * gamma2(i,j)
          end do
        end do
      end do
    end do
  end subroutine
end module

The analysis gives the following output: Interestingly, it does take a few seconds to prove that the |
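As a sanity check on the blocked `jj`/`j` loop structure in the module above, the following Python sketch enumerates the `j` values visited by each `jj` block and confirms that the blocks never overlap, so different `jj` iterations write disjoint slices of `dqw`. The bounds used (`j_start=1`, `j_end=10`, `omp_block=4`) are made-up values, and this brute-force enumeration is only an illustration; it is not how PSyclone's dependency analysis works.

```python
def block_ranges(j_start: int, j_end: int, omp_block: int):
    """Yield (jj, [j values]) for each iteration of the blocked jj loop.

    Mirrors:  do jj = j_start, j_end, omp_block
                do j = jj, min(jj+omp_block-1, j_end)
    Assumes omp_block is a positive integer.
    """
    for jj in range(j_start, j_end + 1, omp_block):
        upper = min(jj + omp_block - 1, j_end)
        yield jj, list(range(jj, upper + 1))


def blocks_are_disjoint(j_start: int, j_end: int, omp_block: int) -> bool:
    """True if every j in [j_start, j_end] is visited by exactly one block."""
    seen = set()
    for _, js in block_ranges(j_start, j_end, omp_block):
        if seen.intersection(js):
            return False  # two jj iterations would touch the same j
        seen.update(js)
    return seen == set(range(j_start, j_end + 1))


# Made-up bounds, chosen so the last block is shorter than omp_block.
blocks = dict(block_ranges(1, 10, 4))
```

Note that the scalars `r_sq`, `rr_sq` and `l` are still written in every iteration, so they would need to be private to each thread for the outer loop to be run in parallel.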
|
I used There are 2 things left to potentially look at.
|
|
@LonelyCat124 Could we finish this PR without the Assignments as there are a few cases that are just contiguous loops that we could already do? And then do the Assignments in a follow up as they seem complicated and worth discussing separately. |
@sergisiso I'm slightly hesitant since some of the examples I was given by MO do need these (and in fact create wrong code without it) if it were to be applied. I think at least I'd need/want to add a check to see if any I had started implementing some logic now for Assignment but the next accesses check is somewhat complicated. Edit: Though I think we're also limited without Matthew's #3213 anyway, so maybe the assignment checks aren't so urgent. Edit 2: This was my current plan: But the next access check needs to happen in apply when applying the parallel transformation and becomes a bit of a mess I think (since we may end up having to split the parallel regions again at this point). |
@LonelyCat124 I need some context about this issue, can you paste an example here |
|
@LonelyCat124 I thought without assignments it would be encapsulating in a transformation what you already implemented in |
Yeah it would probably be similar, but I was testing it with some code (https://code.metoffice.gov.uk/trac/lfric_apps/browser/main/trunk/science/physics_schemes/source/boundary_layer/bdy_impl3.F90) and we do a lot worse than the manual implementation. I think I'm happy to just go with this simple version initially though and then improve it, as it probably makes reviewing more straightforward. I'll write some tests for the current functionality and add some stuff to the docstrings. |
@sergisiso An example of one we'd do wrong right now would be any code where we have |
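The "simple version" idea, wrapping each contiguous run of parallelisable nodes in a single region, can be sketched in plain Python as follows. The statement labels and the `can_parallelise` predicate are hypothetical stand-ins; the real transformation works on PSyIR nodes and its validation rules are not reproduced here.

```python
from itertools import groupby
from typing import Callable, List, Sequence


def maximal_regions(stmts: Sequence[str],
                    can_parallelise: Callable[[str], bool]) -> List[List[str]]:
    """Group sibling statements into maximal contiguous parallelisable runs.

    Statements that fail the predicate split the runs and are not returned.
    """
    regions = []
    for ok, run in groupby(stmts, key=can_parallelise):
        if ok:
            regions.append(list(run))
    return regions


# Toy routine body: in this sketch only loops count as parallelisable.
body = ["loop_a", "loop_b", "assignment", "loop_c", "call", "loop_d", "loop_e"]
regions = maximal_regions(body, lambda s: s.startswith("loop"))
```

In this toy case the assignment between `loop_b` and `loop_c` splits what could otherwise be one larger region, which is the kind of case the follow-up work on Assignments would need to handle.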
|
These last 2 would also be an issue without this transformation if they do the typical |
|
Blocked due to circular dependencies in examples - will update with blocking PR when created. |
No description provided.