Issue 896: Fix race in 5.0/teams_loop/test_target_teams_loop_allocate.c #897
base: master
Conversation
The code uses 'target teams loop' with 'firstprivate(local)'. However, because 'firstprivate' does not apply to 'loop', the variable is privatized only at the contention-group level, so all threads executing the 'loop' share the same 'local' variable, causing a race when 'loop' is executed thread parallel.

The solution is to use a worksharing loop (i.e. 'for'), which does accept a 'firstprivate'; as 'teams for' is not permitted, that means 'target teams distribute parallel for'.

Technically, a 'private' would have done as well, since the original value is never read, but this patch keeps the 'firstprivate'. Note that while 'loop' accepts neither 'allocate' nor 'firstprivate' (though 'private' is accepted), all of those clauses are accepted by the worksharing-loop 'for' construct.
Thank you for catching this issue.
  // Execute on target
- #pragma omp target teams loop map(to: a[0:N], b[0:N]) map(from: c[0:N]) \
+ #pragma omp target teams distribute parallel for \
+     map(to: a[0:N], b[0:N]) map(from: c[0:N]) \
| } | ||
| // Execute on target | ||
| #pragma omp target teams loop map(to: a[0:N], b[0:N]) map(from: c[0:N]) \ | ||
| #pragma omp target teams distribute parallel for \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the main pragma body to the following:
#pragma omp target teams loop map(to: a[0:N], b[0:N]) map(from: c[0:N])\
uses_allocators(omp_default_mem_alloc)\
allocate(omp_default_mem_alloc: local) firstprivate(local)
for (int i = 0; i < N; i++) {
local[i] = a[i] + b[i];
c[i] = local[i];
}
andrewkallai left a comment
Move line 42: OMPVV_TEST_AND_SET_VERBOSE(errors, local[i] != 0);
into the for loop above.
Change the initialization of local to:
int local[N] = {0};
But when changing it to 'local[N]', the code will be identical to 'testTargetTeamsAllocateArrayDefaultAlloc', which is also pointless. How about solving it differently: (first)privatization only at the 'teams' level, but with only one thread per team/contention group? Namely:

--- ../OpenMP_VV/tests/5.0/teams_loop/test_target_teams_loop_allocate.c 2025-12-19 13:22:27.556612620 +0100
+++ tests/5.0/teams_loop/test_target_teams_loop_allocate.c 2025-12-19 13:21:12.170932550 +0100
@@ -30,5 +30,5 @@
// Execute on target
#pragma omp target teams loop map(to: a[0:N], b[0:N]) map(from: c[0:N]) \
- uses_allocators(omp_low_lat_mem_alloc) \
+ uses_allocators(omp_low_lat_mem_alloc) thread_limit(1) \
allocate(omp_low_lat_mem_alloc: local) firstprivate(local)
   for (int i = 0; i < N; i++) {

I have now committed that change.
…ently

Review comments suggested that having the allocator applicable only to 'teams', combined with 'loop', is crucial to this test. This new attempt uses 'thread_limit(1)' to ensure there is only a single thread per team (contention group), avoiding the race caused by 'loop' while still permitting use of a scalar that is privatized only at the 'teams' level.
@seyonglee @spophale @andrewkallai — please review.