Conversation

@tob2 tob2 commented Dec 12, 2025

The code uses 'target teams loop' with 'firstprivate(local)'; however, because 'firstprivate' does not apply to the 'loop' construct, the variable is only privatized at the contention-group (i.e. per-team) level. All threads executing the loop therefore share the same 'local' variable, causing data races when 'loop' is executed thread-parallel.

The solution is to use a worksharing loop (i.e. 'for'), which also accepts a 'firstprivate'; since 'teams for' is not permitted, that becomes 'target teams distribute parallel for'.

Technically, a 'private' would have done as well, since the original value is never read, but this patch keeps the 'firstprivate'. Note that while 'loop' accepts neither 'allocate' nor 'firstprivate' (although 'private' is accepted), all of those clauses are accepted by the worksharing-loop 'for' construct.

@seyonglee @spophale @andrewkallai — please review.

@seyonglee
Collaborator

Thank you for catching this issue.
As you said, the 'loop' construct does not allow 'allocate' and 'firstprivate'; therefore, the innermost construct that accepts both 'allocate' and 'firstprivate'/'private' is the 'teams' construct.
However, changing the 'target teams loop' construct to the 'target teams distribute parallel for' construct would defeat the purpose of this test: testing the 'target teams loop' construct with different allocator types.
Therefore, we will have to change the 'local' variable from a scalar to an array, as was done in other tests.

The patch under review changes the directive as follows:

  // Execute on target
-#pragma omp target teams loop map(to: a[0:N], b[0:N]) map(from: c[0:N]) \
+#pragma omp target teams distribute parallel for \
+        map(to: a[0:N], b[0:N]) map(from: c[0:N]) \
Collaborator

@andrewkallai andrewkallai Dec 18, 2025


Change the main pragma body to the following:

#pragma omp target teams loop map(to: a[0:N], b[0:N]) map(from: c[0:N])\
        uses_allocators(omp_default_mem_alloc)\
        allocate(omp_default_mem_alloc: local) firstprivate(local)
  for (int i = 0; i < N; i++) {
    local[i] = a[i] + b[i];
    c[i] = local[i];
  }

Collaborator

@andrewkallai andrewkallai left a comment


Move line 42: OMPVV_TEST_AND_SET_VERBOSE(errors, local[i] != 0);
into the for loop above.

Change the initialization of local to:
int local[N] = {0};

@tob2
Collaborator Author

tob2 commented Dec 19, 2025

> However, changing the 'target teams loop' construct to the 'target teams distribute parallel for' construct would defeat the purpose of this test: testing the 'target teams loop' construct with different allocator types. Therefore, we will have to change the 'local' variable from a scalar to an array, as was done in other tests.

But when changing it to 'local[N]', the code becomes identical to 'testTargetTeamsAllocateArrayDefaultAlloc', which is also pointless.

How about solving it differently: keep the (first)privatization at the 'teams' level only, but allow only one thread per team/contention group?

Namely:

--- ../OpenMP_VV/tests/5.0/teams_loop/test_target_teams_loop_allocate.c 2025-12-19 13:22:27.556612620 +0100
+++ tests/5.0/teams_loop/test_target_teams_loop_allocate.c      2025-12-19 13:21:12.170932550 +0100
@@ -30,5 +30,5 @@
   // Execute on target
 #pragma omp target teams loop map(to: a[0:N], b[0:N]) map(from: c[0:N]) \
-        uses_allocators(omp_low_lat_mem_alloc) \
+        uses_allocators(omp_low_lat_mem_alloc) thread_limit(1) \
         allocate(omp_low_lat_mem_alloc: local) firstprivate(local)
   for (int i = 0; i < N; i++) {

I have now committed that change.

…ently

Review comments suggested that having the allocator apply only to 'teams'
while combining it with 'loop' is crucial. This new attempt uses
'thread_limit(1)' to ensure that there is only a single thread per team
(contention group), avoiding the race due to 'loop' while still permitting
the use of a scalar that is privatized only at the teams level.
