Description
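Some parallel kernels have unbalanced workloads and may perform better with Kokkos thread teams (hierarchical parallelism) than with `MDRangePolicy`. For example, the kernel below iterates over the full `systemsize * systemsize` index space but only does work in the lower triangle (`j < i`), so about half of the iterations return immediately: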
```cpp
// TODO: For potentially better performance, try using teams instead of MDRangePolicy
Kokkos::parallel_for(
    Kokkos::MDRangePolicy<Kokkos::Rank<2>>({0, 0}, {systemsize, systemsize}),
    KOKKOS_LAMBDA(const int i, const int j) {
      if (j >= i) return;
      const double tmp1 = xv0(j) - xv0(i);
      const double tmp2 = yv0(j) - yv0(i);
      const double r = sqrt(tmp1 * tmp1 + tmp2 * tmp2);
      const double tmp3 = C * asin(frac_GridSize_2 / r);
      H(i, j) = tmp3;
      H(j, i) = tmp3;
    });
```
See https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/HierarchicalParallelism.html#thread-teams for details on thread teams.