
Conversation

@jti-lanl
Collaborator

Refactored the task lists in PhoebusDriver::PostInitializationCommunications() to avoid a segfault with our 3D-blast-wave input.

PR Summary

There was a comment in the original:

for (int ib = 0; ib < blocks.size(); ib++) {
   ...
   auto &md = pmesh->mesh_data.GetOrAdd(stage_name[0], ib); // ... gives an empty md

At this point in initialization, all the blocks have been allocated in a single stage. Thus, the second iteration of the loop finds nothing, and the resulting empty md produces a SEGV with the 3D-blast-wave input deck.
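
To make the failure mode concrete, here is an annotated sketch of that loop. The use of Mesh::DefaultNumPartitions() and the partition count are assumptions about the Parthenon API, not part of the original comment:

   // Illustration only: at this point the single "base" stage spans all blocks,
   // so only partition 0 of pmesh->mesh_data is populated.
   for (int ib = 0; ib < blocks.size(); ib++) {
     auto &md = pmesh->mesh_data.GetOrAdd(stage_name[0], ib);
     // md is populated only for ib < pmesh->DefaultNumPartitions() (typically 1 here);
     // for larger ib it comes back empty, and the tasks built on it later SEGV.
   }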

The code appears to intend to create a series of TaskLists in a single Region, each TaskList filled with tasks pertaining to a single block: enabling receives, then performing exchanges, and so on. However, the exchanges are actually performed on an entire MeshData object, apparently covering all the blocks, which can be handled by a single task.

If that analysis is correct, then the present approach resolves the problem by reformulating the initialization into just three TaskLists: the first passes through all the blocks and enables receives, the second performs the MeshData-wide exchange tasks, and the third does per-block boundary maintenance.
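
A minimal sketch of that reformulation, assuming Parthenon's TaskCollection/TaskRegion/TaskList API; MakePostInitTasks, EnableRecvs, ExchangeBoundaries, and FixLocalBoundaries are placeholder names, not the actual Phoebus tasks (includes and task definitions elided):

   // Sketch only: one region containing three task lists.  The placeholder
   // tasks are assumed to have the usual TaskStatus(...) signatures.
   parthenon::TaskStatus EnableRecvs(parthenon::MeshBlock *pmb);
   parthenon::TaskStatus ExchangeBoundaries(parthenon::MeshData<parthenon::Real> *md);
   parthenon::TaskStatus FixLocalBoundaries(parthenon::MeshBlock *pmb);

   parthenon::TaskCollection MakePostInitTasks(parthenon::Mesh *pmesh,
                                               const std::vector<std::string> &stage_name) {
     using namespace parthenon;
     TaskCollection tc;
     TaskID none(0);
     TaskRegion &region = tc.AddRegion(3);

     // List 0: enable receives on every block.
     for (auto &pmb : pmesh->block_list) {
       region[0].AddTask(none, EnableRecvs, pmb.get());
     }
     // List 1: MeshData-wide exchanges; a single stage covers all blocks here,
     // so iterate over partitions rather than blocks.
     for (int ip = 0; ip < pmesh->DefaultNumPartitions(); ip++) {
       auto &md = pmesh->mesh_data.GetOrAdd(stage_name[0], ip);
       region[1].AddTask(none, ExchangeBoundaries, md.get());
     }
     // List 2: per-block boundary maintenance.
     for (auto &pmb : pmesh->block_list) {
       region[2].AddTask(none, FixLocalBoundaries, pmb.get());
     }
     return tc;
   }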

A question to consider is how the existing code was able to function. Perhaps other input decks don't invoke the second iteration? Or do they have initializations that result in per-block stages? If the latter, then the current changes should continue to work for those cases.

Tested with OMP and MPI, but only on A64FX, and only with our 3D_blast_wave input.

Needs a pass through whatever unit tests are available.

PR Checklist

  • [ ] Adds a test for any bugs fixed. Adds tests for new features.
  • [x] Format your changes by calling scripts/bash/format.sh.
  • [x] Explain what you did.

jti-lanl added 2 commits June 14, 2022 13:03
…ns().

   The driver was apparently expecting pmesh->mesh_data.GetOrAdd() would
   find a distinct stage for each block, but there is only one stage at
   this point, having the width of Mesh::DefaultPackSize, which matches the
   number of blocks.  Therefore, we split the inits into three TaskRegions,
   first for the blocks to begin receiving, a second to do MD-wide
   boundary-buffer exchanges, and a third for the blocks to manage local
   boundaries.
@jti-lanl
Collaborator Author

On second thought, the comment about how this "should continue to work" for cases that have allocated per-block stages looks wrong. I don't know if there are such things.

@Yurlungur
Collaborator

This strikes me as a little odd... The three task lists are all executed concurrently; you can't expect them to be ordered. Are you relying on ordering here?

@brryan you wrote this code in the first place, maybe you can take a look?

Yurlungur requested a review from brryan June 14, 2022 22:20
@jti-lanl
Collaborator Author

Yes, I wondered about that, but saw that task lists allow incomplete tasks to be retried, so I thought that would resolve any potential deadlock.

However, you've helped me see that the current implementation doesn't prevent exchanges from starting early. I could fix that with three distinct parallel regions executed in series, or perhaps with the regional-dependency machinery.

@Yurlungur
Collaborator

I think 3 distinct parallel regions is the way to go.
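
For reference, a sketch of the same tasks split across three regions, reusing the placeholder task names from the sketch above. This relies on a TaskCollection completing each region before starting the next, while task lists within one region may still run concurrently:

   // Sketch only: three regions processed in series, so receives finish before
   // any exchange begins, and exchanges finish before boundary fix-ups.
   auto &blocks = pmesh->block_list;
   const int num_partitions = pmesh->DefaultNumPartitions();
   parthenon::TaskCollection tc;
   parthenon::TaskID none(0);

   parthenon::TaskRegion &recv_region = tc.AddRegion(static_cast<int>(blocks.size()));
   for (int ib = 0; ib < static_cast<int>(blocks.size()); ib++) {
     recv_region[ib].AddTask(none, EnableRecvs, blocks[ib].get());
   }

   parthenon::TaskRegion &exch_region = tc.AddRegion(num_partitions);
   for (int ip = 0; ip < num_partitions; ip++) {
     auto &md = pmesh->mesh_data.GetOrAdd(stage_name[0], ip);
     exch_region[ip].AddTask(none, ExchangeBoundaries, md.get());
   }

   parthenon::TaskRegion &fix_region = tc.AddRegion(static_cast<int>(blocks.size()));
   for (int ib = 0; ib < static_cast<int>(blocks.size()); ib++) {
     fix_region[ib].AddTask(none, FixLocalBoundaries, blocks[ib].get());
   }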

@jti-lanl
Collaborator Author

Good, I see that. I'll have a shot at it, unless @brryan wants to take over.

@brryan
Collaborator

brryan commented Jun 15, 2022

@jti-lanl Yes, this is a substantial problem in this code I wrote; thank you for identifying it! I agree with you and @Yurlungur that three distinct TaskRegions, based on how you've already split the tasks between per-meshblock and per-meshdata work, is the way to go. So that we're not stepping on each other's toes, it's probably easiest if you just add that to your branch, but if you want to pass it to me, feel free.

Unfortunately, the regression tests probably don't do a great job of testing this logic, and I'm not immediately sure of the best way to make a unit test out of it. I'll open an issue to think about that, since this fix should probably go in ASAP given that it resolves such a nasty bug.

@jti-lanl
Collaborator Author

I'm puzzled why the new approach (three distinct regions) is deadlocking on nv-devkit nodes but not on Akebono, though at first glance they seem to be using the same versions of OpenMPI. I will continue to look, but it's fine with me if someone else sees and fixes the problem.

NOTE: be aware of parthenon-hpc-lab/parthenon#645. It's easy to avoid by allocating successive regions only when they are used.

@brryan
Collaborator

brryan commented Jun 15, 2022

@jti-lanl if you push your latest commit I can try to reproduce the locking behavior and take a look. Also, is your 3D blast wave input deck just the 2d blast wave deck from the repository but with parthenon/mesh/nx3 = 128 and parthenon/meshblock/nx3=32?

   TBD: Why is there a deadlock, for the 3D blast-wave input?
   (Not in the patched method, and only on certain hardware.)
@jti-lanl
Collaborator Author

jti-lanl commented Jun 15, 2022

I've pushed the patch after exploring a little. There's no deadlock on Akebono; it happens only on the devkits (both machines are A64FX), regardless of which platform the code was configured and built on.

The deadlock doesn't happen in the patched method, but in the simulation proper, just after the cycle 0 output header. It happens reliably and immediately on 4 devkit nodes.

CXXFLAGS="-mcpu=a64fx -msve-vector-bits=scalable -O3 -ggdb" cmake -DPHOEBUS_ENABLE_MPI=ON -DPHOEBUS_ENABLE_OPENMP=ON -DPHOEBUS_ENABLE_HDF5=OFF   -DKokkos_ARCH_A64FX=ON -DKokkos_ENABLE_PROFILING_LOAD_PRINT=ON ..

Our 3d_blast_wave input deck differs from the default 2D deck as follows:

<parthenon/mesh>       128x128x128 -> 32x32x32
<parthenon/meshblock>  32x32x1 -> 8x8x8
<recon>                mp5 -> weno5

(... plus tlim and file_type)

@brryan
Collaborator

brryan commented Jun 16, 2022

@jti-lanl Thanks, yes I can reproduce this error on 3 or 4 GPUs on a Darwin power9 node. I'm not immediately seeing what could be causing this hang (the code is getting stuck trying to receive MPI messages that never arrive) but I will keep looking.

In the meantime, a workaround would be to just comment out driver.PostInitializationCommunication(); in phoebus/src/main.cpp. This routine is used to ensure consistency with the imposed boundary conditions after generating initial data, but that isn't really relevant to the blast wave problem: we are still writing the initial data to the active + ghost cells of each meshblock in the problem generator, so we don't need to call communication before the first cycle. So this problem should work just fine without PostInitializationCommunication.
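
Roughly, that workaround would look like this in phoebus/src/main.cpp (an illustrative excerpt; the surrounding lines and the driver construction arguments are assumptions, not the actual file contents):

   // Illustrative excerpt, not the actual main.cpp:
   PhoebusDriver driver(pin, app_in, pmesh);      // construction arguments assumed
   // driver.PostInitializationCommunication();   // temporarily disabled to work around the hang
   auto driver_status = driver.Execute();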

@Yurlungur
Collaborator

@brryan what's the status of this? Do we have a fix?
