Skip to content

Conversation

@gc00
Copy link
Collaborator

@gc00 gc00 commented Sep 2, 2023

The right solution for the second commit is to modify DMTCP so that it will automatically munmap mtcp_restart for its text/data/heap memory after restart. Then MANA doesn't have to worry about it. Until then, this PR should solve the problem, and allow MANA to handle ckpt-restart-ckpt-restart.

This fixes two one bug that occurred in ckpt-restart-ckpt. Please read carefully the commit messages of the two commits (one to fix the bug, and one to print debugging information during MANA ckpt.)

  1. In DMTCP, jalib::XToString seems to have a bug in it. On a second ckpt for wave_mpi.mana.exe, we're saving the address of maxLibsEnd, instead of its value. (See below.)
  2. In MANA, we save the mtcp_restart in the checkpoint image, even though DMTCP can avoid doing this. Then, on the second restart, we try to restore mtcp_restart from the ckpt image on top of the current mtcp_restart, and it fails. This happens because mtcp_restart is always loaded starting at 0x11200000

[ I was wrong about Point 1. Working too late at night. I read GDB wrong, and wrongly attributed the issues to C++. :-( ]

@karya0 ,
I think this intersect with the design of DMTCP. Could it be that normal DMTCP removes mtcp_restart just fine before a second ckpt-restart, but maybe the restart-plugin feature fails to remove mtcp_restart? Could you check this?
It seems to me that the last commit should be something automatically done by DMTCP, and should not rely on MANA.
Thanks.

@gc00 gc00 added the bug Something isn't working label Sep 2, 2023
@gc00 gc00 requested review from jiamingz9925 and karya0 September 2, 2023 08:19
@karya0
Copy link
Collaborator

karya0 commented Sep 2, 2023

@gc00: I am not seeing the issue here:

string maxLibsEndStr = jalib::XToString(maxLibsEnd);
(gdb) p/x maxLibsEnd
$18 = 0x7fa83c8e4000
(gdb) p/x &maxLibsEnd
$19 = 0x7fa639421f68
(gdb) ptype maxLibsEnd
type = void *
(gdb) p &maxLibsEndStr.c_str()
$21 = 0x7fa639421c50 "0x7fa83c8e4000"

Isn't maxLibsEndStr supposed to have the "value" of maxLibsEnd variable, which is 0x7fa83c8e4000? What do you expect it to contain instead?

@gc00
Copy link
Collaborator Author

gc00 commented Sep 2, 2023

@karya0 , Maybe I'm wrong about point #1. I'll check my work again in the morning. But when I review it now, I think probably there is no bug in jalib::XToString. I'll test again in the morning.

So, that leaves only the second point. Normally, DMTCP does not save mtcp_restart in its ckpt image. But for MANA on second ckpt-restart, it seems to have saved mtcp_restart in that second ckpt image. I'm not yet sure why this is happening.

@karya0
Copy link
Collaborator

karya0 commented Sep 2, 2023

2 is happening because MANA restart plugin is asking mtcp_restart to skip unmapping this region. MANA doesn't mark it as lower half region during ckpt so DMTCP saves it as a regular memory region. This is the issue I talked to you about on the phone the other day.

@gc00 gc00 force-pushed the fix-second-restart branch 3 times, most recently from 9632f74 to 834c0a3 Compare September 2, 2023 19:39
@gc00
Copy link
Collaborator Author

gc00 commented Sep 2, 2023

@karya0 wrote:

2 is happening because MANA restart plugin is asking mtcp_restart to skip unmapping this region. MANA doesn't mark it as lower half region during ckpt so DMTCP saves it as a regular memory region.

Thanks. What you say is true. But it's a corollary of a wrong design decision in designing the MTCP restart plugin. Essentially, DMTCP knows where its hole is, and MANA does not. I cannot think of any circumstance where MANA would want to save and restore the mtcp_restart application that is owned by DMTCP. If we agree that MANA never wants to restore this, we should not ask MANA to reverse-engineer to find the DMTCP "hole" and unmap it without the help of DMTCP.

I'll explain here in more detail, since this has already consumed a lot of my time in analyzing it.

If there is no restart plugin, then when DMTCP will automatically unmap the mtcp_restart memory regions. In DMTCP, I can look at the generated ckpt images during first and second ckpt. Those images to not have an mtcp_restart region.

Shouldn't DMTCP follow exactly the same policy when the user has a restart plugin? If a memory region is part of the official DMTCP "hole", then DMTCP should munmap its "hole". Right now, we're expecting MANA to discover where is the DMTCP "hole" (or to look at procmaps to discover the mtcp_restart text/data/heap), and then set mtcp_plugin_skip_memory_region_munmap to detect this special case, and return '0'.

In the newest commit, I am using the value 0x112000000 that mtcp/Makefile defines as the beginning of the hole. And I munmap the hole inside MANA. But DMTCP already knows where the hole is. We shouldn't depend on MANA to reverse-engineer where the DMTCP hole is.

For background:
Inside restart_plugin, MANA creates the lower half and runs MPI_Init before restoring the upper half. Our design requires this. Therefore, after the lower half is created, the mtcp_restart program now contains:
(a) text/data/heap of mtcp_restart
(b) the lower-half created by lh_proxy
(c) new memory regions created by MPI_Init (and sometimes MPI_Init uses ioctl to create them, so that we can't interpose).

Now, we're asking the restart_plugin to distinguish between the text/data/heap of mtcp_restart (not created by MANA) and the extra ioctl memory regions of MPI_Init (not created by MANA).

The easiest way for MANA to distinguish the two is to reverse-engineer where DMTCP placed its hole, and assume that this is mtcp_restart. The harder way is to reverse-engineer MPI_Init, and guess where it did an ioctl. But that type of code would be very fragile as MPI_Init evolves.

CONCLUSION:
By far the cleanest thing to do is for DMTCP to detect its own "hole" and munmap it. Asking MANA to use mtcp_plugin_skip_memory_region_munmap to declare where is the DMTCP "hole" is fragile, since it depends on the 0x11200000 fixed address. And asking MANA to adapt every time DMTCP changes its policy on the "hole" makes no sense. (Of course, DMTCP could add an interface to tell MANA where the DMTCP "hole" is, but that would be really weird.)

Anyway, sorry for the very long comment here. But this is why I'd like DMTCP to unmap its own "hole", and not ask MANA to do this. If the rationale is still not clear, let's talk on the phone. Thanks.

@gc00 gc00 changed the title WIP: Fix second restart Fix second restart Sep 2, 2023
@gc00 gc00 changed the title Fix second restart WIP: Fix second restart Sep 3, 2023
@gc00 gc00 force-pushed the fix-second-restart branch from 834c0a3 to 457942d Compare September 3, 2023 08:48
@gc00
Copy link
Collaborator Author

gc00 commented Sep 3, 2023

@karya0 , Thanks for the phone conversation, where we agreed that DMTCP can take responsibility for doing munmap on the text/data segments of mtcp_restart after the restart begins.

Since then, I've discovered that when mtcp_restart begins at 'main', it has a text/data/heap segment. The call to mtcp_plugin_hook() does not extend the heap. So, the heap is part of mtcp_restart, and should be unmapped -- to avoid an address conflict on a second restart.

@gc00 gc00 force-pushed the fix-second-restart branch 2 times, most recently from 869ae11 to 2cbda9d Compare September 3, 2023 19:28
@gc00 gc00 changed the title WIP: Fix second restart Fix second restart Sep 3, 2023
@gc00 gc00 force-pushed the fix-second-restart branch 2 times, most recently from ec79c6c to 2cba05c Compare September 8, 2023 06:12
gc00 added 2 commits September 8, 2023 02:28
 * This prints the values to easily show the values being passed from
   the time of checkpoint to the time of restart.  It serves the developers,
   and it serves the end users by letting them know that a ckpt happened.
 * It's surrounded by '#if 1' to easily turn this off.  Turning this off
   could be a configure option or mana_launch/restart flag in the future,
   if there's a demand for it.
 * On second checkpoint (ckpt-restart-ckpt), The checkpoint image keeps
   a copy of mtcp_restart in its checkpoint image.
 * On the second restart, we try to restore the mtcp_restart of the
   ckpt image on top of the current mtcp_restart used for restart.
   This then segfaults, due to failure of mmap with MAP_FIXED_NOREPLACE.
 * So now we munmap it within MANA.
@gc00 gc00 force-pushed the fix-second-restart branch from 2cba05c to ad6f904 Compare September 8, 2023 06:29
@gc00
Copy link
Collaborator Author

gc00 commented Sep 13, 2023

When PR #362 is pushed in, it will make this PR unnecessary, and we can then close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants