Message Logging and Inline Recovery#110

Draft
Matthew-Whitlock wants to merge 14 commits into mpi_coroutines from logging

@Matthew-Whitlock Matthew-Whitlock commented Feb 6, 2026

Depends on the mpi_coroutines branch. See examples/08_inline_recovery/stencil.cpp for an example using message logging and inline recovery.

TODO:

  • Proper integration into Fenix. Something like Fenix_Mlog_create(int log_id, MPI_Comm comm), etc.
  • Support collectives. Collective consistency checking already exists; it needs a bit more work to actually log and replay the collectives. There is a lot of future work here, but for now just add basic support for barriers and reductions/allreductions/bcasts.
  • Switch to forming consistency only during the actual reset_consistency call, as a blocking operation. It still needs the async mpi_coroutines machinery, but doing a barrier after forming all consistency means we don't have to Iprobe on every application MPI operation. It also simplifies the logic.
  • Separate the MPI override functions into their own library module that can be linked with CMake, so applications can enable or disable them depending on whether they use message logging.
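
For readers unfamiliar with message logging, here is a minimal sketch of the general idea behind a sender-side log: each send is copied into a log with a per-destination sequence number, and after a failure the sender replays whatever the recovered rank is missing. This is not the code in this PR. `SenderLog`, `LoggedMessage`, and every method name below are hypothetical, and the sketch leaves out MPI entirely so that it stays self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one logged point-to-point send.
struct LoggedMessage {
    int dest;                   // destination rank
    int tag;                    // message tag
    std::uint64_t seq;          // per-destination sequence number, starting at 0
    std::vector<char> payload;  // private copy of the bytes that were sent
};

// Hypothetical sender-side message log (illustration only, not the PR's design).
class SenderLog {
public:
    explicit SenderLog(int num_ranks) : next_seq_(num_ranks, 0) {
        assert(num_ranks > 0);
    }

    // Copy an outgoing message into the log; returns its sequence number.
    std::uint64_t record(int dest, int tag, const void* data, std::size_t bytes) {
        assert(dest >= 0 && static_cast<std::size_t>(dest) < next_seq_.size());
        const char* p = static_cast<const char*>(data);
        const std::uint64_t seq = next_seq_[dest]++;
        log_.push_back({dest, tag, seq, std::vector<char>(p, p + bytes)});
        return seq;
    }

    // Messages for `dest` with seq >= first_missing, in original send order.
    std::vector<LoggedMessage> replay_from(int dest, std::uint64_t first_missing) const {
        std::vector<LoggedMessage> out;
        for (const auto& m : log_) {
            if (m.dest == dest && m.seq >= first_missing) out.push_back(m);
        }
        return out;
    }

    // Drop entries for `dest` with seq < stable, e.g. once the receiver
    // has checkpointed past them and they can never be requested again.
    void truncate(int dest, std::uint64_t stable) {
        log_.erase(std::remove_if(log_.begin(), log_.end(),
                                  [=](const LoggedMessage& m) {
                                      return m.dest == dest && m.seq < stable;
                                  }),
                   log_.end());
    }

    std::size_t size() const { return log_.size(); }

private:
    std::vector<std::uint64_t> next_seq_;  // next sequence number per destination
    std::vector<LoggedMessage> log_;       // all retained messages, in send order
};
```

In a real integration, something like `record` would presumably be called from the overridden send path and `replay_from` during recovery, once the recovered rank reports the first sequence number it has not seen. How this PR actually wires that up is shown in examples/08_inline_recovery/stencil.cpp, not here.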


/**
* @brief Serialize a group member's data into the member's local store.
*

[Linter] reported by reviewdog 🐶

Suggested change
*
*
