-
Notifications
You must be signed in to change notification settings - Fork 26
Virtual id refactoring #356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Could we trigger more involved automated testing? |
|
Yes, More involved unit testing would be nice. |
|
Okay, then the other goal is to use this branch as a better base for exampi and openmpi support. @xuyao0127 @JainTwinkle should add |
64562f5 to
567dbd1
Compare
b84a6b3 to
eeeb8a8
Compare
e9a56c5 to
0358def
Compare
6d384c4 to
2dba7c4
Compare
95229dd to
4f1a368
Compare
670cbd3 to
5aacf1f
Compare
Intro
This is a branch of the
dev/gc00/improve-virtualidscode intended for longer-term maintainability, review, and integration into main MANA. It is also intended to be the base for OpenMPI, ExaMPI, etc. development.What's different from dev/gc00/improve-virtualids?
get_vcomm_internalbefore, this is a better solution because the semantics match the old style. Seegrant_ggidand its usages.dev/gc00/improve-virtualidshad slightly different ADD_NEW semantics than main, because the equivalent ofrealIdExists(real_id)would be hacky to write. I've written up how thisrealIdExistsfunctionality might look like, but without a two-level table, it is very bad for big apps, because we cannot iterate through only the descriptors of a single type. Please look at this part, @xuyao0127. If we cannot develop and perfect a two-level table like this by camera-ready for SC23, we could revert to the old behavior. I don't think it caused problems, but it is different. Alternatively, as a workaround we could maintain several maps for the different types as before (but do away with string comparison and template class still).FIXMEannotates points of concern.Otherwise, the branches are the same.
Please, read over the FIXME comments I have written in reviewing, they contain points of interest and important questions/concerns.
Contrib
Unfortunately, this is a pretty big PR. I tried to organize the commits so it is clear what commit is in which phase.
This PR does three main things:
1. Replace the old virtualid machinery
(map from ints to ints, written in a C++ template and using dmtcp's
virtualidtable.h) to a mapping from int to struct pointer, with each struct pointer containing both the Real ID for a some Virtual ID, and any metadata information required. I tried to write this part in a C-style. The ugliest part has to be the macros, but I thought that this was the cleanest way to write the code which depends on the particular MPI types, because they are not defined the same way even in the same implementation. For instance,MPI_Fileis a pointer even in MPICH. It also integrates well with the existing macros. Like @xuyao0127 says, the definition and usage of these macros are kind of strange and obtuse, or at least poorly documented (VIRTUAL_TO_REAL(real) == real). However, changing their behavior would mean changing them literally everywhere, making this PR even bigger. It could be a later project.The struct pointer architecture enables objectives 2 and 3 of the PR. It's also the part that enables portability between MPI implementations (with the exception of one FIXME comment in
init_comm_world: @xuyao0127 , @JainTwinkle , I would appreciate your feedback/work here. I think it would be great if the MANA codebase was truly 100% the same between ExaMPI, OpenMPI, MPICH. Therefore, we need to integrate thelh_constants_mapinto every version.If we only want to demonstrate the capability of checkpoint-restart with multiple MPI (without also improving performance in points 2, 3 and really flexing struct VIDs) this is all we have to test and enhance, just the first commit here.
2. Integrate the old ggid machinery into the system in point 1.
This means that we don't have to look up the same virtual id multiple times. An optimization.
3. Replace the log-replay architecture with a decode-recode architecture.
This works readily for communicators, groups, and operators, because they can all be constructed in one go, when given the proper arguments. It is more difficult for datatypes, because datatypes cannot be "fully deserialized" in the same way. While operators also cannot be deserialized, it is not a problem because there is only way to create a new operator, and operators are not recursively defined, so it is trivial to just hook into operator creation. Datatypes that are "doubly-derived", i.e., user datatypes built using other user datatypes, do not immediately break down into MPI primitives.
One way to solve this problem would be to maintain a dependency tree of datatypes: every wrapper that creates a datatype can record the virtual id, and mark that it is a dependency. Another way may be to implement our own full-deserialization, but this is probably technically difficult. Yet another is to preserve the log-replay system for datatypes exclusively. Either way, the point of this part is to enable faster runtime (we don't need to record MPI calls), faster restart (we don't have to replay MPI calls) at the cost of making checkpoint time slightly slower (we have to record the information for everything that's alive at checkpoint time).
Future
Where to go from here:
lh_constants_map