-
Notifications
You must be signed in to change notification settings - Fork 6
Integrate DMCTP into McMini #132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
maxwellpirtle
wants to merge
161
commits into
main
Choose a base branch
from
mcmini-with-dmtcp
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This commit introduces an example program that tests some of the functionality of `volatile_mem_stream`. The changes introduced will be used in McMini later on
This commit adds a partial specialization for `real_world::remote_address<T>` by simply treating it as any normal address.
This commit introduces a "working" example of McMini managing state changes during execution. Here, we simply insert a value into the mailbox and assume that the thread under the control of libmcmini.so properly directed the thread to write into the correct spot. Subsequent commits will add the coordination between libmcmini.so and McMini.
This commit introduces the `runner_mailbox` C struct that will be used to store both the thread-specific payload as well as the pair of binary semaphores that are used to communicate between libmcmini and McMini.
This renaming is more in line with the function of the class as it derives from `std::streambuf` and is not itself a stream.
This commit adds a buffer for storing the mailbox data for each thread that will be spawed under the control of `libmcmini.so`
This commit introduces synchronization between libmcmini and the McMini process using the pair of binary semaphores in shared memory. This is equivalent to the current model. However, there is one subtlety. If execvp fails, we currently block indefinitely as the the error is not propagated to the the parent McMini process. We can resolve this using a pipe(2) with FD_CLOEXEC. This will be introduced later
This commit introduces a pipe(2) call used to communicate errno codes from execvp in the forked child process to the parent process. The parent uses the presence of a value as an indication of error. When the child successfully runs `execvp()`, and since the write end of the pipe has been marked `FD_CLOEXEC`, the parent will return from `read(2)` with a value of 0 and will continue execution as normal
This commit prepares the `transition` class to include an `executor` of type `runner_id_t`. Each transition is represented as the application of a partial function of a particular runner. We adopt the abstraction used in Abdulla et al. to represent transitions.
This commit introduces methods for managing runners in states. Transitions operate on states and states alone; but they are themselves executed by runners. It is useful, and necessary sometimes (e.g. with the thread_start transition), to identify the modeled states of these runners in the model via the runner id instead of the object id directly. Here we note that it is sensible to add this mapping directly to the state: the transition that these runners are executing is outside the scope of the state, but simply tracking which objects they refer to is sensible.
This commit adds a memcpy function for volatile memory. This ensures that the compile properly analyzes the volatile memory copy that the standard `memcpy` does not enforce.
This commit introduces changes to how the elements in the state_sequence are informed of runners. Previously, elements were not informed of changes to the number of runners and thus were not properly synchronizes with the changes forwarded to the `state_sequence`. This commit addresses that issue. Additionally, the use of the `volatile_mem_stream` has been removed. It turns out to be needlessly complicated and does not provide much over what a simply dereference from a raw pointer doesn't itself already achieve (and is probably much slower anyway).
This commit wraps `main` in a try-catch to ensure that all exceptions that make it all the way to the top level do not result in a call to `std::terminate()`. This ensures that proper destruction of static global variables occurs even when an exception remains unhandled throughout the entire call stack (save for main).
This commit introduces all of the steps necessary in state regeneration on the model side. Some work remains to accomplish the equivalent on the real-world side
As in the stable version of McMini, we handle the _main_ thread returning from the `main` function as an `atexit()` handler which block indefinitely. This ensures that McMini still has control over the process as a model checker
C++ destroys global static objects upon the program exiting normally, i.e. with `exit(2)`. `fork_process_source` produces new processes by `fork(2)`-ing the McMini process and attempting to `execvp(2)` into some target process to create a template. This is all fine when the execvp() succeeds; however, on failure, we need be careful. Since we forked the C++ McMini process, if we simply call `exit(2)` in the child process, this will invoke the destructors for the C++ global static objects _which reside in the address space of the child_. This is unexpected though: the child in-between should not perform any clean up (that's the job of the parent) and is only designed to immediately exec() or else report an error via its pipe. The solution is to use `quick_exit()` in this temporary child which does not perform the normal clean up
This commit adds a `sigaction(2)` call to ignore the SIGCHLD signal sent to the template process. This means that the zombies of the children will not be left around. Handling segfaults and other issues will be a little more difficult though, as the children of the template are NOT children of the McMini process that contains the model
Apparently exit(2) is not safe to call from multiple threads. To avoid this, we simply wrap a `pthread_mutex_t` around the call in a call to `exit_mt()`
This commit resolves a subtle issue with the way the architecture changes to McMini affected the plumbing details in `libmcmini.so`. When `libmcmini.so` is in the "template" mode, it is signaled by the `McMini` process to create a new child process. When McMini no longer needs the child process, it will now send a signal to the child process the template created previously by calling `fork(2)`. However, signal delivery is _asynchronous_, which means that exiting from the clean up in the destructor of `local_linux_process` does _not_ necessarily mean that the process itself has died, even though the destructor calls `kill(2)`. To resolve the issue, we effectively clump together the action of "forking the template to produce a new child process" with "waiting for the previous child process to terminate." This is accomplished with a call to `wait(2)` right before the `sem_wait(2)` call in `mc_template_process_loop_forever()`. This has the advantage that the C++ code in the McMini process is already designed to treat the creation of a new child as asynchronous.
The `lib` directory contains old code while the `libmcmini` directory contains the new code. We will change the name in a subsequent commit
`multithreaded_fork()` makes use of different semaphores. Since McMini now intercepts `sem_*`, we need to change locations where `sem_wait` was used directly when we want to call into libpthread.
The multi-threaded fork implementation uses `pthread_cond_wait()` and `pthread_cond_broadcast()`. Since we override these now, we need to be careful and call the correct `libpthread` function directly or else risk executing back inside McMini
The McMini deep debugging architecture involves several processes. In particular, there is a separate process containing the model checker and one used to repeatedly create new branch processes via multithreaded_fork(). This means that branch processes are _grandchildren_ of the McMini model checker process. To enable the McMini model checker process to handle unexpected errors in the child process directly (e.g. assertion failures or SEGFAULTs that occur in the branch process during exploration), we set the McMini model checker process as the child's "subreaper" via `prctl(3)` and the `PR_SET_CHILD_SUBREAPER` flag. Refer to `local_linux_process.cpp` and `loop.c` for details
If during recording some threads have exited, the current code assumes that the exited threads will still have a valid "next transition." Instead, if we know a thread has already exited, we simply give it the "exit" transition directly instead
This commit introduces support for assertion failures and segmentation faults inside the branch process or inside a template process during deep debugging. The challenge in supporting detecting these errors is that errors in branch processes are seen inside the template process and are not reported directly to the `mcmini` process because the `mcmini` process is a _grandparent_ of the branch process. We attempted to circumvent this using `prctl(PR_SETSUBREAPER, 1)` which ensures that all descendants of the given process become the de facto parent when the parent of any descendant dies. However, since the template process, the parent of the branch process, does not die, it's the template process which receives a SIGCHLD when the branch process exits (unexpectedly or expectedly). The template therefore must then notify the `mcmini` process that something went wrong. We do this by using the `kill(2)` system call inside the SIGCHLD handler of the template process. Fortunately, the `kill(2)` system call is async-signal-safe, so it's safe to use in signal handlers
This commit adds two features: 1. It removes the SIGCHLD handler installed by the template process from the branch process. A future commit should instead reinstall any SIGCHLD handler installed by the target process before checkpoint. 2. It moves stat tracking into the model checker algorithm instead of callbacks. The latter was a preivous hack for counting e.g. traces
The McMini process lacked logging capabilities. This commit introduces logging into the McMini process to enable more detailed traces when debugging.
There's a subtle bug involving multiple consecutive calls to multithreaded_fork() that was introduced when we enabled SIGCHLD callbacks inside the template process to detect abnormal process termination at runtime. This commit adds a change to resolve a race condition that occurred whenever the template thread was restored by `dmtcp_restart` BEFORE a userspace thread. The problem was that the `threadIdx` would not be incremented by the template thread. This meant that a userspace thread could compute the same index as the template thread which wrote a `0` into `origTid` used later to indicate that the given thread should be skipped for a `clone(2)` call.
A `sem_post()` was getting mixed up between two multithreaded_fork() calls (one immediately following the other). Essentially, a child thread in the template process appeared to not have exited its SIG_MULTITHREADED_FORK handler before the second multithreaded_fork() call came in. The child thread would have been calling `sem_wait()`. We suspect the bug occurs because we receive a signal inside the signal handler while calling `sem_wait()`, which is known not to be async-signal-safe. A workaround is to use the self-pipe trick instead of sem_wait. For now, we add an additional sem_post/sem_wait such that a signal sent via tgkill during the next multithreaded_fork call can only occur when a child thread is calling `sem_post()`, which is async-signal-safe.
`sem_wait()` isn't async-signal-safe, so calling it within a signal handler may cause unexpected bugs. Such an unexpected bug was observed in prior commits: receiving a signal when calling `sem_wait` inside of the signal handler caused unexpected behavior. Fortunately, we can avoid calling `sem_wait()` in the handler. The key idea is that the parent thread calling `multithreaded_fork()` is only waiting for the child threads to call `getcontext()`. Once this happens, the parent thread can simply call `fork()`. The cloned parent thread in the child process will use the contexts saved from before to restore the userspace threads. Consequently, the `sem_wait()` can be removed. Even better would be to use the deep debugging system along `multithreaded_fork()` and avoid the signalling entirely. This is the work of subsequent commits
The existing `multithreaded_fork()` implementation does not assume a particular location in which it will be called: userspace threads are assumed to be possibly executing in different locations in the code. However, in deep debugging, userspace threads will always be executing in a known location: `thread_handle_after_dmtcp_restart()`. Using this, we can amortize the cost of getcontext() and writing the pthread descriptors into a global storage at the "getting into a stable state phase". The idea is that the context is all that really matters to successfully execute multithreaded_fork(). With this knowledge in mind, the `fast_multithreaded_fork()` is introduced which simply reads from global memory the appropriate contexts inside the child process. Otherwise, it is conceptually very close to fork().
Raw `pthread_t` structs were passed in shared memory between the McMini process and the branch processes after multithreaded_fork(). However, this is suspect because although the `pthread_t` struct is patched between `multithreaded_fork()` calls, it may not be exactly "equal" copy-wise. Instead, we now keep track of a light-weight mapping of pthread_t to runner IDs. The higher-level goal is to be able to track objects persistently across traces despite the fact that these objects will appear in (hundreds) of traces.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.