Integrate DMCTP into McMini #132

maxwellpirtle · 2024-07-30T19:58:57Z

No description provided.

This commit introduces an example program that tests some of the functionality of `volatile_mem_stream`. The changes introduced will be used in McMini later on

This commit adds a partial specialization for `real_world::remote_address<T>` by simply treating it as any normal address.

This commit introduces a "working" example of McMini managing state changes during execution. Here, we simply insert a value into the mailbox and assume that the thread under the control of libmcmini.so properly directed the thread to write into the correct spot. Subsequent commits will add the coordination between libmcmini.so and McMini.

This commit introduces the `runner_mailbox` C struct that will be used to store both the thread-specific payload as well as the pair of binary semaphores that are used to communicate between libmcmini and McMini.

This renaming is more in line with the function of the class as it derives from `std::streambuf` and is not itself a stream.

This commit adds a buffer for storing the mailbox data for each thread that will be spawed under the control of `libmcmini.so`

This commit introduces synchronization between libmcmini and the McMini process using the pair of binary semaphores in shared memory. This is equivalent to the current model. However, there is one subtlety. If execvp fails, we currently block indefinitely as the the error is not propagated to the the parent McMini process. We can resolve this using a pipe(2) with FD_CLOEXEC. This will be introduced later

This commit introduces a pipe(2) call used to communicate errno codes from execvp in the forked child process to the parent process. The parent uses the presence of a value as an indication of error. When the child successfully runs `execvp()`, and since the write end of the pipe has been marked `FD_CLOEXEC`, the parent will return from `read(2)` with a value of 0 and will continue execution as normal

This commit prepares the `transition` class to include an `executor` of type `runner_id_t`. Each transition is represented as the application of a partial function of a particular runner. We adopt the abstraction used in Abdulla et al. to represent transitions.

This commit introduces methods for managing runners in states. Transitions operate on states and states alone; but they are themselves executed by runners. It is useful, and necessary sometimes (e.g. with the thread_start transition), to identify the modeled states of these runners in the model via the runner id instead of the object id directly. Here we note that it is sensible to add this mapping directly to the state: the transition that these runners are executing is outside the scope of the state, but simply tracking which objects they refer to is sensible.

This commit adds a memcpy function for volatile memory. This ensures that the compile properly analyzes the volatile memory copy that the standard `memcpy` does not enforce.

This commit introduces changes to how the elements in the state_sequence are informed of runners. Previously, elements were not informed of changes to the number of runners and thus were not properly synchronizes with the changes forwarded to the `state_sequence`. This commit addresses that issue. Additionally, the use of the `volatile_mem_stream` has been removed. It turns out to be needlessly complicated and does not provide much over what a simply dereference from a raw pointer doesn't itself already achieve (and is probably much slower anyway).

This commit wraps `main` in a try-catch to ensure that all exceptions that make it all the way to the top level do not result in a call to `std::terminate()`. This ensures that proper destruction of static global variables occurs even when an exception remains unhandled throughout the entire call stack (save for main).

This commit introduces all of the steps necessary in state regeneration on the model side. Some work remains to accomplish the equivalent on the real-world side

As in the stable version of McMini, we handle the _main_ thread returning from the `main` function as an `atexit()` handler which block indefinitely. This ensures that McMini still has control over the process as a model checker

C++ destroys global static objects upon the program exiting normally, i.e. with `exit(2)`. `fork_process_source` produces new processes by `fork(2)`-ing the McMini process and attempting to `execvp(2)` into some target process to create a template. This is all fine when the execvp() succeeds; however, on failure, we need be careful. Since we forked the C++ McMini process, if we simply call `exit(2)` in the child process, this will invoke the destructors for the C++ global static objects _which reside in the address space of the child_. This is unexpected though: the child in-between should not perform any clean up (that's the job of the parent) and is only designed to immediately exec() or else report an error via its pipe. The solution is to use `quick_exit()` in this temporary child which does not perform the normal clean up

This commit adds a `sigaction(2)` call to ignore the SIGCHLD signal sent to the template process. This means that the zombies of the children will not be left around. Handling segfaults and other issues will be a little more difficult though, as the children of the template are NOT children of the McMini process that contains the model

Apparently exit(2) is not safe to call from multiple threads. To avoid this, we simply wrap a `pthread_mutex_t` around the call in a call to `exit_mt()`

This commit resolves a subtle issue with the way the architecture changes to McMini affected the plumbing details in `libmcmini.so`. When `libmcmini.so` is in the "template" mode, it is signaled by the `McMini` process to create a new child process. When McMini no longer needs the child process, it will now send a signal to the child process the template created previously by calling `fork(2)`. However, signal delivery is _asynchronous_, which means that exiting from the clean up in the destructor of `local_linux_process` does _not_ necessarily mean that the process itself has died, even though the destructor calls `kill(2)`. To resolve the issue, we effectively clump together the action of "forking the template to produce a new child process" with "waiting for the previous child process to terminate." This is accomplished with a call to `wait(2)` right before the `sem_wait(2)` call in `mc_template_process_loop_forever()`. This has the advantage that the C++ code in the McMini process is already designed to treat the creation of a new child as asynchronous.

The `lib` directory contains old code while the `libmcmini` directory contains the new code. We will change the name in a subsequent commit

`multithreaded_fork()` makes use of different semaphores. Since McMini now intercepts `sem_*`, we need to change locations where `sem_wait` was used directly when we want to call into libpthread.

The multi-threaded fork implementation uses `pthread_cond_wait()` and `pthread_cond_broadcast()`. Since we override these now, we need to be careful and call the correct `libpthread` function directly or else risk executing back inside McMini

The McMini deep debugging architecture involves several processes. In particular, there is a separate process containing the model checker and one used to repeatedly create new branch processes via multithreaded_fork(). This means that branch processes are _grandchildren_ of the McMini model checker process. To enable the McMini model checker process to handle unexpected errors in the child process directly (e.g. assertion failures or SEGFAULTs that occur in the branch process during exploration), we set the McMini model checker process as the child's "subreaper" via `prctl(3)` and the `PR_SET_CHILD_SUBREAPER` flag. Refer to `local_linux_process.cpp` and `loop.c` for details

If during recording some threads have exited, the current code assumes that the exited threads will still have a valid "next transition." Instead, if we know a thread has already exited, we simply give it the "exit" transition directly instead

This commit introduces support for assertion failures and segmentation faults inside the branch process or inside a template process during deep debugging. The challenge in supporting detecting these errors is that errors in branch processes are seen inside the template process and are not reported directly to the `mcmini` process because the `mcmini` process is a _grandparent_ of the branch process. We attempted to circumvent this using `prctl(PR_SETSUBREAPER, 1)` which ensures that all descendants of the given process become the de facto parent when the parent of any descendant dies. However, since the template process, the parent of the branch process, does not die, it's the template process which receives a SIGCHLD when the branch process exits (unexpectedly or expectedly). The template therefore must then notify the `mcmini` process that something went wrong. We do this by using the `kill(2)` system call inside the SIGCHLD handler of the template process. Fortunately, the `kill(2)` system call is async-signal-safe, so it's safe to use in signal handlers

This commit adds two features: 1. It removes the SIGCHLD handler installed by the template process from the branch process. A future commit should instead reinstall any SIGCHLD handler installed by the target process before checkpoint. 2. It moves stat tracking into the model checker algorithm instead of callbacks. The latter was a preivous hack for counting e.g. traces

The McMini process lacked logging capabilities. This commit introduces logging into the McMini process to enable more detailed traces when debugging.

There's a subtle bug involving multiple consecutive calls to multithreaded_fork() that was introduced when we enabled SIGCHLD callbacks inside the template process to detect abnormal process termination at runtime. This commit adds a change to resolve a race condition that occurred whenever the template thread was restored by `dmtcp_restart` BEFORE a userspace thread. The problem was that the `threadIdx` would not be incremented by the template thread. This meant that a userspace thread could compute the same index as the template thread which wrote a `0` into `origTid` used later to indicate that the given thread should be skipped for a `clone(2)` call.

A `sem_post()` was getting mixed up between two multithreaded_fork() calls (one immediately following the other). Essentially, a child thread in the template process appeared to not have exited its SIG_MULTITHREADED_FORK handler before the second multithreaded_fork() call came in. The child thread would have been calling `sem_wait()`. We suspect the bug occurs because we receive a signal inside the signal handler while calling `sem_wait()`, which is known not to be async-signal-safe. A workaround is to use the self-pipe trick instead of sem_wait. For now, we add an additional sem_post/sem_wait such that a signal sent via tgkill during the next multithreaded_fork call can only occur when a child thread is calling `sem_post()`, which is async-signal-safe.

`sem_wait()` isn't async-signal-safe, so calling it within a signal handler may cause unexpected bugs. Such an unexpected bug was observed in prior commits: receiving a signal when calling `sem_wait` inside of the signal handler caused unexpected behavior. Fortunately, we can avoid calling `sem_wait()` in the handler. The key idea is that the parent thread calling `multithreaded_fork()` is only waiting for the child threads to call `getcontext()`. Once this happens, the parent thread can simply call `fork()`. The cloned parent thread in the child process will use the contexts saved from before to restore the userspace threads. Consequently, the `sem_wait()` can be removed. Even better would be to use the deep debugging system along `multithreaded_fork()` and avoid the signalling entirely. This is the work of subsequent commits

The existing `multithreaded_fork()` implementation does not assume a particular location in which it will be called: userspace threads are assumed to be possibly executing in different locations in the code. However, in deep debugging, userspace threads will always be executing in a known location: `thread_handle_after_dmtcp_restart()`. Using this, we can amortize the cost of getcontext() and writing the pthread descriptors into a global storage at the "getting into a stable state phase". The idea is that the context is all that really matters to successfully execute multithreaded_fork(). With this knowledge in mind, the `fast_multithreaded_fork()` is introduced which simply reads from global memory the appropriate contexts inside the child process. Otherwise, it is conceptually very close to fork().

Raw `pthread_t` structs were passed in shared memory between the McMini process and the branch processes after multithreaded_fork(). However, this is suspect because although the `pthread_t` struct is patched between `multithreaded_fork()` calls, it may not be exactly "equal" copy-wise. Instead, we now keep track of a light-weight mapping of pthread_t to runner IDs. The higher-level goal is to be able to track objects persistently across traces despite the fact that these objects will appear in (hundreds) of traces.

maxwellpirtle added 30 commits June 5, 2024 18:06

Ensure volatile_mem_stream works with an example

fcc23e9

This commit introduces an example program that tests some of the functionality of `volatile_mem_stream`. The changes introduced will be used in McMini later on

Remove mcmini/detail folder and favor misc

7f12a8a

Ensure remote_address is hashable

45147e6

This commit adds a partial specialization for `real_world::remote_address<T>` by simply treating it as any normal address.

Add runner_mailbox C struct in preparation for libmcmini

3ed051f

This commit introduces the `runner_mailbox` C struct that will be used to store both the thread-specific payload as well as the pair of binary semaphores that are used to communicate between libmcmini and McMini.

Rename volatile_mem_stream to *_streambuf

8bd7b2f

This renaming is more in line with the function of the class as it derives from `std::streambuf` and is not itself a stream.

Prepare mailbox storage in local_linux_process

23d2f7f

This commit adds a buffer for storing the mailbox data for each thread that will be spawed under the control of `libmcmini.so`

Add implementation for libmcmini.so for initial setup

f6ebe78

Reduce the common code from state_sequence and detached_state

4c82d9c

Add thread transitions modeling

7b53dd8

Add dlsym loading of pthread symbols in libmcmini.so

9a66aca

Add memcpy_v for volatile memory copy

1b94b6c

This commit adds a memcpy function for volatile memory. This ensures that the compile properly analyzes the volatile memory copy that the standard `memcpy` does not enforce.

Add explicit type field to runner_mailbox

04beb31

Add signal_tracker for process forking

b4e1fcb

Implement state regeneration in coordinator

977b0a8

This commit introduces all of the steps necessary in state regeneration on the model side. Some work remains to accomplish the equivalent on the real-world side

Handle main thread exiting the program

dd81d9a

As in the stable version of McMini, we handle the _main_ thread returning from the `main` function as an `atexit()` handler which block indefinitely. This ensures that McMini still has control over the process as a model checker

Add multi-threaded exit(2) call

b37de21

Apparently exit(2) is not safe to call from multiple threads. To avoid this, we simply wrap a `pthread_mutex_t` around the call in a call to `exit_mt()`

Add the ability to raise an exception is a signal is captured

3599152

Handle fork(2) failiing in the template on the model side

2dbb0b5

Remove current lib directory

0e1cf71

The `lib` directory contains old code while the `libmcmini` directory contains the new code. We will change the name in a subsequent commit

Move libmcmini to lib

9766a29

maxwellpirtle added 30 commits July 23, 2025 12:25

Use explicit call in multithreaded_fork()

030215e

`multithreaded_fork()` makes use of different semaphores. Since McMini now intercepts `sem_*`, we need to change locations where `sem_wait` was used directly when we want to call into libpthread.

Add sem_destroy callback/impl

a36c338

Add defaults for transitions/dependency relations

9b29a6c

Add misc changes to support semaphores

1d89d19

Add small comment cleanups/internal error message

4e7f056

Move mc_template_process_loop_forever() to loop.c

5db0e56

Add support for abort()/exit() calls (expected aborts/exits)

cbfc0aa

Remove stray config files

9bd3e55

Add support for varargs with logging

11b41ac

Change modeled semaphore to use int (not unsigned)

b166bc0

Add explicit support for exit(EXIT_FAILURE)/abort()

c08b9d8

Temporarily remove PR_SETSUBREAPER code (assertions only)

7e27a7d

Awake scheduler in the event of an abort

9a77dde

Handle already-exited threads in McMini

ede8efb

If during recording some threads have exited, the current code assumes that the exited threads will still have a valid "next transition." Instead, if we know a thread has already exited, we simply give it the "exit" transition directly instead

Re-add wait(NULL) while we don't have prctl

41e32c1

Add test directory

799d4c6

Add light logging functionality to McMini

b335ed4

The McMini process lacked logging capabilities. This commit introduces logging into the McMini process to enable more detailed traces when debugging.

Add logging level flag to mcmini

5eb6b0f

Fix small bug without deep debugging

bf4e2af

Move model checking relations into config

9c49b10

Remove handling of SIGTSTOP and prctl subreaper

1a86b9b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integrate DMCTP into McMini #132

Integrate DMCTP into McMini #132

Uh oh!

maxwellpirtle commented Jul 30, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Integrate DMCTP into McMini #132

Are you sure you want to change the base?

Integrate DMCTP into McMini #132

Uh oh!

Conversation

maxwellpirtle commented Jul 30, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants