Skip to content

Conversation

@maxwellpirtle
Copy link
Collaborator

No description provided.

This commit introduces an example program that
tests some of the functionality of `volatile_mem_stream`.
The changes introduced will be used in McMini later on
This commit adds a partial specialization for
`real_world::remote_address<T>` by simply treating
it as any normal address.
This commit introduces a "working" example of
McMini managing state changes during execution.
Here, we simply insert a value into the mailbox
and assume that the thread under the control of libmcmini.so
properly directed the thread to write into the correct spot.
Subsequent commits will add the coordination between
libmcmini.so and McMini.
This commit introduces the `runner_mailbox` C struct
that will be used to store both the thread-specific payload
as well as the pair of binary semaphores that are used
to communicate between libmcmini and McMini.
This renaming is more in line with the function
of the class as it derives from `std::streambuf` and
is not itself a stream.
This commit adds a buffer for storing the mailbox
data for each thread that will be spawed under the control
of `libmcmini.so`
This commit introduces synchronization between libmcmini and
the McMini process using the pair of binary semaphores in
shared memory. This is equivalent to the current model.

However, there is one subtlety. If execvp fails, we currently
block indefinitely as the the error is not propagated to the
the parent McMini process. We can resolve this using a pipe(2)
with FD_CLOEXEC. This will be introduced later
This commit introduces a pipe(2) call used to
communicate errno codes from execvp in the forked child
process to the parent process. The parent uses the
presence of a value as an indication of error. When
the child successfully runs `execvp()`, and since the
write end of the pipe has been marked `FD_CLOEXEC`, the
parent will return from `read(2)` with a value of 0
and will continue execution as normal
This commit prepares the `transition` class
to include an `executor` of type `runner_id_t`.
Each transition is represented as the application
of a partial function of a particular runner.
We adopt the abstraction used in Abdulla et al.
to represent transitions.
This commit introduces methods for managing runners
in states. Transitions operate on states and states
alone; but they are themselves executed by runners.
It is useful, and necessary sometimes (e.g. with the
thread_start transition), to identify the modeled states
of these runners in the model via the runner id instead
of the object id directly. Here we note that it is
sensible to add this mapping directly to the state:
the transition that these runners are executing is
outside the scope of the state, but simply tracking which
objects they refer to is sensible.
This commit adds a memcpy function for volatile memory.
This ensures that the compile properly analyzes the volatile
memory copy that the standard `memcpy` does not enforce.
This commit introduces changes to how the elements
in the state_sequence are informed of runners. Previously,
elements were not informed of changes to the number of
runners and thus were not properly synchronizes with the
changes forwarded to the `state_sequence`. This commit
addresses that issue.

Additionally, the use of the `volatile_mem_stream` has been
removed. It turns out to be needlessly complicated and does
not provide much over what a simply dereference from a raw
pointer doesn't itself already achieve (and is probably much
slower anyway).
This commit wraps `main` in a try-catch to ensure that
all exceptions that make it all the way to the top level
do not result in a call to `std::terminate()`. This ensures
that proper destruction of static global variables occurs
even when an exception remains unhandled throughout the entire
call stack (save for main).
This commit introduces all of the
steps necessary in state regeneration on the
model side. Some work remains to accomplish the
equivalent on the real-world side
As in the stable version of McMini, we handle
the _main_ thread returning from the `main`
function as an `atexit()` handler which block indefinitely.
This ensures that McMini still has control over the process
as a model checker
C++ destroys global static objects upon the program exiting
normally, i.e. with `exit(2)`. `fork_process_source` produces
new processes by `fork(2)`-ing the McMini process and attempting
to `execvp(2)` into some target process to create a template. This
is all fine when the execvp() succeeds; however, on failure, we need
be careful. Since we forked the C++ McMini process, if we simply call
`exit(2)` in the child process, this will invoke the destructors for
the C++ global static objects _which reside in the address space of
the child_. This is unexpected though: the child in-between should not
perform any clean up (that's the job of the parent)  and is only designed to
immediately exec() or else report an error via its pipe.

The solution is to use `quick_exit()` in this temporary child which does not perform the
normal clean up
This commit adds a `sigaction(2)` call to
ignore the SIGCHLD signal sent to the template
process. This means that the zombies of the children
will not be left around. Handling segfaults and other
issues will be a little more difficult though,
as the children of the template are NOT children
of the McMini process that contains the model
Apparently exit(2) is not safe to call from
multiple threads. To avoid this, we simply
wrap a `pthread_mutex_t` around the call in a call
to `exit_mt()`
This commit resolves a subtle issue with the way the
architecture changes to McMini affected the plumbing details
in `libmcmini.so`. When `libmcmini.so` is in the "template"
mode, it is signaled by the `McMini` process to create a new
child process. When McMini no longer needs the child process,
it will now send a signal to the child process the template
created previously by calling `fork(2)`. However, signal delivery
is _asynchronous_, which means that exiting from the clean up
in the destructor of `local_linux_process` does _not_ necessarily
mean that the process itself has died, even though the destructor
calls `kill(2)`.

To resolve the issue, we effectively clump together the action of
"forking the template to produce a new child process" with
"waiting for the previous child process to terminate." This is
accomplished with a call to `wait(2)` right before the `sem_wait(2)`
call in `mc_template_process_loop_forever()`. This has the advantage
that the C++ code in the McMini process is already designed to treat
the creation of a new child as asynchronous.
The `lib` directory contains old code while
the `libmcmini` directory contains the new code.
We will change the name in a subsequent commit
`multithreaded_fork()` makes use of different semaphores. Since McMini
now intercepts `sem_*`, we need to change locations where `sem_wait`
was used directly when we want to call into libpthread.
The multi-threaded fork implementation uses `pthread_cond_wait()` and
`pthread_cond_broadcast()`. Since we override these now, we need to be
careful and call the correct `libpthread` function directly or else risk
executing back inside McMini
The McMini deep debugging architecture involves several processes. In
particular, there is a separate process containing the model checker and
one used to repeatedly create new branch processes via
multithreaded_fork(). This means that branch processes are
_grandchildren_ of the McMini model checker process.

To enable the McMini model checker process to handle unexpected errors
in the child process directly (e.g. assertion failures or SEGFAULTs that
occur in the branch process during exploration), we set the McMini model
checker process as the child's "subreaper" via `prctl(3)` and the
`PR_SET_CHILD_SUBREAPER` flag. Refer to `local_linux_process.cpp` and
`loop.c` for details
If during recording some threads have exited, the current code assumes
that the exited threads will still have a valid "next transition."
Instead, if we know a thread has already exited, we simply give it the
"exit" transition directly instead
This commit introduces support for assertion failures and segmentation
faults inside the branch process or inside a template process during
deep debugging. The challenge in supporting detecting these errors is
that errors in branch processes are seen inside the template process and
are not reported directly to the `mcmini` process because the `mcmini`
process is a _grandparent_ of the branch process. We attempted to
circumvent this using `prctl(PR_SETSUBREAPER, 1)` which ensures that all
descendants of the given process become the de facto parent when the
parent of any descendant dies. However, since the template process, the
parent of the branch process, does not die, it's the template process
which receives a SIGCHLD when the branch process exits (unexpectedly or
expectedly). The template therefore must then notify the `mcmini`
process that something went wrong. We do this by using the `kill(2)`
system call inside the SIGCHLD handler of the template process.
Fortunately, the `kill(2)` system call is async-signal-safe, so it's
safe to use in signal handlers
This commit adds two features:

1. It removes the SIGCHLD handler installed by the template process from
   the branch process. A future commit should instead reinstall any
   SIGCHLD handler installed by the target process before checkpoint.
2. It moves stat tracking into the model checker algorithm instead of
   callbacks. The latter was a preivous hack for counting e.g. traces
The McMini process lacked logging capabilities. This commit introduces
logging into the McMini process to enable more detailed traces when
debugging.
There's a subtle bug involving multiple consecutive calls to
multithreaded_fork() that was introduced when we enabled SIGCHLD
callbacks inside the template process to detect abnormal process
termination at runtime. This commit adds a change to resolve a race
condition that occurred whenever the template thread was restored by
`dmtcp_restart` BEFORE a userspace thread. The problem was that the
`threadIdx` would not be incremented by the template thread. This meant
that a userspace thread could compute the same index as the template
thread which wrote a `0` into `origTid` used later to indicate that the
given thread should be skipped for a `clone(2)` call.
A `sem_post()` was getting mixed up between two multithreaded_fork()
calls (one immediately following the other). Essentially, a child thread
in the template process appeared to not have exited its
SIG_MULTITHREADED_FORK handler before the second multithreaded_fork()
call came in. The child thread would have been calling `sem_wait()`.
We suspect the bug occurs because we receive a signal inside the signal
handler while calling `sem_wait()`, which is known not to be
async-signal-safe. A workaround is to use the self-pipe trick instead of
sem_wait. For now, we add an additional sem_post/sem_wait such that a
signal sent via tgkill during the next multithreaded_fork call can only
occur when a child thread is calling `sem_post()`, which is
async-signal-safe.
`sem_wait()` isn't async-signal-safe, so calling it within a signal
handler may cause unexpected bugs. Such an unexpected bug was observed
in prior commits: receiving a signal when calling `sem_wait` inside of
the signal handler caused unexpected behavior.

Fortunately, we can avoid calling `sem_wait()` in the handler. The key
idea is that the parent thread calling `multithreaded_fork()` is only
waiting for the child threads to call `getcontext()`. Once this happens,
the parent thread can simply call `fork()`. The cloned parent thread in
the child process will use the contexts saved from before to restore the
userspace threads. Consequently, the `sem_wait()` can be removed.

Even better would be to use the deep debugging system along
`multithreaded_fork()` and avoid the signalling entirely. This is the
work of subsequent commits
The existing `multithreaded_fork()` implementation does not assume a
particular location in which it will be called: userspace threads are
assumed to be possibly executing in different locations in the code.
However, in deep debugging, userspace threads will always be executing
in a known location: `thread_handle_after_dmtcp_restart()`. Using this,
we can amortize the cost of getcontext() and writing the pthread
descriptors into a global storage at the "getting into a stable state
phase". The idea is that the context is all that really matters to
successfully execute multithreaded_fork().

With this knowledge in mind, the `fast_multithreaded_fork()` is
introduced which simply reads from global memory the appropriate
contexts inside the child process. Otherwise, it is conceptually very
close to fork().
Raw `pthread_t` structs were passed in shared memory between the McMini
process and the branch processes after multithreaded_fork(). However,
this is suspect because although the `pthread_t` struct is patched
between `multithreaded_fork()` calls, it may not be exactly "equal"
copy-wise. Instead, we now keep track of a light-weight mapping of
pthread_t to runner IDs. The higher-level goal is to be able to track
objects persistently across traces despite the fact that these objects
will appear in (hundreds) of traces.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants