Incorrect task termination for blocking data structures (nonlinear_mutex example) on master branch #13

@dmitrii-artuhov

Problem

On the master branch, task termination is handled incorrectly. Consider how TerminateTasks is implemented:

  • state.Reset() is called before the tasks are terminated, which seems wrong for both blocking and non-blocking data structures: a task may be suspended midway through its code, and resuming it after the data structure under test has been reset can lead to UB (which must be what was observed here in the PR with minimization). The correct approach seems to be to reset the state after the loop that calls task_i->Terminate(), as shown in the picture below (a screenshot from the master branch with the misplaced state.Restore() call commented out).
    Image
  • However, running this partially corrected code on the nonlinear mutex example with LD_PRELOAD=build/syscall_intercept/libpreload.so ./build/verifying/blocking/nonlinear_mutex --strategy random --verbose produces the following:
    threads  = 2
    tasks    = 15
    switches = 100000000
    rounds   = 5
    targets  = 2
    strategy = random
    
    
    run round: 0
    ==89519==WARNING: ASan is ignoring requested __asan_handle_no_return: stack type: default top: 0x7ffffffcc000; bottom 0x7ffff2bfe000; size: 0x00000d3ce000 (222093312)
    False positive error reports may follow
    For details see https://github.com/google/sanitizers/issues/189
    *--------------------*--------------------*
    |         T0         |         T1         |
    *--------------------*--------------------*
    | Lock()             |                    |
    |                    | Lock()             |
    | <-- 0              |                    |
    | Unlock()           |                    |
    | <-- 0              |                    |
    | Lock()             |                    |
    |                    | <-- 0              |
    |                    | Unlock()           |
    |                    | <-- 0              |
    |                    | Lock()             |
    | <-- 0              |                    |
    | Unlock()           |                    |
    | <-- 0              |                    |
    | Lock()             |                    |
    |                    | <-- 0              |
    |                    | Unlock()           |
    | <-- 0              |                    |
    | Unlock()           |                    |
    |                    | <-- 0              |
    |                    | Lock()             |
    |                    | <-- 0              |
    | <-- 0              |                    |
    |                    | Unlock()           |
    | Lock()             |                    |
    |                    | <-- 0              |
    |                    | Lock()             |
    | <-- 0              |                    |
    | Unlock()           |                    |
    | <-- 0              |                    |
    | Lock()             |                    |
    |                    | <-- 0              |
    *--------------------*--------------------*
    ===============================================
    
    Terminating task blocked=0
    ... hangs infinitely ...
    The last task on the first thread (T0 in the table), a Lock(), is reported as not blocked, so the tool tries to terminate it. That obviously cannot succeed, because the second thread appears to hold the mutex. It is strange that the tool reports this task as not blocked; this is probably a consequence of the mutex under test being nonlinearizable. Even so, the run should not hang: the nonlinearizability report should be printed instead.
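
To make the ordering problem from the first bullet concrete, here is a minimal sketch in C++. All names in it (State, Task, TerminateTasksSketch, reset_first, CountStale) are hypothetical stand-ins and not the project's real types. A std::weak_ptr is used so that "the task touched an already-reset state" can be detected safely; with a raw pointer or reference the same access would be UB.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the data structure under test.
struct State {
  std::shared_ptr<int> counter = std::make_shared<int>(0);
  // Replaces the internals; anything that referenced the old ones is now stale.
  void Reset() { counter = std::make_shared<int>(0); }
};

// Hypothetical stand-in for a task that was suspended midway.
struct Task {
  std::weak_ptr<int> seen;  // what the task referenced when it was suspended
  bool finished = false;
  bool touched_stale_state = false;

  // Models resuming the task until it completes.
  void Terminate() {
    if (auto p = seen.lock()) {
      ++*p;  // the old internals are still alive: safe
    } else {
      touched_stale_state = true;  // a raw pointer would dangle here
    }
    finished = true;
  }
};

// Terminates all tasks and resets the state, in the requested order.
// Returns how many tasks were resumed against an already-reset state.
int TerminateTasksSketch(State& state, std::vector<Task>& tasks,
                         bool reset_first) {
  if (reset_first) state.Reset();  // the order questioned in this issue
  int stale = 0;
  for (auto& t : tasks) {
    t.Terminate();
    assert(t.finished);
    if (t.touched_stale_state) ++stale;
  }
  if (!reset_first) state.Reset();  // the order proposed in this issue
  return stale;
}

// Builds two tasks that reference the state's internals, then tears down.
int CountStale(bool reset_first) {
  State state;
  std::vector<Task> tasks(2);
  for (auto& t : tasks) t.seen = state.counter;
  return TerminateTasksSketch(state, tasks, reset_first);
}
```

In this toy model, CountStale(true) reports that both tasks ran against stale state, while CountStale(false) reports none. It does not address the hang itself, which comes from trying to finish a task that cannot make progress.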

Expected

  • Should we perform termination the way we do now at all? We should be able to simply destroy the coro instances without having to care about the order in which tasks are terminated, or whether they can be terminated at all.

CC @lim123123123 @Kirillog
