Skip to content

Commit e560fa7

Browse files
committed
fix(gateway): synchronize with worker before notify in ResourceChangeNotifier::shutdown
Classical lost-wakeup race between shutdown() and worker_loop(): 1. worker locks queue_mutex_ 2. worker checks predicate (flag=false, queue empty) -> false 3. shutdown() CAS flag -> true (NOT holding queue_mutex_) 4. shutdown() notify_one() - worker NOT yet enqueued on CV 5. worker enters wait(lock): atomic unlock+enqueue+sleep 6. worker sleeps forever; main blocks in worker_thread_.join() Even though shutdown_flag_ uses seq_cst atomics, the worker's predicate check and entry into wait() are not atomic with respect to the flag modification. The notify can arrive before the worker is enqueued. Fix: briefly acquire queue_mutex_ between setting the flag and notifying. This guarantees the worker is either still outside its critical section (will observe the new flag on the next lock) or already enqueued on the CV (notify_one will wake it). Manifested as a TSan-specific hang in DoubleShutdownIsSafe (worker spawn + immediate shutdown hits the race window).
1 parent f00a156 commit e560fa7

2 files changed

Lines changed: 8 additions & 1 deletion

File tree

src/ros2_medkit_gateway/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -655,7 +655,7 @@ if(BUILD_TESTING)
655655
target_link_libraries(test_condition_evaluator gateway_lib)
656656

657657
# Resource change notifier tests (async notification hub for triggers)
658-
ament_add_gtest(test_resource_change_notifier test/test_resource_change_notifier.cpp TIMEOUT 600)
658+
ament_add_gtest(test_resource_change_notifier test/test_resource_change_notifier.cpp TIMEOUT 180)
659659
target_link_libraries(test_resource_change_notifier gateway_lib)
660660

661661
# Trigger store tests (SQLite persistence for triggers)

src/ros2_medkit_gateway/src/resource_change_notifier.cpp

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -105,6 +105,13 @@ void ResourceChangeNotifier::shutdown() {
105105
});
106106
}
107107

108+
// Synchronize with worker_loop()'s predicate check. Without this, the flag
109+
// store above can land between the worker's predicate evaluation and its
110+
// wait() call, losing the subsequent notify_one(). Acquiring queue_mutex_
111+
// here guarantees the worker is either still outside the critical section
112+
// (will observe the new flag) or already enqueued on queue_cv_ (notify will
113+
// wake it).
114+
{ std::lock_guard<std::mutex> sync(queue_mutex_); }
108115
queue_cv_.notify_one();
109116
if (worker_thread_.joinable()) {
110117
worker_thread_.join();

0 commit comments

Comments
 (0)