7922: runtime: fix race condition during the blocking pool shutdown #67

Open
martin-augment wants to merge 1 commit into master from pr-7922-2026-02-20-09-04-22

Conversation

@martin-augment
Owner

7922: To review by AI

@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a race condition within the Tokio runtime's blocking pool shutdown mechanism. By introducing a precise tracking flag for idle threads, it ensures that the num_idle_threads metric is always correctly maintained, thereby preventing potential underflow errors and enhancing the robustness of the runtime's resource management during termination.

Highlights

  • Race Condition Fix: Addressed a race condition that could lead to num_idle_threads underflowing during the blocking pool shutdown process.
  • Idle Thread Tracking: Introduced a new is_counted_idle flag to accurately track whether a thread is currently included in the num_idle_threads metric.
  • Conditional Metric Decrement: Modified the thread exit logic to conditionally decrement num_idle_threads only if the thread was previously marked as idle, preventing incorrect metric updates.

Changelog

  • tokio/src/runtime/blocking/pool.rs
    • Declared a new mutable boolean variable is_counted_idle to track if the current thread is counted in num_idle_threads.
    • Set is_counted_idle to true when a thread becomes idle and its count is incremented.
    • Set is_counted_idle to false when a thread is legitimately woken up, as Spawner::spawn_task would have already decremented the idle count.
    • Conditionally decremented num_idle_threads and asserted its value only if the thread was previously marked as idle, preventing underflow.

Activity

  • The pull request was submitted for review by AI.

@coderabbitai

coderabbitai bot commented Feb 20, 2026

Walkthrough

The changes modify thread idle tracking in the blocking thread pool runtime component. A per-thread is_counted_idle flag is introduced to track whether a thread is counted in the idle thread counter. The flag is set when entering the IDLE state and cleared upon legitimate wakeup notifications. On thread exit, the idle counter is adjusted only if the flag indicates the thread was previously counted as idle. This ensures the idle thread counter remains accurate and consistent with actual thread states.
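The flag handling described in this walkthrough can be sketched with standard-library primitives. This is a simplified model under stated assumptions, not Tokio's `pool.rs`: the `Pool` and `Shared` types, `run_worker`, `race_spawn_with_shutdown`, and the 10-second keep-alive are all hypothetical, and the task queue is omitted so that only the idle bookkeeping remains.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical lock-protected state, reduced to what the race needs.
#[derive(Default)]
struct Shared {
    num_notify: usize,
    shutdown: bool,
}

/// Hypothetical pool: shared state, a condvar, and the idle-thread metric.
#[derive(Default)]
struct Pool {
    shared: Mutex<Shared>,
    condvar: Condvar,
    num_idle_threads: AtomicUsize,
}

/// Worker loop reduced to idle accounting (no task queue in this model).
fn run_worker(pool: &Pool) {
    let mut shared = pool.shared.lock().unwrap();
    // Is this thread currently counted in num_idle_threads?
    let mut is_counted_idle;

    'main: loop {
        // IDLE: count this thread and remember that we did.
        pool.num_idle_threads.fetch_add(1, Ordering::Relaxed);
        is_counted_idle = true;

        while !shared.shutdown {
            let (guard, timeout) = pool
                .condvar
                .wait_timeout(shared, Duration::from_secs(10))
                .unwrap();
            shared = guard;

            if shared.num_notify != 0 {
                // Legitimate wakeup: the spawner already decremented the metric.
                shared.num_notify -= 1;
                is_counted_idle = false;
                break;
            }
            if !shared.shutdown && timeout.timed_out() {
                break 'main;
            }
            // Otherwise: spurious wakeup, go back to waiting.
        }

        if shared.shutdown {
            break;
        }
    }

    // Thread exit: decrement only if this thread is still counted as idle.
    if is_counted_idle {
        let previous = pool.num_idle_threads.fetch_sub(1, Ordering::Relaxed);
        assert_ne!(previous, 0, "num_idle_threads underflowed on thread exit");
    }
}

/// Issues a spawn-style notification and a shutdown under one lock
/// acquisition, then returns the metric after the worker has exited.
fn race_spawn_with_shutdown() -> usize {
    let pool = Arc::new(Pool::default());
    let worker = {
        let pool = Arc::clone(&pool);
        thread::spawn(move || run_worker(&pool))
    };

    // Wait until the worker has counted itself idle.
    while pool.num_idle_threads.load(Ordering::Relaxed) == 0 {
        thread::yield_now();
    }
    {
        // The worker holds the lock until it parks in wait_timeout, so this
        // acquisition succeeds only once the worker is waiting.
        let mut shared = pool.shared.lock().unwrap();
        pool.num_idle_threads.fetch_sub(1, Ordering::Relaxed); // spawn: pre-decrement
        shared.num_notify += 1;
        pool.condvar.notify_one();
        shared.shutdown = true; // shutdown arrives before the worker runs
        pool.condvar.notify_all();
    }
    worker.join().unwrap();
    pool.num_idle_threads.load(Ordering::Relaxed)
}
```

Calling `race_spawn_with_shutdown()` in this model should leave the metric at zero, because the worker clears its flag when it consumes the notification and therefore skips the exit decrement. Replacing the final `if is_counted_idle` with an unconditional decrement would instead trip the `assert_ne!` in the worker.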

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The changes effectively address a race condition during the blocking pool shutdown by introducing an is_counted_idle flag. This flag accurately tracks whether a thread is currently contributing to the num_idle_threads metric, preventing incorrect decrements and potential underflow panics. The logic is sound and directly resolves the issue described in the pull request title, enhancing the robustness of the blocking pool.

@augmentcode

augmentcode bot commented Feb 20, 2026

🤖 Augment PR Summary

Summary: Fixes a race in the blocking pool shutdown path where worker exit could double-decrement the idle-thread metric.
Changes: Tracks whether a worker is currently counted as idle and only decrements num_idle_threads on thread exit when that’s true.

@augmentcode augmentcode bot left a comment

Review completed. No suggestions at this time.

@claude

claude bot commented Feb 20, 2026

Code Review: Fix Race Condition During Blocking Pool Shutdown

Summary

This PR fixes a real race condition in the blocking thread pool where num_idle_threads could be double-decremented, leading to an integer underflow and a panic from the assert_ne! guard.

Root Cause Analysis

The race arises from this interleaved sequence:

  1. A blocking thread is idle; num_idle_threads = 1
  2. spawn_task acquires the lock, reads num_idle_threads > 0, pre-decrements it to 0, sets num_notify = 1, calls condvar.notify_one()
  3. Concurrently, shutdown() acquires the lock, sets shared.shutdown = true, calls condvar.notify_all()
  4. The idle thread wakes from condvar.wait_timeout, re-acquires the lock, sees num_notify != 0 (the point where the fix sets is_counted_idle = false), breaks the inner while, and transitions to shutdown draining
  5. The original code then unconditionally calls dec_num_idle_threads() at thread exit, but spawn_task already decremented it in step 2, causing a double-decrement
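The five steps above can be replayed deterministically on a plain counter. The sketch below is a single-threaded model of the sequence as this review describes it, not Tokio's code; `replay_exit` and its `guard_with_flag` parameter are hypothetical names, and `checked_sub` stands in for underflow detection.

```rust
/// Replays the interleaving on a plain counter and returns the value of
/// num_idle_threads after thread exit, or `None` if the exit decrement
/// would underflow. `guard_with_flag` selects the patched behaviour.
fn replay_exit(guard_with_flag: bool) -> Option<usize> {
    // Step 1: the worker is idle and counted.
    let mut num_idle_threads: usize = 1;
    let mut is_counted_idle = true;

    // Step 2: spawn_task pre-decrements the counter and records a notification.
    num_idle_threads -= 1;
    let mut num_notify: usize = 1;

    // Step 3: shutdown is requested before the worker runs again.
    let shutdown = true;

    // Step 4: the worker wakes, consumes the notification, clears its flag.
    if num_notify != 0 {
        num_notify -= 1;
        is_counted_idle = false;
    }
    assert!(shutdown && num_notify == 0);

    // Step 5: thread exit.
    if guard_with_flag && !is_counted_idle {
        // Patched: the spawner already accounted for this thread.
        Some(num_idle_threads)
    } else {
        // Unconditional decrement: underflows when the counter is already 0.
        num_idle_threads.checked_sub(1)
    }
}
```

With `guard_with_flag = false` the replay reports an underflow; with `true` the counter ends at zero.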

The concrete panic: MetricAtomicUsize::decrement() uses fetch_sub(1, Relaxed), which returns the previous value. When num_idle_threads is already 0, decrement() returns 0, triggering assert_ne!(0, 0, "num_idle_threads underflowed on thread exit").
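The "returns the previous value" behaviour is a property of the standard atomic types and can be checked in isolation. This is a minimal sketch on `std::sync::atomic::AtomicUsize`, not on Tokio's `MetricAtomicUsize`; the helper names `decrement_returning_previous` and `decrement_from` are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a metric decrement: subtracts one and returns
/// the value held before the subtraction, as `fetch_sub` does.
fn decrement_returning_previous(counter: &AtomicUsize) -> usize {
    counter.fetch_sub(1, Ordering::Relaxed)
}

/// Decrements a fresh counter starting at `start` and reports
/// (previous value, value stored after the decrement).
fn decrement_from(start: usize) -> (usize, usize) {
    let counter = AtomicUsize::new(start);
    let previous = decrement_returning_previous(&counter);
    (previous, counter.load(Ordering::Relaxed))
}
```

Starting from 1, the helper returns a previous value of 1 and leaves 0 behind. Starting from 0, it returns 0 and the stored value wraps around to `usize::MAX`, which is why a check on the returned previous value is enough to detect the underflow.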

Correctness of the Fix

The fix is correct. The is_counted_idle flag accurately tracks the invariant: "is this thread's idle slot still outstanding in num_idle_threads?"

Key path analysis:

  • Set to true immediately after inc_num_idle_threads(), so the two are always paired
  • Set to false only when num_notify != 0, meaning spawn_task already called dec_num_idle_threads()
  • Timeout path (break 'main): exits with is_counted_idle = true → decrement correct
  • Shutdown path after legitimate wakeup: exits with is_counted_idle = false → no decrement (correct, spawn_task already did it)
  • Shutdown path without prior wakeup: exits with is_counted_idle = true → decrement correct

Rust's definite-initialization analysis guarantees is_counted_idle is always set before its use at exit, since the IDLE section (where it is set to true) is always visited before any exit from 'main.
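The definite-initialization argument can be illustrated with a reduced control-flow skeleton. This is a sketch, assuming a loop shape similar to the one described here; `flag_at_exit` and its parameters are hypothetical and no real waiting takes place.

```rust
/// Skeleton of a worker loop that declares its flag without an initializer.
/// `notified` models a legitimate wakeup on the first pass; `shutdown`
/// models whether the pool is shutting down when that wakeup is handled.
fn flag_at_exit(notified: bool, shutdown: bool) -> bool {
    // No initial value: the compiler must prove every read is preceded
    // by an assignment.
    let mut is_counted_idle;
    let mut first_pass = true;

    loop {
        // IDLE: assigned at the top of every iteration, before any `break`.
        is_counted_idle = true;

        if notified && first_pass {
            // Legitimate wakeup: the spawner already decremented the metric.
            is_counted_idle = false;
            first_pass = false;
            if !shutdown {
                // BUSY, then back to IDLE on the next iteration.
                continue;
            }
        }
        // Exit via shutdown or timeout.
        break;
    }

    // The only way out of the loop is a `break` that follows the assignment
    // above, so this read is accepted.
    is_counted_idle
}
```

Moving the `is_counted_idle = true;` assignment below the `break` would make the function fail to compile, which is the compile-time protection a default initializer would give up.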

Suggestions

1. Missing loom test (important)

The existing blocking_shutdown loom test in loom_blocking.rs does not cover the specific scenario where a spawn_task notification races with shutdown. A dedicated test would lock in this fix and prevent future regressions. Something along these lines:

```rust
#[test]
fn spawn_task_and_shutdown_race() {
    loom::model(|| {
        let rt = mk_runtime(1);
        {
            let _enter = rt.enter();
            // Spawn a blocking task so that a worker thread exists when the
            // runtime is dropped; its idle accounting can then race with
            // shutdown.
            let _ = crate::task::spawn_blocking(|| {});
        }
        // Dropping the runtime triggers shutdown while the worker may still
        // be handling the spawn_task notification.
        drop(rt);
    });
}
```

Loom would exhaustively explore the interleaving where spawn_task pre-decrements num_idle_threads and shutdown fires simultaneously — exactly the scenario this PR addresses.

2. Minor — uninitialized declaration

let mut is_counted_idle; without an initial value is valid Rust (the compiler verifies definite initialization), but readers skimming the function may find it surprising. Initializing to false at the declaration site would make the control flow slightly clearer, though either is correct.

3. Minor — duplicate comment

The comment // is this thread currently counted in num_idle_threads? appears at both the declaration and the exit check. The duplication is harmless and actually useful for orientation — this is a style nit only.

Overall Assessment

The fix is minimal, well-targeted, and correct. The main gap is the absence of a loom regression test for this specific race. Given that the blocking pool already has loom tests in loom_blocking.rs, adding one here would be consistent with project conventions and provide strong guarantees that this bug cannot regress.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (2)
tokio/src/runtime/blocking/pool.rs (2)

502-603: Add a loom test for the spawn_task + shutdown race condition

The race condition fixed here is subtle: when spawn_task pre-decrements num_idle_threads but shutdown fires before the worker thread wakes up and sets is_counted_idle = false, the thread exits with is_counted_idle = true despite num_idle_threads already being decremented, leading to a double-decrement. This is exactly what loom testing is designed to catch. The existing loom tests for the blocking pool (loom_blocking.rs, referenced elsewhere in this conversation) do not appear to cover this interleaving, so adding one would make this fix self-documenting and prevent regression.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tokio/src/runtime/blocking/pool.rs` around lines 502 - 603, Add a loom test
that reproduces the spawn_task + shutdown race where spawn_task pre-decrements
num_idle_threads and shutdown wins before the worker thread wakes, causing a
double-decrement when Inner::run exits with is_counted_idle == true; implement
the test to (1) use loom's threading/scheduling primitives to create a worker
thread running Inner::run (or exercise the public spawning API that calls
spawn_task) and a spawner that calls spawn_task and then triggers shutdown, (2)
interleave operations so spawn_task decrements num_idle_threads before the
worker observes the wakeup and sets is_counted_idle = false, and (3) assert via
loom that no underflow/double-decrement occurs (e.g., num_idle_threads stays
valid and no panic), referencing Inner::run, spawn_task, num_idle_threads,
is_counted_idle, and shared.shutdown so the test targets the exact race.

510-511: Remove redundant comment at line 510

The comment "is this thread currently counted in num_idle_threads?" is repeated at line 578 (before the read). Remove the declaration-site comment since the reading-site comment provides sufficient context.

The uninitialized declaration is correct: Rust's definite-assignment analysis verifies that all execution paths reaching the read at line 579 assign is_counted_idle beforehand. This is stricter than a default initializer—future code introducing uninitialized reads will be caught at compile time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tokio/src/runtime/blocking/pool.rs` around lines 510 - 511, Remove the
redundant declaration-site comment for the local variable `is_counted_idle` in
the `pool.rs` blocking pool code: keep the comment that explains its meaning at
the read site (before the read at the later location) and delete the earlier
comment that duplicates "is this thread currently counted in
`num_idle_threads`?" placed above the `let mut is_counted_idle;` declaration; do
not change the uninitialized declaration itself—leave `let mut is_counted_idle;`
as-is so Rust's definite-assignment checks remain in place.
