
8010: runtime: optional eager I/O driver/timer handoff when polling tasks#92

Open
martin-augment wants to merge 12 commits into master from pr-8010-2026-04-04-11-42-40

Conversation

@martin-augment
Owner

8010: To review by AI

@coderabbitai

coderabbitai bot commented Apr 4, 2026

Walkthrough

This change introduces eager driver handoff functionality to Tokio's multi-threaded runtime. A new enable_eager_driver_handoff configuration flag is added to the Builder and Config structures, exposed via an unstable builder method. The parking mechanism is refactored to track whether a worker parked on the I/O driver via a new HadDriver enum. Worker scheduling logic is updated to record driver participation state and, when the feature is enabled, attempt to notify parked local workers before polling tasks. A new test file demonstrates the feature's ability to prevent runtime deadlocks by changing worker notification behavior during driver-related transitions.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an experimental 'eager driver handoff' feature for multi-threaded runtimes, accessible via the tokio_unstable flag. The feature allows a worker thread that has been polling the I/O or time driver to notify another worker to take over driver duties before it begins polling a task, thereby reducing latency and mitigating potential deadlocks. The implementation involves tracking driver ownership during parking and updating the worker's execution loop to trigger notifications. Review feedback suggests a logic refinement in the worker loop to ensure the driver ownership state is reset even when a notification was already sent during a searching transition, preventing redundant wakeups for subsequent tasks.

Comment on lines +645 to +653
if cfg!(tokio_unstable)
    && core.enable_eager_driver_handoff
    && core.had_driver == park::HadDriver::Yes
    && !notified_parked_worker
// don't do it a second time
{
    core.had_driver = park::HadDriver::No;
    self.worker.handle.notify_parked_local();
}

medium

The had_driver state should be reset to No even if notified_parked_worker is true. If a worker transitions from searching and notifies another worker, it has already fulfilled the "handoff" requirement. If had_driver is not reset in this case, the worker might trigger a redundant notification when polling the next task in its queue, as notified_parked_worker will be false for subsequent tasks while had_driver remains Yes.

Suggested change

if cfg!(tokio_unstable)
    && core.enable_eager_driver_handoff
    && core.had_driver == park::HadDriver::Yes
    && !notified_parked_worker
// don't do it a second time
{
    core.had_driver = park::HadDriver::No;
    self.worker.handle.notify_parked_local();
}

if cfg!(tokio_unstable)
    && core.enable_eager_driver_handoff
    && core.had_driver == park::HadDriver::Yes
{
    core.had_driver = park::HadDriver::No;
    if !notified_parked_worker {
        self.worker.handle.notify_parked_local();
    }
}

@claude

claude bot commented Apr 4, 2026

PR Review: runtime: optional eager I/O driver/timer handoff when polling tasks

This is a well-motivated feature that addresses a real class of deadlocks (#4730). The overall design is sound — deferring the handoff until task-poll time (rather than on unpark) avoids spurious wakeups in the common case. Below are findings ranging from a correctness bug to minor nits.


Bug: had_driver not reset when notified_parked_worker == true

File: tokio/src/runtime/scheduler/multi_thread/worker.rs, run_task (~line 645)

The current logic:

let notified_parked_worker = core.transition_from_searching(&self.worker);

if cfg!(tokio_unstable)
    && core.enable_eager_driver_handoff
    && core.had_driver == park::HadDriver::Yes
    && !notified_parked_worker  // don't do it a second time
{
    core.had_driver = park::HadDriver::No;
    self.worker.handle.notify_parked_local();
}

core.had_driver is only reset inside the if block. When notified_parked_worker == true (i.e. transition_from_searching already woke a worker), the eager handoff is correctly skipped — but had_driver is left as Yes.

On the next call to run_task (e.g. the worker drains its local queue), is_searching is already false, so transition_from_searching returns false immediately, making notified_parked_worker = false. Now all conditions are true and the eager handoff fires spuriously, potentially waking an idle worker for no reason. This can cascade on queues with many tasks.

Suggested fix — always reset had_driver, and only conditionally notify:

let notified_parked_worker = core.transition_from_searching(&self.worker);

if cfg!(tokio_unstable)
    && core.enable_eager_driver_handoff
    && core.had_driver == park::HadDriver::Yes
{
    core.had_driver = park::HadDriver::No; // reset unconditionally
    if !notified_parked_worker {
        self.worker.handle.notify_parked_local();
    }
}

Inconsistency: enable_eager_driver_handoff not gated with #[cfg(tokio_unstable)]

Files: tokio/src/runtime/config.rs and worker.rs (Core struct)

Every other unstable field in Config uses #[cfg(tokio_unstable)]:

#[cfg(tokio_unstable)]
pub(crate) unhandled_panic: crate::runtime::UnhandledPanic,

But enable_eager_driver_handoff is unconditionally present in both Config and Core. The cfg!(tokio_unstable) macro check in run_task does compile away the logic, but the fields themselves remain in stable builds — wasting space and creating inconsistency.

The comment in worker.rs acknowledges the intent ("should result in this whole expression being eliminated at compile-time"), but the fields behind it are still allocated. Wrapping them in #[cfg(tokio_unstable)] would be cleaner and consistent with the rest of the codebase.
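To illustrate the gating pattern (a standalone sketch with simplified struct shapes, not the real tokio internals; field names follow this PR):

```rust
// Sketch: every unstable field is wrapped in `#[cfg(tokio_unstable)]`,
// so stable builds do not carry the field at all.
pub struct Config {
    pub worker_threads: usize,
    #[cfg(tokio_unstable)]
    pub enable_eager_driver_handoff: bool,
}

pub fn make_config() -> Config {
    Config {
        worker_threads: 4,
        // The initializer is gated the same way; without
        // `--cfg tokio_unstable` this line is compiled out.
        #[cfg(tokio_unstable)]
        enable_eager_driver_handoff: false,
    }
}

fn main() {
    // In a stable build the struct only carries the stable field.
    assert_eq!(make_config().worker_threads, 4);
}
```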


Test reliability: deadlocks_consistently may be flaky

File: tokio/tests/rt_unstable_eager_driver_handoff.rs

This test asserts that the deadlock scenario reliably reproduces within 10 seconds. The 100 ms sleep (tokio::time::sleep(Duration::from_millis(100)).await) is intended to ensure bad_task is scheduled on the driver-holding worker before pipe1 is written to, but this is a timing assumption.

On a heavily loaded CI machine or a fast bare-metal machine, the scheduling order may differ. If the test ever fails (i.e. the runtime does NOT deadlock without the fix), CI will incorrectly flag that as a failure of the guard test. Consider adding a comment acknowledging this inherent raciness and/or running the scenario multiple times to increase confidence.
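One hedged way to reduce that raciness (illustrative only; `scenario` below is a stand-in for the real pipe-based reproducer, which is not shown here) is to run the scenario several times and require consistent results across attempts:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Returns true if `scenario` fails to complete within `timeout`,
// i.e. it is presumed deadlocked.
fn deadlocks(scenario: fn(), timeout: Duration) -> bool {
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        scenario();
        let _ = done_tx.send(());
    });
    done_rx.recv_timeout(timeout).is_err()
}

fn main() {
    // Placeholder scenario that completes immediately; in the real
    // test this would be the driver/task deadlock reproducer.
    let attempts = 3;
    let hung = (0..attempts)
        .filter(|_| deadlocks(|| {}, Duration::from_millis(200)))
        .count();
    // Requiring all attempts to agree makes a single lucky
    // scheduling order much less likely to flip the verdict.
    assert_eq!(hung, 0);
}
```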


Typo in test comment

File: tokio/tests/rt_unstable_eager_driver_handoff.rs, in do_test

// thread that polled the I/0 driver. Now, block this

I/0 (zero) should be I/O (letter O).


Minor: HadDriver should derive Debug

File: tokio/src/runtime/scheduler/multi_thread/park.rs

#[derive(Copy, Clone, Eq, PartialEq)]
pub(crate) enum HadDriver { Yes, No }

Adding Debug would make it much easier to inspect Core state during debugging or tracing, and is essentially free.
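For illustration, the derive with `Debug` added (a standalone sketch, not the actual tokio module):

```rust
#[derive(Copy, Clone, Eq, PartialEq, Debug)]
pub(crate) enum HadDriver {
    Yes,
    No,
}

fn main() {
    // With Debug derived, the state can be printed directly when
    // inspecting `Core` during debugging or tracing.
    let state = HadDriver::Yes;
    assert_eq!(format!("{state:?}"), "Yes");
}
```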


Nit: notify_parked_local() return value discarded in eager handoff path

In run_task (line 652), after the eager handoff fires:

self.worker.handle.notify_parked_local();

The bool return value is discarded. That's intentional here (if there's no idle worker to notify, that's fine). A let _ = or a comment explaining the discard would make it clear this isn't an oversight, especially since the function was specifically changed in this PR to return bool.
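A sketch of the explicit-discard idiom (the stand-in function below is hypothetical; the real method lives on the worker handle):

```rust
// Stand-in for `Handle::notify_parked_local`, which this PR changed
// to return whether a parked worker was actually notified.
fn notify_parked_local() -> bool {
    false // pretend no idle worker was available
}

fn main() {
    // Explicit discard: if there is no idle worker to notify,
    // that is fine for the eager-handoff path, and `let _ =`
    // signals the return value is ignored on purpose.
    let _ = notify_parked_local();
}
```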


Summary

Severity Issue
Bug had_driver not reset when handoff skipped due to notified_parked_worker == true → spurious extra wakeups on subsequent task polls
Inconsistency enable_eager_driver_handoff / had_driver fields not #[cfg(tokio_unstable)]-gated unlike all other unstable fields
Test concern deadlocks_consistently test depends on scheduler timing and may be flaky
Minor Typo: I/0 → I/O
Minor HadDriver missing Debug derive
Nit Discarded bool from notify_parked_local() should be let _ = or explained

The core mechanism and API design look good. The primary thing to fix before merging is the had_driver reset bug.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit e277211.

{
core.had_driver = park::HadDriver::No;
self.worker.handle.notify_parked_local();
}

had_driver Not Reset When Searching Transition Notifies

Medium Severity

When transition_from_searching already notified a parked worker (notified_parked_worker == true), the eager-handoff block is skipped — but core.had_driver is never reset to HadDriver::No. On the very next call to run_task for the following task, is_searching is already false, so transition_from_searching returns false, making !notified_parked_worker true, and the condition fires an extra spurious notify_parked_local(). The had_driver reset should happen unconditionally whenever had_driver == HadDriver::Yes, with the additional notify_parked_local() call guarded only by !notified_parked_worker.
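A minimal model of why the unconditional reset matters (a standalone sketch; the real logic lives in `run_task` and touches `Core` state not shown here):

```rust
#[derive(Copy, Clone, Eq, PartialEq, Debug)]
enum HadDriver {
    Yes,
    No,
}

// Simplified per-task handoff decision. Returns true when an eager
// `notify_parked_local` would fire.
fn eager_handoff(had_driver: &mut HadDriver, notified_parked_worker: bool) -> bool {
    if *had_driver == HadDriver::Yes {
        // Reset unconditionally: the post-driver state is consumed
        // by the first task polled after the park, either way.
        *had_driver = HadDriver::No;
        if !notified_parked_worker {
            return true;
        }
    }
    false
}

fn main() {
    let mut state = HadDriver::Yes;
    // First task: the searching transition already woke a worker,
    // so no extra notify fires — but the flag is still cleared.
    assert!(!eager_handoff(&mut state, true));
    // Next task from the same park cycle: no spurious notify.
    assert!(!eager_handoff(&mut state, false));
}
```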



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1


📥 Commits

Reviewing files that changed from the base of the PR and between 80a0b97 and e277211.

📒 Files selected for processing (5)
  • tokio/src/runtime/builder.rs
  • tokio/src/runtime/config.rs
  • tokio/src/runtime/scheduler/multi_thread/park.rs
  • tokio/src/runtime/scheduler/multi_thread/worker.rs
  • tokio/tests/rt_unstable_eager_driver_handoff.rs

Comment on lines +630 to +653
let notified_parked_worker = core.transition_from_searching(&self.worker);

// If the setting to wake eagerly when releasing the I/O driver is
// enabled, and this worker had the driver, wake a parked worker to come
// grab it from us.
//
// Note that this is only done when we are *actually* about to poll a
// task, rather than whenever the worker has unparked. When the worker
// has been unparked, it may not actually have any tasks to poll, and if
// it's still holding the I/O driver, it should just go back to polling
// the driver again, rather than trying to wake someone else spuriously.
//
// Note that this explicitly checks `cfg!(tokio_unstable)` in addition,
// as that should result in this whole expression being eliminated at
// compile-time when unstable features are disabled.
if cfg!(tokio_unstable)
    && core.enable_eager_driver_handoff
    && core.had_driver == park::HadDriver::Yes
    && !notified_parked_worker
// don't do it a second time
{
    core.had_driver = park::HadDriver::No;
    self.worker.handle.notify_parked_local();
}

⚠️ Potential issue | 🟠 Major

Reset had_driver on the first post-park task, even when searching already woke someone.

If transition_from_searching() returns true, this first task already consumed the post-driver state, but core.had_driver stays Yes. The next run_task can then call notify_parked_local() again from the same park cycle, which defeats the intended one-shot handoff and adds extra wakeups.

💡 Suggested fix
         let notified_parked_worker = core.transition_from_searching(&self.worker);
+        let had_driver = core.had_driver == park::HadDriver::Yes;
+        core.had_driver = park::HadDriver::No;
 
         // If the setting to wake eagerly when releasing the I/O driver is
         // enabled, and this worker had the driver, wake a parked worker to come
         // grab it from us.
@@
         if cfg!(tokio_unstable)
             && core.enable_eager_driver_handoff
-            && core.had_driver == park::HadDriver::Yes
+            && had_driver
             && !notified_parked_worker
-        // don't do it a second time
         {
-            core.had_driver = park::HadDriver::No;
             self.worker.handle.notify_parked_local();
         }
📝 Committable suggestion


Suggested change
let notified_parked_worker = core.transition_from_searching(&self.worker);

// If the setting to wake eagerly when releasing the I/O driver is
// enabled, and this worker had the driver, wake a parked worker to come
// grab it from us.
//
// Note that this is only done when we are *actually* about to poll a
// task, rather than whenever the worker has unparked. When the worker
// has been unparked, it may not actually have any tasks to poll, and if
// it's still holding the I/O driver, it should just go back to polling
// the driver again, rather than trying to wake someone else spuriously.
//
// Note that this explicitly checks `cfg!(tokio_unstable)` in addition,
// as that should result in this whole expression being eliminated at
// compile-time when unstable features are disabled.
if cfg!(tokio_unstable)
    && core.enable_eager_driver_handoff
    && core.had_driver == park::HadDriver::Yes
    && !notified_parked_worker
// don't do it a second time
{
    core.had_driver = park::HadDriver::No;
    self.worker.handle.notify_parked_local();
}

let notified_parked_worker = core.transition_from_searching(&self.worker);
let had_driver = core.had_driver == park::HadDriver::Yes;
core.had_driver = park::HadDriver::No;

// If the setting to wake eagerly when releasing the I/O driver is
// enabled, and this worker had the driver, wake a parked worker to come
// grab it from us.
//
// Note that this is only done when we are *actually* about to poll a
// task, rather than whenever the worker has unparked. When the worker
// has been unparked, it may not actually have any tasks to poll, and if
// it's still holding the I/O driver, it should just go back to polling
// the driver again, rather than trying to wake someone else spuriously.
//
// Note that this explicitly checks `cfg!(tokio_unstable)` in addition,
// as that should result in this whole expression being eliminated at
// compile-time when unstable features are disabled.
if cfg!(tokio_unstable)
    && core.enable_eager_driver_handoff
    && had_driver
    && !notified_parked_worker
{
    self.worker.handle.notify_parked_local();
}

@augmentcode

augmentcode bot commented Apr 6, 2026

🤖 Augment PR Summary

Summary: This PR adds an (unstable) option to eagerly hand off polling of the I/O/time driver when a worker transitions from driver polling back to task polling.

Changes:

  • Added a new builder/config flag enable_eager_driver_handoff (default off) behind tokio_unstable + rt-multi-thread.
  • Plumbed the flag through runtime Config and into the multi-thread scheduler worker core.
  • Extended the multi-thread parker to report whether the last park used the shared driver vs a condvar (HadDriver).
  • When enabled, a worker that just returned from a driver park may unpark another idle worker right before polling its next task, so the other worker can start polling the driver.
  • Adjusted searching/notification helpers to return whether a parked worker was actually notified, to avoid redundant wakeups.
  • Added a Unix-only regression test that reproduces a driver/task deadlock without the flag and verifies it is avoided when the flag is enabled.

Technical Notes: The eager handoff logic is guarded so it compiles away when unstable features are disabled, and it only affects multi-thread runtimes.



@augmentcode augmentcode bot left a comment


Review completed. 2 suggestions posted.


// whether or not the runtime has deadlocked.
let (done_tx, done_rx) = std::sync::mpsc::channel();

std::thread::spawn(move || {

tokio/tests/rt_unstable_eager_driver_handoff.rs:94: In the expected-deadlock case, this spawns a thread that never terminates (it remains blocked in rt.block_on), which can leak an OS thread for the remainder of the test binary and potentially interfere with other tests running in the same process.

Severity: medium


done_tx.send(()).unwrap();
});

done_rx.recv_timeout(Duration::from_secs(10))

tokio/tests/rt_unstable_eager_driver_handoff.rs:138: deadlocks_consistently appears to pass only after waiting the full 10s recv_timeout, which will reliably add ~10s to the test suite on supported platforms; is that duration intentional for CI?

Severity: low

