8010: runtime: optional eager I/O driver/timer handoff when polling tasks#92
8010: runtime: optional eager I/O driver/timer handoff when polling tasks#92martin-augment wants to merge 12 commits intomasterfrom
Conversation
This way, it should be less susceptible to some platforms having slow enough timers that the two tasks are woken in the same iteration of the timer wheel, or whatever it is that broke CI.
WalkthroughThis change introduces eager driver handoff functionality to Tokio's multi-threaded runtime. A new ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces an experimental 'eager driver handoff' feature for multi-threaded runtimes, accessible via the tokio_unstable flag. The feature allows a worker thread that has been polling the I/O or time driver to notify another worker to take over driver duties before it begins polling a task, thereby reducing latency and mitigating potential deadlocks. The implementation involves tracking driver ownership during parking and updating the worker's execution loop to trigger notifications. Review feedback suggests a logic refinement in the worker loop to ensure the driver ownership state is reset even when a notification was already sent during a searching transition, preventing redundant wakeups for subsequent tasks.
| if cfg!(tokio_unstable) | ||
| && core.enable_eager_driver_handoff | ||
| && core.had_driver == park::HadDriver::Yes | ||
| && !notified_parked_worker | ||
| // don't do it a second time | ||
| { | ||
| core.had_driver = park::HadDriver::No; | ||
| self.worker.handle.notify_parked_local(); | ||
| } |
There was a problem hiding this comment.
The had_driver state should be reset to No even if notified_parked_worker is true. If a worker transitions from searching and notifies another worker, it has already fulfilled the "handoff" requirement. If had_driver is not reset in this case, the worker might trigger a redundant notification when polling the next task in its queue, as notified_parked_worker will be false for subsequent tasks while had_driver remains Yes.
| if cfg!(tokio_unstable) | |
| && core.enable_eager_driver_handoff | |
| && core.had_driver == park::HadDriver::Yes | |
| && !notified_parked_worker | |
| // don't do it a second time | |
| { | |
| core.had_driver = park::HadDriver::No; | |
| self.worker.handle.notify_parked_local(); | |
| } | |
| if cfg!(tokio_unstable) | |
| && core.enable_eager_driver_handoff | |
| && core.had_driver == park::HadDriver::Yes | |
| { | |
| core.had_driver = park::HadDriver::No; | |
| if !notified_parked_worker { | |
| self.worker.handle.notify_parked_local(); | |
| } | |
| } |
PR Review:
|
| Severity | Issue |
|---|---|
| Bug | had_driver not reset when handoff skipped due to notified_parked_worker == true → spurious extra wakeups on subsequent task polls |
| Inconsistency | enable_eager_driver_handoff / had_driver fields not #[cfg(tokio_unstable)]-gated unlike all other unstable fields |
| Test concern | deadlocks_consistently test depends on scheduler timing and may be flaky |
| Minor | Typo I/0 → I/O |
| Minor | HadDriver missing Debug derive |
| Nit | Discarded bool from notify_parked_local() should be let _ = or explained |
The core mechanism and API design look good. The primary thing to fix before merging is the had_driver reset bug.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit e277211. Configure here.
| { | ||
| core.had_driver = park::HadDriver::No; | ||
| self.worker.handle.notify_parked_local(); | ||
| } |
There was a problem hiding this comment.
had_driver Not Reset When Searching Transition Notifies
Medium Severity
When transition_from_searching already notified a parked worker (notified_parked_worker == true), the eager-handoff block is skipped — but core.had_driver is never reset to HadDriver::No. On the very next call to run_task for the following task, is_searching is already false, so transition_from_searching returns false, making !notified_parked_worker true, and the condition fires an extra spurious notify_parked_local(). The had_driver reset should happen unconditionally whenever had_driver == HadDriver::Yes, with the additional notify_parked_local() call guarded only by !notified_parked_worker.
Reviewed by Cursor Bugbot for commit e277211. Configure here.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tokio/src/runtime/scheduler/multi_thread/worker.rs`:
- Around line 630-653: transition_from_searching() can return true meaning the
post-driver state was already consumed, but core.had_driver remains Yes and can
cause a second notify later; change the logic so that when cfg!(tokio_unstable)
&& core.enable_eager_driver_handoff && core.had_driver == park::HadDriver::Yes
you always reset core.had_driver = park::HadDriver::No immediately, and only
call self.worker.handle.notify_parked_local() conditionally when
notified_parked_worker is false. In other words, move or duplicate the
core.had_driver = No assignment out of the inner conditional so the had_driver
flag is cleared on the first post-park task, and keep notify_parked_local
guarded by !notified_parked_worker; refer to transition_from_searching,
core.had_driver, core.enable_eager_driver_handoff, notified_parked_worker, and
self.worker.handle.notify_parked_local.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: aa75a05b-c59e-4e48-9181-cdf86be63e45
📒 Files selected for processing (5)
tokio/src/runtime/builder.rstokio/src/runtime/config.rstokio/src/runtime/scheduler/multi_thread/park.rstokio/src/runtime/scheduler/multi_thread/worker.rstokio/tests/rt_unstable_eager_driver_handoff.rs
| let notified_parked_worker = core.transition_from_searching(&self.worker); | ||
|
|
||
| // If the setting to wake eagerly when releasing the I/O driver is | ||
| // enabled, and this worker had the driver, wake a parked worker to come | ||
| // grab it from us. | ||
| // | ||
| // Note that this is only done when we are *actually* about to poll a | ||
| // task, rather than whenever the worker has unparked. When the worker | ||
| // has been unparked, it may not actually have any tasks to poll, and if | ||
| // it's still holding the I/O driver, it should just go back to polling | ||
| // the driver again, rather than trying to wake someone else spuriously. | ||
| // | ||
| // Note that this explicitly checks `cfg!(tokio_unstable)` in addition, | ||
| // as that should result in this whole expression being eliminated at | ||
| // compile-time when unstable features are disabled. | ||
| if cfg!(tokio_unstable) | ||
| && core.enable_eager_driver_handoff | ||
| && core.had_driver == park::HadDriver::Yes | ||
| && !notified_parked_worker | ||
| // don't do it a second time | ||
| { | ||
| core.had_driver = park::HadDriver::No; | ||
| self.worker.handle.notify_parked_local(); | ||
| } |
There was a problem hiding this comment.
Reset had_driver on the first post-park task, even when searching already woke someone.
If transition_from_searching() returns true, this first task already consumed the post-driver state, but core.had_driver stays Yes. The next run_task can then call notify_parked_local() again from the same park cycle, which defeats the intended one-shot handoff and adds extra wakeups.
💡 Suggested fix
let notified_parked_worker = core.transition_from_searching(&self.worker);
+ let had_driver = core.had_driver == park::HadDriver::Yes;
+ core.had_driver = park::HadDriver::No;
// If the setting to wake eagerly when releasing the I/O driver is
// enabled, and this worker had the driver, wake a parked worker to come
// grab it from us.
@@
if cfg!(tokio_unstable)
&& core.enable_eager_driver_handoff
- && core.had_driver == park::HadDriver::Yes
+ && had_driver
&& !notified_parked_worker
- // don't do it a second time
{
- core.had_driver = park::HadDriver::No;
self.worker.handle.notify_parked_local();
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let notified_parked_worker = core.transition_from_searching(&self.worker); | |
| // If the setting to wake eagerly when releasing the I/O driver is | |
| // enabled, and this worker had the driver, wake a parked worker to come | |
| // grab it from us. | |
| // | |
| // Note that this is only done when we are *actually* about to poll a | |
| // task, rather than whenever the worker has unparked. When the worker | |
| // has been unparked, it may not actually have any tasks to poll, and if | |
| // it's still holding the I/O driver, it should just go back to polling | |
| // the driver again, rather than trying to wake someone else spuriously. | |
| // | |
| // Note that this explicitly checks `cfg!(tokio_unstable)` in addition, | |
| // as that should result in this whole expression being eliminated at | |
| // compile-time when unstable features are disabled. | |
| if cfg!(tokio_unstable) | |
| && core.enable_eager_driver_handoff | |
| && core.had_driver == park::HadDriver::Yes | |
| && !notified_parked_worker | |
| // don't do it a second time | |
| { | |
| core.had_driver = park::HadDriver::No; | |
| self.worker.handle.notify_parked_local(); | |
| } | |
| let notified_parked_worker = core.transition_from_searching(&self.worker); | |
| let had_driver = core.had_driver == park::HadDriver::Yes; | |
| core.had_driver = park::HadDriver::No; | |
| // If the setting to wake eagerly when releasing the I/O driver is | |
| // enabled, and this worker had the driver, wake a parked worker to come | |
| // grab it from us. | |
| // | |
| // Note that this is only done when we are *actually* about to poll a | |
| // task, rather than whenever the worker has unparked. When the worker | |
| // has been unparked, it may not actually have any tasks to poll, and if | |
| // it's still holding the I/O driver, it should just go back to polling | |
| // the driver again, rather than trying to wake someone else spuriously. | |
| // | |
| // Note that this explicitly checks `cfg!(tokio_unstable)` in addition, | |
| // as that should result in this whole expression being eliminated at | |
| // compile-time when unstable features are disabled. | |
| if cfg!(tokio_unstable) | |
| && core.enable_eager_driver_handoff | |
| && had_driver | |
| && !notified_parked_worker | |
| { | |
| self.worker.handle.notify_parked_local(); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tokio/src/runtime/scheduler/multi_thread/worker.rs` around lines 630 - 653,
transition_from_searching() can return true meaning the post-driver state was
already consumed, but core.had_driver remains Yes and can cause a second notify
later; change the logic so that when cfg!(tokio_unstable) &&
core.enable_eager_driver_handoff && core.had_driver == park::HadDriver::Yes you
always reset core.had_driver = park::HadDriver::No immediately, and only call
self.worker.handle.notify_parked_local() conditionally when
notified_parked_worker is false. In other words, move or duplicate the
core.had_driver = No assignment out of the inner conditional so the had_driver
flag is cleared on the first post-park task, and keep notify_parked_local
guarded by !notified_parked_worker; refer to transition_from_searching,
core.had_driver, core.enable_eager_driver_handoff, notified_parked_worker, and
self.worker.handle.notify_parked_local.
🤖 Augment PR SummarySummary: This PR adds an (unstable) option to eagerly hand off polling of the I/O/time driver when a worker transitions from driver polling back to task polling. Changes:
Technical Notes: The eager handoff logic is guarded so it compiles away when unstable features are disabled, and it only affects multi-thread runtimes. 🤖 Was this summary useful? React with 👍 or 👎 |
| // whether or not the runtime has deadlocked. | ||
| let (done_tx, done_rx) = std::sync::mpsc::channel(); | ||
|
|
||
| std::thread::spawn(move || { |
There was a problem hiding this comment.
tokio/tests/rt_unstable_eager_driver_handoff.rs:94: In the expected-deadlock case, this spawns a thread that never terminates (it remains blocked in rt.block_on), which can leak an OS thread for the remainder of the test binary and potentially interfere with other tests running in the same process.
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
| done_tx.send(()).unwrap(); | ||
| }); | ||
|
|
||
| done_rx.recv_timeout(Duration::from_secs(10)) |
There was a problem hiding this comment.
tokio/tests/rt_unstable_eager_driver_handoff.rs:138: deadlocks_consistently appears to pass only after waiting the full 10s recv_timeout, which will reliably add ~10s to the test suite on supported platforms; is that duration intentional for CI?
Severity: low
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.


8010: To review by AI