
7973: rt: shard the multi-thread inject queue to reduce remote spawn contention #89

Open

martin-augment wants to merge 2 commits into master from
pr-7973-2026-04-01-08-40-31

Conversation

@martin-augment
Owner

7973: To review by AI

alex and others added 2 commits March 30, 2026 23:28

The multi-threaded scheduler's inject queue was protected by a single
global mutex (shared with idle coordination state). Every remote task
spawn — any spawn from outside a worker thread — acquired this lock,
serializing concurrent spawners and limiting throughput.

This change introduces `inject::Sharded`, which splits the inject queue
into up to 8 independent shards, each an existing `Shared`/`Synced`
pair with its own mutex and cache-line padding.

Design:
- Push: each thread is assigned a home shard on first push (via a
  global counter) and sticks with it. This keeps consecutive pushes
  from one thread cache-local while spreading distinct threads across
  distinct locks.
- Pop: workers rotate through shards starting at their own index,
  skipping empty shards via per-shard atomic length. pop_n drains from
  one shard at a time to keep critical sections bounded.
- Shard count: capped at 8 (and 1 under loom). Contention drops off
  steeply past a handful of shards, and is_empty()/len() scan all
  shards in the worker hot loop.
- is_closed: a single Release atomic set after all shards are closed,
  so the shutdown check stays lock-free.
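The push/pop scheme described in the bullets above can be sketched in a few dozen lines. This is a simplification, not Tokio's actual implementation: `u64` stands in for a task, `std` mutexes and `VecDeque`s stand in for the cache-line-padded intrusive lists, and close/len handling is omitted.

```rust
use std::cell::Cell;
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::sync::Mutex;

const MAX_SHARDS: usize = 8;
const UNASSIGNED: usize = usize::MAX;

// Global counter handing out sticky per-thread raw shard indices.
static NEXT_SHARD: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    static PUSH_SHARD: Cell<usize> = Cell::new(UNASSIGNED);
}

/// Simplified sharded inject queue: each shard is an independent
/// mutex-protected FIFO.
pub struct Sharded {
    shards: Vec<Mutex<VecDeque<u64>>>,
    shard_mask: usize,
}

impl Sharded {
    pub fn new(shard_hint: usize) -> Self {
        // Power-of-two shard count, capped at MAX_SHARDS, so the
        // modulo in push/pop is a cheap bitwise AND.
        let n = shard_hint.clamp(1, MAX_SHARDS).next_power_of_two();
        Sharded {
            shards: (0..n).map(|_| Mutex::new(VecDeque::new())).collect(),
            shard_mask: n - 1,
        }
    }

    /// Sticky assignment: a thread picks a home shard on first push and
    /// keeps it, so consecutive pushes stay on one mutex and list tail.
    pub fn push(&self, task: u64) {
        let idx = PUSH_SHARD.with(|cell| {
            let mut idx = cell.get();
            if idx == UNASSIGNED {
                idx = NEXT_SHARD.fetch_add(1, Relaxed);
                cell.set(idx);
            }
            idx & self.shard_mask
        });
        self.shards[idx].lock().unwrap().push_back(task);
    }

    /// Workers rotate through shards starting at their own index,
    /// wrapping modulo the shard count, and take the first task found.
    pub fn pop(&self, worker_hint: usize) -> Option<u64> {
        for i in 0..self.shards.len() {
            let idx = (worker_hint + i) & self.shard_mask;
            if let Some(task) = self.shards[idx].lock().unwrap().pop_front() {
                return Some(task);
            }
        }
        None
    }
}
```

Because all of one thread's pushes land in the same shard, FIFO order is preserved per producer, while distinct producer threads spread across distinct locks.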

Random shard selection via context::thread_rng_n (as used in tokio-rs#7757 for
the blocking pool) was measured and found to be 20-33% slower on
remote_spawn at 8+ threads. The inject workload is a tight loop of
trivial pushes where producer-side cache locality dominates: with RNG,
a hot thread bounces between shard cache lines on every push; with
sticky assignment it stays hot on one mutex and list tail. RNG did win
slightly (5-9%) on single-producer benchmarks where spreading tasks
lets workers pop in parallel, but not enough to offset the regression
at scale.

The inject state is removed from the global Synced mutex, which now
only guards idle coordination. This also helps the single-threaded
path since remote pushes no longer contend with worker park/unpark.

Results on remote_spawn benchmark (12,800 no-op tasks, N spawner
threads, 64-core box):

  threads   before    after    improvement
  1         9.38 ms   7.33 ms  -22%
  2        14.94 ms   6.64 ms  -56%
  4        23.69 ms   5.34 ms  -77%
  8        34.81 ms   4.69 ms  -87%
  16       32.33 ms   4.54 ms  -86%
  32       30.37 ms   4.73 ms  -84%
  64       26.59 ms   5.34 ms  -80%

rt_multi_threaded benchmarks: spawn_many_local -8%, spawn_many_remote_idle
-7%, yield_many -1%, rest neutral.

Developed in conjunction with Claude.
@coderabbitai

coderabbitai bot commented Apr 1, 2026

Walkthrough

This pull request introduces documentation files (AGENTS.md, CLAUDE.md) with linking instructions, updates the spellcheck dictionary with new terms, and refactors Tokio's multi-threaded runtime scheduler's task injection queue. The changes replace a single shared injection queue (inject::Shared<T>) with a sharded implementation (inject::Sharded<T>) that maintains multiple independent shards. Queue operations are updated to select and access shards by hint, while synchronization logic is decoupled from the scheduler's core mutex. The trace_multi_thread function signature is simplified from unsafe to safe, removing explicit Synced parameter dependency.


@claude

claude bot commented Apr 1, 2026

Code Review

This is a well-motivated change with a clean design. Sharding the inject queue is the right approach for reducing remote-spawn contention, and the implementation is generally sound. Here are my observations:


Correctness

pop vs pop_n TOCTOU inconsistency (sharded.rs:185-201)

pop_n correctly re-checks shard.shared.is_empty() after acquiring the lock to avoid an unnecessary call when another worker drains the shard between the atomic check and the lock acquisition. pop does not do this re-check — it calls shard.shared.pop() which returns None gracefully, so it is functionally correct, but it wastes a lock acquisition in the TOCTOU window. Recommend adding the same guard for consistency and performance symmetry:

let mut synced = shard.synced.lock();
if shard.shared.is_empty() {   // re-check under lock
    continue;
}
if let Some(task) = unsafe { shard.shared.pop(&mut synced) } {
    return Some(task);
}

close() stores is_closed = true redundantly on double-close (sharded.rs:145)

On a second close() call, all shard.shared.close() calls return false and was_open stays false, but is_closed.store(true, Release) is still executed. This is harmless but can be guarded:

if was_open {
    self.is_closed.store(true, Release);
}

Design Concerns

pop_n drains only one shard per invocation (sharded.rs:212-244)

The docstring acknowledges this, but the implication is worth spelling out: if external threads have a non-uniform push pattern (e.g., all landing on one shard due to affinity), the workers whose hint places them far from that shard will scan through up to MAX_SHARDS - 1 empty shards before finding work. Additionally, tasks queued in that single hot shard are drained at one batch per worker tick, which could increase scheduling latency under bursty loads compared to the previous single-queue design. This trade-off should be documented in the module-level comment.

shard_hint.clamp(1, MAX_SHARDS).next_power_of_two() can yield more shards than workers (sharded.rs:84)

For non-power-of-two worker counts (3, 5, 6, 7 workers), next_power_of_two() rounds up to 4 or 8. A 5-worker runtime gets 8 shards: workers 0–4 each start draining at their own index, leaving shards 5–7 as "orphan" home shards for external threads. Those shards are only reached via rotation, potentially adding latency. Consider whether the clamp should happen after next_power_of_two() or whether the rounding-up behavior should be documented. Using shard_hint.next_power_of_two().min(MAX_SHARDS) would cap at 8 while still rounding up, but would keep the shard count ≤ the capped value rather than potentially equaling it for values just above a power-of-two boundary.

Global PUSH_SHARD / NEXT_SHARD shared across runtimes (sharded.rs:62)

The comment correctly explains this is intentional. Worth noting one subtle edge case: threads that are alive when the TLS value is first assigned from one runtime's Sharded (with, say, 2 shards) and later push into a second runtime (with 8 shards) will get the same raw shard index but a different masked result. Distribution is still approximately uniform, just making sure this is understood.
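The masking edge case above is easy to make concrete. A thread's sticky raw index is assigned once, process-wide; each runtime applies its own mask, so the same raw index can map to different shard numbers in runtimes with different shard counts while the distribution stays approximately uniform. The helper name below is hypothetical, for illustration only:

```rust
// Same raw per-thread index, different runtime shard masks: the masked
// result differs, but each runtime still sees a uniform spread of
// raw indices across its own shards.
fn shard_for(raw_idx: usize, num_shards: usize) -> usize {
    assert!(num_shards.is_power_of_two());
    raw_idx & (num_shards - 1)
}
```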


Test Coverage

No direct unit tests for Sharded

The new type has no dedicated tests. The loom tests cap MAX_SHARDS = 1, so the shard-rotation logic in pop and pop_n (i in 0..num_shards) is never exercised by any loom model. Consider adding:

  • A unit test for close() behavior across shards (verify is_closed() returns true, pushes are dropped after close, double-close returns false).
  • A test for pop_n rotation: push tasks into specific shards and verify workers with different hint values drain them correctly.
  • An integration test or loom model covering concurrent push + pop across multiple simulated threads.

The new benches/remote_spawn.rs is a good addition for measuring the improvement — it would be helpful to document the expected baseline improvement in the PR description.


Minor Nits

  • sharded.rs:8: Module-level comment says "Workers drain shards starting from their own index and rotate." Consider adding that the rotation wraps around modulo num_shards for clarity.
  • worker.rs:1063-1073: The closure-based pop_n API is a clean improvement over holding the synced lock across the batch — good ergonomic win.
  • shared.rs:41-43: Narrowing is_closed to #[cfg(feature = "taskdump")] is correct since Sharded now owns its own AtomicBool for the lock-free check. This is a good cleanup.

Summary

The core approach (sharding + per-thread home shard + worker-index-hinted drain) is sound and the implementation correctly handles the close/push/pop lifecycle. The main gaps are the missing unit tests for the Sharded type and the undocumented latency implications of single-shard-per-call draining under skewed load. The next_power_of_two shard-count decision for non-power-of-two worker counts is worth a second look.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a sharded injection queue for the multi-threaded scheduler to reduce contention on the global task queue. By partitioning the queue into multiple shards, each protected by its own mutex, the implementation allows concurrent task submissions from multiple external threads with significantly reduced lock contention. The scheduler's internal structures were updated to remove the global injection queue from the main synchronization lock, and worker logic was adjusted to rotate through shards when popping tasks. I have no feedback to provide as the review comments were purely explanatory.

@augmentcode

augmentcode bot commented Apr 1, 2026

🤖 Augment PR Summary

Summary: This PR reduces multi-thread runtime contention by sharding the global inject queue used for remote task scheduling.

Changes:

  • Introduces inject::Sharded, splitting the inject queue into multiple cache-padded shards, each with its own mutex and intrusive list.
  • Distributes remote push/push_batch operations across shards using a per-thread “home shard” assignment.
  • Updates multi-thread scheduler state to use the sharded inject queue and removes inject state from the scheduler-wide Synced mutex.
  • Adjusts worker remote-drain logic to pop/rotate across shards based on worker index, including a shard-scoped pop_n fast-path.
  • Simplifies shutdown/close paths to close the sharded queue directly and updates remote scheduling entry points accordingly.
  • Updates taskdump tracing to work with the sharded inject queue without requiring the scheduler mutex.
  • Adds a couple of repository rule files and updates the spellcheck dictionary.

Technical Notes: Shard count is capped (and forced to 1 under loom) to keep hot-loop scanning cost bounded while still reducing remote-spawn mutex contention.



@augmentcode augmentcode bot left a comment


Review completed. 1 suggestion posted.


- let mut synced = synced.lock();
- // Safety: exactly the same safety requirements as `trace_multi_thread` function.
- while let Some(notified) = unsafe { injection.pop(&mut synced.inject) } {
+ while let Some(notified) = injection.pop(0) {

This loop drains the sharded injection queue while external threads can still enqueue into it (remote spawns no longer share the scheduler mutex), so under sustained remote spawning the task dump could become a moving target or potentially never complete.

Severity: medium
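A conventional mitigation for this kind of moving-target drain (not something the PR is described as doing, just an illustration of the technique) is to bound the drain by the queue length observed at the start, so concurrent pushes cannot keep the loop alive indefinitely. A minimal sketch with a hypothetical shard type:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Hypothetical stand-in for one inject shard (`u64` stands in for a task).
struct Shard {
    queue: Mutex<VecDeque<u64>>,
}

/// Drain at most the number of tasks present when the dump started, so
/// the loop terminates even while external threads keep pushing.
fn drain_snapshot(shard: &Shard) -> Vec<u64> {
    let budget = shard.queue.lock().unwrap().len();
    let mut out = Vec::with_capacity(budget);
    for _ in 0..budget {
        match shard.queue.lock().unwrap().pop_front() {
            Some(task) => out.push(task),
            None => break, // another worker drained the shard first
        }
    }
    out
}
```

Tasks pushed after the snapshot are simply left for the next dump.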

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@@ -0,0 +1 @@
Never link to any issue or a pull request in any GitHub repository

Unrelated AI instruction files committed with code change

Low Severity

AGENTS.md and CLAUDE.md were added containing only the directive "Never link to any issue or a pull request in any GitHub repository." These files are unrelated to the inject queue sharding change described in the PR title and appear to be AI prompt-manipulation instructions that were bundled into an unrelated PR.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
tokio/src/runtime/scheduler/inject/sharded.rs (2)

251-268: Consider documenting the thread-local fallback behavior.

When PUSH_SHARD.try_with() fails (e.g., during thread shutdown), the fallback to shard 0 is safe but could cause temporary contention. This is acceptable, but a brief comment explaining the fallback would aid future maintainers.

📝 Optional documentation improvement
         PUSH_SHARD
             .try_with(|cell| {
                 let mut idx = cell.get();
                 if idx == UNASSIGNED {
                     idx = NEXT_SHARD.fetch_add(1, Relaxed);
                     cell.set(idx);
                 }
                 idx & self.shard_mask
             })
+            // If the thread-local is inaccessible (e.g., during thread
+            // shutdown), fall back to shard 0. This may cause brief
+            // contention but is safe.
             .unwrap_or(0)
     }

132-148: Verify ordering between close() and concurrent push_batch() calls.

The close() implementation iterates through shards sequentially, closing each under its own lock. A concurrent push_batch() could theoretically select an already-iterated (still open) shard before close() reaches it, then acquire the lock after that shard is closed—but this is actually fine since the per-shard synced.is_closed flag is checked under the lock.

However, the comment "After this loop no shard will accept a push" is accurate only after the loop completes. Consider whether the is_closed.store(true, Release) should precede the shard iteration to allow early bail-out in push paths (though current impl is still correct).
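The suggested reordering can be sketched by reducing the sharded queue to just its close state: one per-shard flag guarded by the shard lock, plus the lock-free global flag. This is a hypothetical simplification of the pattern under discussion, not the PR's actual code:

```rust
use std::sync::atomic::{AtomicBool, Ordering::{Acquire, Release}};
use std::sync::Mutex;

struct ShardedClose {
    shards: Vec<Mutex<bool>>, // true once the shard is closed
    is_closed: AtomicBool,    // lock-free global flag
}

impl ShardedClose {
    fn new(num_shards: usize) -> Self {
        ShardedClose {
            shards: (0..num_shards).map(|_| Mutex::new(false)).collect(),
            is_closed: AtomicBool::new(false),
        }
    }

    /// Publish the global closed flag before walking the shards, so
    /// lock-free push paths can bail out early. Per-shard state is
    /// still flipped under each shard's own lock, which is what push
    /// would check before enqueueing, so correctness is unchanged.
    fn close(&self) -> bool {
        self.is_closed.store(true, Release);
        let mut was_open = false;
        for shard in &self.shards {
            let mut closed = shard.lock().unwrap();
            if !*closed {
                *closed = true;
                was_open = true;
            }
        }
        was_open
    }

    fn is_closed(&self) -> bool {
        self.is_closed.load(Acquire)
    }
}
```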


📥 Commits

Reviewing files that changed from the base of the PR and between b4123ee and bbbd076.

📒 Files selected for processing (10)
  • AGENTS.md
  • CLAUDE.md
  • spellcheck.dic
  • tokio/src/runtime/scheduler/inject.rs
  • tokio/src/runtime/scheduler/inject/sharded.rs
  • tokio/src/runtime/scheduler/inject/shared.rs
  • tokio/src/runtime/scheduler/multi_thread/mod.rs
  • tokio/src/runtime/scheduler/multi_thread/worker.rs
  • tokio/src/runtime/scheduler/multi_thread/worker/taskdump.rs
  • tokio/src/runtime/task/trace/mod.rs
💤 Files with no reviewable changes (1)
  • tokio/src/runtime/scheduler/multi_thread/mod.rs

@@ -0,0 +1 @@
Never link to any issue or a pull request in any GitHub repository

⚠️ Potential issue | 🟡 Minor



Clarify the instruction and provide rationale.

Both AGENTS.md and CLAUDE.md contain an ambiguous, one-line instruction with no context or examples. The instruction needs clarification:

  • What does "link" mean—hyperlinks only, or any reference to issues/PRs?
  • Does "any GitHub repository" include this repository, or only external ones?
  • What contexts does this apply to (commit messages, code comments, documentation)?

Add documentation explaining the rationale for this guideline and specific examples to help contributors understand when and how it applies.


Comment on lines 40 to 44

  // Kind of annoying to have to include the cfg here
- #[cfg(any(feature = "taskdump", feature = "rt-multi-thread"))]
+ #[cfg(feature = "taskdump")]
  pub(crate) fn is_closed(&self, synced: &Synced) -> bool {
      synced.is_closed
  }

⚠️ Potential issue | 🔴 Critical



The cfg narrowing appears incorrect—worker.inject().is_closed() is called unconditionally from Core::maintenance() in the rt-multi-thread scheduler.

The review assumes no multi-thread code path calls Shared::is_closed, but evidence shows Core::maintenance() at tokio/src/runtime/scheduler/multi_thread/worker.rs:1213 calls worker.inject().is_closed() without any cfg guard. Since Inject::is_closed() is guarded by #[cfg(feature = "taskdump")], this would cause a compilation failure when rt-multi-thread is enabled without the taskdump feature. Verify whether there's a second unguarded implementation of is_closed() or whether this call should also be guarded by taskdump.


Owner Author


value:annoying; category:bug; feedback: The CodeRabbit AI reviewer is not correct! The PR changed worker.rs:1213 to use the new Sharded::is_closed(), which uses an AtomicBool to track whether the queue is closed, so there is no need for the cfg gating.

@martin-augment
Owner Author

shard_hint.clamp(1, MAX_SHARDS).next_power_of_two() can yield more shards than workers (sharded.rs:84)

For non-power-of-two worker counts (3, 5, 6, 7 workers), next_power_of_two() rounds up to 4 or 8. A 5-worker runtime gets 8 shards: workers 0–4 each start draining at their own index, leaving shards 5–7 as "orphan" home shards for external threads. Those shards are only reached via rotation, potentially adding latency. Consider whether the clamp should happen after next_power_of_two() or whether the rounding-up behavior should be documented. Using shard_hint.next_power_of_two().min(MAX_SHARDS) would cap at 8 while still rounding up, but would keep the shard count ≤ the capped value rather than potentially equaling it for values just above a power-of-two boundary.

value:annoying; category:bug; feedback: The Claude AI reviewer is not correct! Both shard_hint.clamp(1, MAX_SHARDS).next_power_of_two() and shard_hint.next_power_of_two().min(MAX_SHARDS) produce the same result for every input, so the proposed change would not alter behavior.
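The equivalence asserted here can be checked exhaustively: the two expressions agree for every hint precisely because MAX_SHARDS (8) is itself a power of two, so rounding up after the clamp can never exceed the cap. A standalone check (function names are illustrative):

```rust
const MAX_SHARDS: usize = 8;

// The formula as committed in the PR.
fn clamp_then_round(hint: usize) -> usize {
    hint.clamp(1, MAX_SHARDS).next_power_of_two()
}

// The reviewer's proposed alternative.
// `next_power_of_two()` returns 1 for 0, so no lower bound is needed.
fn round_then_cap(hint: usize) -> usize {
    hint.next_power_of_two().min(MAX_SHARDS)
}
```

For hints 1 through 8 the clamp is a no-op and both round up identically; for hints above 8 the first formula clamps to 8 (already a power of two) while the second rounds up past 8 and then caps back to 8.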

@martin-augment
Owner Author

pop vs pop_n TOCTOU inconsistency (sharded.rs:185-201)

pop_n correctly re-checks shard.shared.is_empty() after acquiring the lock to avoid an unnecessary call when another worker drains the shard between the atomic check and the lock acquisition. pop does not do this re-check — it calls shard.shared.pop() which returns None gracefully, so it is functionally correct, but it wastes a lock acquisition in the TOCTOU window. Recommend adding the same guard for consistency and performance symmetry:

let mut synced = shard.synced.lock();
if shard.shared.is_empty() {   // re-check under lock
    continue;
}
if let Some(task) = unsafe { shard.shared.pop(&mut synced) } {
    return Some(task);
}

value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct! Adding the same empty-queue check to pop() saves a few CPU cycles by skipping work that would accomplish nothing in the end.

@martin-augment
Owner Author

close() stores is_closed = true redundantly on double-close (sharded.rs:145)

On a second close() call, all shard.shared.close() calls return false and was_open stays false, but is_closed.store(true, Release) is still executed. This is harmless but can be guarded:

if was_open {
    self.is_closed.store(true, Release);
}

value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct! By using the local boolean flag, the store to the atomic boolean can be avoided on double-close. It is a minor CPU-usage optimization.
