Conversation
Use fixed size batch sizes of 128 for testing
Clear buffer if error and read_to_end
Use single cqe_to_result function
Walkthrough

This change introduces batch I/O operation support to the tokio io-uring driver. The modifications add const generic batch-size parameters across the operation model and introduce a new […]
Actionable comments posted: 2
🧹 Nitpick comments (4)
tokio/src/runtime/driver/op.rs (2)
212-217: Silent skip when lifecycle entry is missing may hide bugs.

When `ctx.ops.get_mut(*idx)` returns `None`, the code silently continues. This could occur if an index was unexpectedly removed, potentially indicating a bug. Consider at minimum logging this case, or tracking it to ensure the completion check still works correctly.

The current logic relies on `check_slab_entry` returning `true` when all entries are `None`, which works but makes debugging harder if entries disappear unexpectedly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tokio/src/runtime/driver/op.rs` around lines 212 - 217, The loop over ids currently silently continues when ctx.ops.get_mut(*idx) returns None; update the logic in the for loop that iterates through ids (the block using ctx.ops.get_mut and lifecycle) to record or log missing indices instead of silently skipping—e.g., push missing idx into a local Vec or call a debug/error log on the missing entry—so missing entries are visible during debugging and so the later check_slab_entry invocation can account for these missing indices; ensure any new collection or logging uses existing context (ctx, ids) and does not change the final completion semantics (check_slab_entry should still be consulted to determine overall completion).
58-67: The `completed` array is always allocated at full size regardless of operation type.

For single operations (`Op::new`), the `completed: [i32; N]` array with `N = DEFAULT_BATCH_SIZE` (255) is allocated but never used. This wastes ~1KB per single operation. Consider using an enum for the completion storage or making single operations use `N = 1`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tokio/src/runtime/driver/op.rs` around lines 58 - 67, The completed array currently always allocates [i32; N] inside Op<T, const N> causing wasted memory for single operations; change the storage to be conditional by either (A) making Op::new instantiate Op with N = 1 for single-operation paths and keep the generic N for batch paths, or (B) replace the completed: [i32; N] field with an enum (e.g., Completed::Single(i32) | Completed::Batch(Box<[i32; N]>)) or a heap-allocated boxed/small-array variant so single ops only store one i32; update uses of completed (tests and methods on Op, especially any code in functions named Op::new, methods accessing completed, and the State<N> interactions) to handle both variants accordingly. Ensure constructors and drop semantics are adjusted to construct the appropriate variant and all places that iterate or index into completed handle the single vs batch case.

tokio/src/io/uring/read.rs (1)
146-148: Consider using `MaybeUninit::array_assume_init` when stable, or add safety comment.

The `transmute_copy` is correct here since all elements were initialized, but lacks a safety comment explaining why this is sound. Also, `transmute_copy` may leave the original array in an inconsistent state (though it's not used after).

Add safety documentation

```diff
+/// # Safety
+/// Caller must ensure all elements of `n_ops` have been initialized via `write()`.
 unsafe fn assume_init<const N: usize>(n_ops: [MaybeUninit<Entry>; N]) -> [Entry; N] {
+    // SAFETY: Caller guarantees all elements are initialized.
+    // transmute_copy is used because MaybeUninit::array_assume_init is unstable.
     unsafe { std::mem::transmute_copy(&n_ops) }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tokio/src/io/uring/read.rs` around lines 146 - 148, The unsafe helper function assume_init<const N: usize>(n_ops: [MaybeUninit<Entry>; N]) -> [Entry; N] uses std::mem::transmute_copy without any safety documentation; either switch to the stable MaybeUninit::array_assume_init API when available or replace transmute_copy with a sound move-out pattern (e.g., reading the array with ptr::read) and add a clear safety comment above assume_init explaining that every element of n_ops is fully initialized, that we take ownership and will not use n_ops afterwards, and why the chosen approach does not violate aliasing or drop safety for Entry.

tokio/src/runtime/io/driver/uring.rs (1)
121-124: The `check_slab_entry` logic is inverted.

The function comment says "check if the specified range of slab indexes exist or not" but the implementation returns `true` when all entries are `None` (i.e., don't exist). The function name `check_slab_entry` combined with returning `true` when entries are absent is confusing. Based on usage in `op.rs` line 248 (`ctx.check_slab_entry(&*ids)` to determine batch completion), this appears intentional—checking that all operations have been removed from the slab—but the naming is misleading.

Consider renaming to `all_entries_removed` or `are_all_completed` to clarify intent.

Suggested rename for clarity

```diff
-    // check if the specified range of slab indexes exist or not
-    pub(crate) fn check_slab_entry(&self, indexes: &[usize]) -> bool {
+    // Returns true if all indexed operations have been removed from the slab (i.e., completed)
+    pub(crate) fn all_entries_removed(&self, indexes: &[usize]) -> bool {
         indexes.iter().all(|i| self.ops.get(*i).is_none())
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tokio/src/runtime/io/driver/uring.rs` around lines 121 - 124, The method check_slab_entry has inverted semantics (returns true when all slab slots are None); rename it to a clearer name like all_entries_removed or are_all_completed, update its doc comment to state it returns true when every specified index is empty, and update every call site (e.g., the call in op.rs that does ctx.check_slab_entry(&*ids)) to use the new name so the intent (checking that all operations have been removed/completed) is clear.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 65646d5e-6ddb-415e-aa4c-616bdfdcd7bb
📒 Files selected for processing (6)
tokio/src/fs/read_uring.rs
tokio/src/io/uring/open.rs
tokio/src/io/uring/read.rs
tokio/src/io/uring/write.rs
tokio/src/runtime/driver/op.rs
tokio/src/runtime/io/driver/uring.rs
```diff
 // Max bytes we can read using io uring submission at a time
 // SAFETY: cannot be higher than u32::MAX for safe cast
 // Set to read max 64 MiB at time
-const MAX_READ_SIZE: usize = 64 * 1024 * 1024;
+pub(crate) const MAX_READ_SIZE: usize = 64 * 1024;
```
Comment/value mismatch: MAX_READ_SIZE is 64 KiB, not 64 MiB.
The comment says "Set to read max 64 MiB at time" but 64 * 1024 = 65,536 bytes = 64 KiB.
Fix comment

```diff
 // Max bytes we can read using io uring submission at a time
 // SAFETY: cannot be higher than u32::MAX for safe cast
-// Set to read max 64 MiB at time
+// Set to read max 64 KiB at a time
 pub(crate) const MAX_READ_SIZE: usize = 64 * 1024;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tokio/src/fs/read_uring.rs` around lines 17 - 20, The comment on
MAX_READ_SIZE incorrectly states 64 MiB while the value 64 * 1024 equals 64 KiB;
update the comment to reflect 64 KiB (or change the constant to 64 * 1024 * 1024
if the intent was 64 MiB). Specifically, modify the doc line "Set to read max 64
MiB at time" to "Set to read max 64 KiB at a time" (or adjust the constant if
you want MiB) while keeping the SAFETY note about u32::MAX consistent with the
chosen value for MAX_READ_SIZE.
```rust
for (index, start) in (0..len).step_by(MAX_READ_SIZE).enumerate() {
    let end = (start + MAX_READ_SIZE).min(len); // clamp to len for the final chunk
    let array_idx = index % N;

    // skip first iteration
    if array_idx == 0 && index != 0 {
        // SAFETY: Batches are valid array entries
        let op = unsafe { Op::batch(n_ops.clone(), Read { fd, buf }) };
        let (_, r_fd, r_buf) = uring_task(op, &mut read_len).await?;

        fd = r_fd;
        buf = r_buf;
    } else {
        let op = opcode::Read::new(
            types::Fd(fd.as_raw_fd()),
            buf.spare_capacity_mut()[start..].as_mut_ptr().cast(),
            (end - start) as u32,
        )
        .offset(start as u64)
        .build()
        // link our sqes so cqes arrive in order
        .flags(Flags::IO_LINK);

        n_ops[array_idx] = op;
    }

    last_len = array_idx; // save last array_idx for last batch
}
```
Off-by-one issue: First entry of subsequent batches is skipped.

When `array_idx == 0 && index != 0`, the code submits the current `n_ops` batch before setting `n_ops[0]` for the current iteration. This means:

- The first batch correctly gets entries at indices 0 to N-1
- When starting the second batch (`index == N`), `array_idx` is 0, so it submits the previous batch (correct), but then the current read operation for `index == N` is never added to `n_ops`—the loop continues to the next iteration

The logic should either:

- Submit the batch, then fall through to set `n_ops[array_idx]`, or
- Restructure to submit after the array is filled
Suggested fix

```diff
 for (index, start) in (0..len).step_by(MAX_READ_SIZE).enumerate() {
     let end = (start + MAX_READ_SIZE).min(len); // clamp to len for the final chunk
     let array_idx = index % N;

     // skip first iteration
     if array_idx == 0 && index != 0 {
         // SAFETY: Batches are valid array entries
         let op = unsafe { Op::batch(n_ops.clone(), Read { fd, buf }) };
         let (_, r_fd, r_buf) = uring_task(op, &mut read_len).await?;

         fd = r_fd;
         buf = r_buf;
-    } else {
-        let op = opcode::Read::new(
-            types::Fd(fd.as_raw_fd()),
-            buf.spare_capacity_mut()[start..].as_mut_ptr().cast(),
-            (end - start) as u32,
-        )
-        .offset(start as u64)
-        .build()
-        // link our sqes so cqes arrive in order
-        .flags(Flags::IO_LINK);
-
-        n_ops[array_idx] = op;
     }
+
+    let op = opcode::Read::new(
+        types::Fd(fd.as_raw_fd()),
+        buf.spare_capacity_mut()[start..].as_mut_ptr().cast(),
+        (end - start) as u32,
+    )
+    .offset(start as u64)
+    .build()
+    // link our sqes so cqes arrive in order
+    .flags(Flags::IO_LINK);
+
+    n_ops[array_idx] = op;

     last_len = array_idx; // save last array_idx for last batch
 }
```
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
for (index, start) in (0..len).step_by(MAX_READ_SIZE).enumerate() {
    let end = (start + MAX_READ_SIZE).min(len); // clamp to len for the final chunk
    let array_idx = index % N;

    // skip first iteration
    if array_idx == 0 && index != 0 {
        // SAFETY: Batches are valid array entries
        let op = unsafe { Op::batch(n_ops.clone(), Read { fd, buf }) };
        let (_, r_fd, r_buf) = uring_task(op, &mut read_len).await?;

        fd = r_fd;
        buf = r_buf;
    }

    let op = opcode::Read::new(
        types::Fd(fd.as_raw_fd()),
        buf.spare_capacity_mut()[start..].as_mut_ptr().cast(),
        (end - start) as u32,
    )
    .offset(start as u64)
    .build()
    // link our sqes so cqes arrive in order
    .flags(Flags::IO_LINK);

    n_ops[array_idx] = op;

    last_len = array_idx; // save last array_idx for last batch
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tokio/src/io/uring/read.rs` around lines 91 - 118, The loop currently skips
assigning the current iteration's op when array_idx == 0 && index != 0 because
the code submits the previous batch in the if branch and uses else for
assignment; change the control flow so the batch submission (unsafe {
Op::batch(n_ops.clone(), Read { fd, buf }) } + uring_task(...)) happens first
when array_idx == 0 && index != 0 but does NOT use else — after submitting,
continue to construct the Read opcode
(opcode::Read::new(...).offset(...).build().flags(Flags::IO_LINK)) and assign it
into n_ops[array_idx]; keep updating last_len = array_idx as before so the final
batch handling still works. This ensures the first entry of each subsequent
batch is populated instead of skipped.
🤖 Augment PR SummarySummary: Extends Tokio’s io-uring file read path to support submitting reads in batches. Changes:
Technical Notes: Batch cancellation stores shared 🤖 Was this summary useful? React with 👍 or 👎 |
```rust
let array_idx = index % N;

// skip first iteration
if array_idx == 0 && index != 0 {
```
When array_idx == 0 && index != 0, this branch submits the previous batch but doesn't enqueue a read for the current start, so every Nth chunk is skipped and n_ops[0] may remain a stale entry for later submissions. That can truncate or corrupt the returned buffer for files that span more than N chunks.
Severity: high
```rust
        }
        None => return Err((r_fd, r_buf)),
    },
    Err(e) => {
```
In uring_task, ErrorKind::Interrupted results are currently ignored, which can leave unread holes in the buffer while still returning Ok(buf). Consider treating Interrupted as a retry/fallback condition (similar to op_read) to avoid returning partial/corrupt data.
Severity: medium
```rust
// Max bytes we can read using io uring submission at a time
// SAFETY: cannot be higher than u32::MAX for safe cast
// Set to read max 64 MiB at time
```
The comment says "max 64 MiB" but MAX_READ_SIZE is now 64 * 1024 (64 KiB); can we confirm whether the constant or the comment should be updated? This value also changes batching eligibility (read_to_end_batch) and the per-op chunk size in read_to_end.
Severity: low
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
```rust
        .flags(Flags::IO_LINK);

        n_ops[array_idx] = op;
    }
```
Batch loop skips every Nth chunk causing data loss
High Severity
When array_idx == 0 && index != 0, the if branch submits the previous batch but the else branch that creates the read op for the current chunk is skipped entirely. This means every chunk at a batch boundary (indices N, 2N, 3N…) is silently dropped. Additionally, n_ops[0] retains the stale entry from the previous batch, causing chunk 0 to be re-read in subsequent batches. When total chunks equals kN+1, last_len ends as 0, so the final if last_len > 0 guard prevents submission of the last chunk too.
```rust
// Per operation data.
data: Option<T>,
// Completed CQEs stored for checking batch completion
completed: [i32; N],
```
Default batch size bloats every single Op's stack
Medium Severity
DEFAULT_BATCH_SIZE is 255, making every Op<T> (including single operations like Op<Open>, Op<Write>, Op<Read>) carry State<255> (whose largest variant InitializeBatch([squeue::Entry; 255]) is ~16KB) and completed: [i32; 255] (~1KB). This means a simple single file open or write allocates ~17KB on the stack, all of which is unused for non-batch operations.
```rust
    }

    Poll::Pending
}
```
Slab index reuse corrupts unrelated operations between polls
Medium Severity
In PolledBatch, completed entries are removed from the slab via ctx.remove_op(*idx). Between polls (when the lock is released), newly registered operations can reuse those freed slab indices. On the next poll, the batch handler iterates all original ids, finds a reused slot via ctx.ops.get_mut(*idx), and either overwrites the new operation's waker or steals its CQE—silently corrupting an unrelated in-flight operation.
PR Review: io-uring batching support

Thanks for this PR! The overall structure is a reasonable approach to batching. However, there are several correctness and safety issues.

CRITICAL:
1. Off-by-one in read_batch_size: when index == N the entry is never built, n_ops[0] holds stale data, and the second batch re-reads offset 0 instead of N * MAX_READ_SIZE.
2. EINTR handling silently drops data: with IO_LINK, subsequent entries get -ECANCELED, read_len is short, and buf.set_len sets an incorrect length (unsound).

HIGH:
3. DEFAULT_BATCH_SIZE = 255 bloats all Op stack frames with 3+ KiB of unused data.
4. unsafe fn batch() is missing a Safety doc comment.
5. register_batch panics instead of calling check_and_init.

MEDIUM:
6. Stale comment on MAX_READ_SIZE says 64 MiB but it is 64 KiB.
7. n_ops.clone() in the hot loop copies 3.2 KiB unnecessarily.
8. The batch path is wastefully triggered for near-single-chunk files.

NITS: typos (completly, Initalization), misleading comment on "skip first iteration", unnecessary Arc in Lifecycle::Cancelled, no tests for the batching logic.
PR Review: io-uring batching support (full review)

CRITICAL BUG 1: Off-by-one in read_batch_size
When index == N, array_idx == 0 so the batch is submitted but the entry for index == N is NEVER BUILT. n_ops[0] retains the stale entry from index == 0 of the first batch. The second batch re-reads at offset 0 instead of N * MAX_READ_SIZE. Every subsequent batch has the same problem. Fix: build the entry before deciding to submit.

CRITICAL BUG 2: EINTR in linked batches silently drops data
In uring_task, EINTR falls through without updating read_len. With IO_LINK, an interrupted entry causes subsequent entries to be cancelled with -ECANCELED. The code swallows both, leaving read_len short. buf.set_len(buf.len() + read_len) then sets an incorrect length — unsound. There is no retry, unlike the single-op path.

HIGH: DEFAULT_BATCH_SIZE = 255 bloats all Op stack frames
Op now carries completed: [i32; 255] (~1 KiB) and State::PolledBatch([usize; 255]) (~2 KiB) even for single-op uses like Op\<Open> and Op\<Write>. Every io-uring op wastes 3+ KiB of stack it never needs. Use a smaller default or split Op into single/batch types.

HIGH: unsafe fn batch() has no Safety doc comment — same buffer-lifetime requirements as Op::new but undocumented.

HIGH: register_batch panics instead of falling back — uses assert!(uring_probe.initialized()) while register_op uses check_and_init. Will panic if called before io_uring is set up.

MEDIUM: MAX_READ_SIZE comment says 64 MiB but the constant is now 64 KiB.

MEDIUM: n_ops.clone() in the hot loop copies ~3.2 KiB per batch boundary unnecessarily.

MEDIUM: Near-single-chunk files trigger the batch path with 198 Nop entries.

Nits: completly -> completely (read.rs:90); Initalization -> Initialization (op.rs:176, 196); the skip-first-iteration comment is wrong — it runs every Nth iteration; Arc wrapping in Lifecycle::Cancelled adds allocation overhead to single-op cancellations; no tests for the new batching logic.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request integrates batching functionality into Tokio's […]

Highlights
Changelog
Activity
Code Review
This pull request introduces batching for io-uring read operations, aiming for performance enhancement. However, the implementation of read_batch_size in tokio/src/io/uring/read.rs contains critical logic errors that lead to uninitialized memory exposure and potential out-of-bounds access. Specifically, the batching loop skips operations at certain indices, creating gaps in the buffer that are never written to by the kernel but are included in the final buffer length, leading to an information leak. Additionally, the indexing into the buffer's spare capacity is incorrect when the buffer is not empty. There's also an opportunity to enhance code safety by replacing a manual unsafe operation with a safer standard library equivalent, and a minor issue with a misleading code comment. Addressing these issues is crucial before merging.
```rust
for (index, start) in (0..len).step_by(MAX_READ_SIZE).enumerate() {
    let end = (start + MAX_READ_SIZE).min(len); // clamp to len for the final chunk
    let array_idx = index % N;

    // skip first iteration
    if array_idx == 0 && index != 0 {
        // SAFETY: Batches are valid array entries
        let op = unsafe { Op::batch(n_ops.clone(), Read { fd, buf }) };
        let (_, r_fd, r_buf) = uring_task(op, &mut read_len).await?;

        fd = r_fd;
        buf = r_buf;
    } else {
        let op = opcode::Read::new(
            types::Fd(fd.as_raw_fd()),
            buf.spare_capacity_mut()[start..].as_mut_ptr().cast(),
            (end - start) as u32,
        )
        .offset(start as u64)
        .build()
        // link our sqes so cqes arrive in order
        .flags(Flags::IO_LINK);

        n_ops[array_idx] = op;
    }

    last_len = array_idx; // save last array_idx for last batch
}
```
The batching logic in read_batch_size is flawed, leading to a critical information leak vulnerability. When a full batch is submitted (array_idx == 0 && index != 0), the else branch is skipped, meaning the opcode::Read for that iteration is not created and not added to n_ops. This effectively skips a chunk of the file and creates gaps in the buffer that are never written to by the kernel. Since read_len is updated with the total bytes read by the submitted operations, and buf.set_len is called at the end, the buffer's length will include these uninitialized gaps, exposing potentially sensitive data.
```rust
if array_idx == 0 && index != 0 {
    // SAFETY: Batches are valid array entries
    let op = unsafe { Op::batch(n_ops.clone(), Read { fd, buf }) };
    let (_, r_fd, r_buf) = uring_task(op, &mut read_len).await?;
    fd = r_fd;
    buf = r_buf;
}
let op = opcode::Read::new(
    types::Fd(fd.as_raw_fd()),
    buf.spare_capacity_mut()[start..].as_mut_ptr().cast(),
    (end - start) as u32,
)
.offset(start as u64)
.build()
// link our sqes so cqes arrive in order
.flags(Flags::IO_LINK);
n_ops[array_idx] = op;
```

```rust
} else {
    let op = opcode::Read::new(
        types::Fd(fd.as_raw_fd()),
        buf.spare_capacity_mut()[start..].as_mut_ptr().cast(),
```
The index start is calculated based on the buffer's capacity, but it is used to index into the buffer's spare capacity. If the buffer is not empty, buf.spare_capacity_mut()[start..] will eventually attempt to access an index out of bounds of the spare capacity, leading to a panic. This is because spare_capacity_mut().len() is capacity - len, and start can go up to capacity.
```rust
// Max bytes we can read using io uring submission at a time
// SAFETY: cannot be higher than u32::MAX for safe cast
// Set to read max 64 MiB at time
```
```rust
    n_ops[i].write(opcode::Nop::new().build());
}

let mut n_ops = unsafe { assume_init(n_ops) };
```
The custom assume_init function uses unsafe transmute_copy. A safer, idiomatic way to initialize the array is MaybeUninit::array_assume_init; note, however, that this API is still unstable (gated behind the maybe_uninit_array_assume_init feature), so using it requires a nightly toolchain.
You can remove the assume_init function entirely with this change.

```rust
let mut n_ops = MaybeUninit::array_assume_init(n_ops);
```

7761: To review by AI