7983: rt: (io-uring) ensure fds are closed when invoking cancel op by martin-augment · Pull Request #80 · martin-augment/tokio

martin-augment · 2026-03-23T07:47:39Z

7983: To review by AI

gemini-code-assist · 2026-03-23T07:47:55Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical resource management issue within the io_uring driver by ensuring that file descriptors opened during an io_uring operation are properly closed if the operation is cancelled. This prevents potential file descriptor leaks, which could lead to system instability or resource exhaustion under heavy load. A new test case has been introduced to validate this fix, enhancing the robustness of the io_uring implementation.

Highlights

File Descriptor Leak Fix: Resolved a file descriptor leak that occurred when an io_uring open operation was cancelled, ensuring that the underlying file descriptor is properly closed.
New Test Case: Added a new test to verify that file descriptors are correctly closed when io_uring open operations are cancelled, preventing resource exhaustion.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

coderabbitai · 2026-03-23T07:47:56Z

Walkthrough

The changes modify the io_uring driver's cancellation handling and add a test to verify proper file descriptor cleanup. In the driver code, when a cancelled operation is detected with Lifecycle::Cancelled(CancelData::Open(_)), the code now extracts the completion queue entry's result, constructs an OwnedFd from the returned file descriptor integer, and allows it to be dropped for proper cleanup before removing the operation from the slab. The necessary imports (CancelData, CqeResult, FromRawFd, OwnedFd) are added to support this change. A new test verifies that repeated cancellation of open operations does not leak file descriptors beyond a bounded threshold.
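
The cleanup described in this walkthrough can be sketched in isolation. What follows is a minimal sketch and not the driver's code: `close_cancelled_open` is a hypothetical helper, and its `io::Result<u32>` parameter is an assumed stand-in for the CQE result. The only real API it relies on is the standard library's `OwnedFd::from_raw_fd`, whose drop closes the descriptor.

```rust
use std::fs::File;
use std::io;
use std::os::fd::{FromRawFd, IntoRawFd, OwnedFd};

/// Hypothetical helper: closes the descriptor carried by a successful
/// result and reports whether anything was closed.
fn close_cancelled_open(result: io::Result<u32>) -> bool {
    match result {
        Ok(fd) => {
            // SAFETY: the caller guarantees that `fd` is an open descriptor
            // that nothing else owns.
            drop(unsafe { OwnedFd::from_raw_fd(fd as i32) });
            true
        }
        // A failed open produced no descriptor, so there is nothing to close.
        Err(_) => false,
    }
}

fn main() -> io::Result<()> {
    // Simulate a descriptor that an open operation handed back as a raw integer.
    let raw = File::open("/dev/null")?.into_raw_fd();
    assert!(close_cancelled_open(Ok(raw as u32)));
    assert!(!close_cancelled_open(Err(io::Error::from(io::ErrorKind::NotFound))));
    Ok(())
}
```

Wrapping the raw integer in an `OwnedFd` and dropping it delegates the close to the standard library, which matches the last commit on this branch replacing a direct libc close.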

gemini-code-assist

Code Review

This pull request aims to fix a file descriptor leak when an io_uring open operation is cancelled. It does so by checking for cancelled Open operations and explicitly closing the file descriptor if the operation completed successfully before cancellation was processed. A new test is added to verify this behavior.

While the intention is correct, the implementation introduces a regression. The change to specifically handle CancelData::Open in a match arm will cause other cancelled operations (like Read or Write) to hit a panic!, as they will fall into the Some(other) catch-all arm. I've provided a suggestion to fix this by handling all Cancelled operations in a single arm and then checking for the Open variant inside it. I've also included a small stylistic improvement to make the intentional drop of the OwnedFd more explicit.

gemini-code-assist · 2026-03-23T07:49:26Z

tokio/src/runtime/io/driver/uring.rs

+                Some(Lifecycle::Cancelled(CancelData::Open(_))) => {
+                    if let Ok(fd) = CqeResult::from(cqe).result {
+                        // SAFETY: the successful CQE result provides
+                        // a non-negative integer, and the event is
+                        // related to an open operation.
+                        unsafe { OwnedFd::from_raw_fd(fd as i32) };
+                    }
                    // Op future was cancelled, so we discard the result.
-                    // We just remove the entry from the slab.
                    ops.remove(idx);
                }


This change correctly handles closing file descriptors for cancelled open operations, but it introduces a regression. By matching only on Lifecycle::Cancelled(CancelData::Open(_)), any other cancelled operation (e.g., Read, Write) will now fall through to the Some(other) case and cause a panic. The previous implementation correctly handled all Cancelled variants.

To fix this, you should handle all Lifecycle::Cancelled variants in one arm and then check if it's an Open operation inside.

Additionally, using let _ = ... makes it more explicit that you are intentionally creating and dropping the OwnedFd to close the file descriptor.

                Some(Lifecycle::Cancelled(cancel_data)) => {
                    if matches!(cancel_data, CancelData::Open(_)) {
                        if let Ok(fd) = CqeResult::from(cqe).result {
                            // SAFETY: the successful CQE result provides
                            // a non-negative integer, and the event is
                            // related to an open operation.
                            let _ = unsafe { OwnedFd::from_raw_fd(fd as i32) };
                        }
                    }
                    // Op future was cancelled, so we discard the result.
                    ops.remove(idx);
                }

value:useful; category:bug; feedback: The Gemini AI reviewer is correct. Before the proposed change, every cancelled operation had its entry removed from the slab of operations. With the proposed change, only cancelled Open operations are handled. The suggested fix prevents leaking slab entries for every cancelled operation other than Open.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-03-23T07:49:45Z

tokio/src/runtime/io/driver/uring.rs

+                        // a non-negative integer, and the event is
+                        // related to an open operation.
+                        unsafe { OwnedFd::from_raw_fd(fd as i32) };
+                    }


Cancelled Read/Write operations cause runtime panic

High Severity

The previous code matched all Cancelled(_) variants in dispatch_completions, gracefully removing them from the slab. The new code only matches Cancelled(CancelData::Open(_)), so Cancelled(CancelData::Read(_)) and Cancelled(CancelData::Write(_)) now fall through to the Some(other) arm which panics. Since Op::drop calls cancel_op for any polled Read or Write operation, this will crash the runtime when the kernel returns a CQE for a cancelled read or write.
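
The difference between the narrowed arm and an arm that covers every cancelled variant can be shown with a toy model. This is a sketch under stated assumptions: the `CancelData`, `Lifecycle`, and `Action` types below are simplified, hypothetical stand-ins (the real driver types carry data and have more states), and `Action::Unexpected` marks the position where the driver's catch-all arm would panic.

```rust
// Hypothetical, simplified stand-ins for the driver's internal types.
#[allow(dead_code)]
enum CancelData {
    Open,
    Read,
    Write,
}

#[allow(dead_code)]
enum Lifecycle {
    Submitted,
    Cancelled(CancelData),
}

#[derive(Debug, PartialEq)]
enum Action {
    Discard,
    CloseFdThenDiscard,
    Unexpected,
}

/// Mirrors the narrowed arm: only cancelled opens are recognised.
fn narrow(state: &Lifecycle) -> Action {
    match state {
        Lifecycle::Cancelled(CancelData::Open) => Action::CloseFdThenDiscard,
        // Cancelled reads and writes land here, where the driver would panic.
        _ => Action::Unexpected,
    }
}

/// Matches every cancelled variant, then special-cases opens inside the arm.
fn broad(state: &Lifecycle) -> Action {
    match state {
        Lifecycle::Cancelled(data) => {
            if matches!(data, CancelData::Open) {
                Action::CloseFdThenDiscard
            } else {
                Action::Discard
            }
        }
        // Other lifecycle states are out of scope for this sketch.
        _ => Action::Unexpected,
    }
}

fn main() {
    let read = Lifecycle::Cancelled(CancelData::Read);
    assert_eq!(narrow(&read), Action::Unexpected);
    assert_eq!(broad(&read), Action::Discard);

    let open = Lifecycle::Cancelled(CancelData::Open);
    assert_eq!(narrow(&open), Action::CloseFdThenDiscard);
    assert_eq!(broad(&open), Action::CloseFdThenDiscard);
}
```

Both functions treat a cancelled open identically; they differ only in what happens to the other cancelled variants.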

Additional Locations (1)

tokio/src/runtime/io/driver/uring.rs#L91-L94

value:useful; category:bug; feedback: The Bugbot AI reviewer is correct. Before the proposed change, every cancelled operation had its entry removed from the slab of operations. With the proposed change, only cancelled Open operations are handled. The suggested fix prevents leaking slab entries for every cancelled operation other than Open.

augmentcode · 2026-03-23T07:51:39Z

🤖 Augment PR Summary

Summary: This PR tightens cancellation handling for io-uring “open” operations to avoid leaking file descriptors when an in-flight open is cancelled.

Changes:

Updates the io-uring completion dispatch to detect cancelled Open ops and close a successful CQE-returned fd immediately via OwnedFd drop.
Refines the cancellation match from a generic cancelled case to a specific CancelData::Open case (to safely interpret CQE results as fds).
Adds a regression test (Linux, io-uring) that repeatedly cancels open operations and checks for fd leakage via /proc/self/fd.

Technical Notes: The test is motivated by tokio issue tokio-rs#7979 and aims to ensure cancelled open CQEs don’t leave unowned fds behind.

augmentcode

Review completed. 2 suggestions posted.

augmentcode · 2026-03-23T07:51:40Z

tokio/src/runtime/io/driver/uring.rs

                    *ops.get_mut(idx).unwrap() = Lifecycle::Completed(cqe);
                }
-                Some(Lifecycle::Cancelled(_)) => {
+                Some(Lifecycle::Cancelled(CancelData::Open(_))) => {


CancelData also has Read/Write variants; with this match only handling Cancelled(Open(_)), any cancelled read/write that later completes will now hit the Some(other) arm and panic the runtime. That seems like a functional regression from the previous Cancelled(_) handling (remove-and-discard).

Severity: high

value:useful; category:bug; feedback: The Augment AI reviewer is correct. Before the proposed change, every cancelled operation had its entry removed from the slab of operations. With the proposed change, only cancelled Open operations are handled. The suggested fix prevents leaking slab entries for every cancelled operation other than Open.

augmentcode · 2026-03-23T07:51:40Z

tokio/tests/fs_uring.rs

+
+                // If io_uring is enabled (and not falling back to the thread pool),
+                // the first poll should return Pending.
+                let _pending = Box::pin(fut).poll_unpin(cx);


This stores the first poll result in _pending but never asserts it is actually Poll::Pending; if the runtime falls back (or the op completes synchronously), the test can pass without exercising the cancelled in-flight io_uring open case it’s meant to cover.
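
A check of the first poll result can be sketched with the standard library alone. The following is an illustration under assumptions, not the test's code: `PendingOnce` is a hypothetical future standing in for an in-flight open, `NoopWake` is a hypothetical waker that does nothing, and `first_poll_is_pending` is a helper invented for this example. In the test itself the equivalent would presumably be an assertion on the value returned by `poll_unpin`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that ignores wake-ups; enough for a single manual poll.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for an in-flight operation: pending on the first
/// poll, ready on every later poll.
struct PendingOnce {
    polled: bool,
}

impl Future for PendingOnce {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        if self.polled {
            Poll::Ready(7)
        } else {
            self.polled = true;
            Poll::Pending
        }
    }
}

/// Polls the future exactly once and reports whether it was still pending.
fn first_poll_is_pending<F: Future>(fut: F) -> bool {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    fut.as_mut().poll(&mut cx).is_pending()
}

fn main() {
    // An operation that is genuinely in flight passes the check.
    assert!(first_poll_is_pending(PendingOnce { polled: false }));
    // A future that completes synchronously would be caught by it.
    assert!(!first_poll_is_pending(std::future::ready(1)));
}
```

The second assertion models the fallback case raised in this comment: a future that is ready on its first poll never reaches the cancellation path, so a test that ignores the poll result cannot tell the two cases apart.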

Severity: medium

value:useful; category:bug; feedback: The Augment AI reviewer is correct. The result of poll_unpin() is ignored on the assumption that it is Pending, but this is never asserted. Asserting it prevents the test from passing silently when the result is not Pending.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tokio/src/runtime/io/driver/uring.rs`:
- Around line 82-88: The code only drops the fd when the slot is already
Lifecycle::Cancelled(CancelData::Open(_)) at CQE handling; you must also perform
the same cleanup when cancel_op removes a slot that is already
Lifecycle::Completed with an Open result. Update the cancel_op (and/or the
branch that handles Lifecycle::Completed(_)) to detect
Completed(CancelData::Open(_)) and, if CqeResult::from(cqe).result is Ok(fd),
call the same unsafe cleanup (OwnedFd::from_raw_fd(fd as i32)) to consume and
drop the fd; reuse the same logic used in the existing
Some(Lifecycle::Cancelled(CancelData::Open(_))) branch so successful open
results cannot leak.
- Around line 82-91: The match arm currently only handles
Some(Lifecycle::Cancelled(CancelData::Open(_))) which lets CancelData::Read and
CancelData::Write fall through and panic; change the arm to handle any
Lifecycle::Cancelled(_) so cancelled read/write ops are discarded instead of
panicking. Specifically, update the match from
Some(Lifecycle::Cancelled(CancelData::Open(_))) to
Some(Lifecycle::Cancelled(cancel_data)) (or use a wildcard
Some(Lifecycle::Cancelled(_))), then inside the block only call
CqeResult::from(cqe).result -> if it’s Ok(fd) and the cancel_data is
CancelData::Open(_) then SAFELY convert to OwnedFd::from_raw_fd(fd as i32); in
all cases call ops.remove(idx) to discard the cancelled operation.

In `@tokio/tests/fs_uring.rs`:
- Around line 154-193: The test currently ignores the result of
Box::pin(fut).poll_unpin(cx) so synchronous Ready cases bypass the intended
Pending path; change the poll site in the spawned task (the poll_fn that calls
OpenOptions::open / the Box::pin(fut).poll_unpin(cx)) to capture the Poll result
and assert it is Pending before sending on tx, ensuring the first poll is
actually Pending; after aborting the handle and awaiting cancellation, poll /
read the fd count in a short loop with a small sleep (or a bounded retry) until
it settles back to the baseline (or a documented tight bound) and only then
compute leaked and assert leaked <= 64.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ad86eeab-261b-4c1b-8497-1d2745719228

📥 Commits

Reviewing files that changed from the base of the PR and between c65333e and 99a243d.

📒 Files selected for processing (2)

tokio/src/runtime/io/driver/uring.rs
tokio/tests/fs_uring.rs

coderabbitai · 2026-03-23T07:57:27Z

tokio/src/runtime/io/driver/uring.rs

+                Some(Lifecycle::Cancelled(CancelData::Open(_))) => {
+                    if let Ok(fd) = CqeResult::from(cqe).result {
+                        // SAFETY: the successful CQE result provides
+                        // a non-negative integer, and the event is
+                        // related to an open operation.
+                        unsafe { OwnedFd::from_raw_fd(fd as i32) };
+                    }


⚠️ Potential issue | 🟠 Major

This still misses the post-completion cancel path.

This only handles the case where the slot is already Cancelled(Open) when the CQE is dispatched. If the CQE lands first, cancel_op later sees Lifecycle::Completed(_) and removes the slot without consuming a successful open result, so the fd can still leak. Please reuse the same cleanup when discarding a completed Open CQE.
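
The ordering described here, where the completion arrives before the cancellation, can be modelled with a small sketch. Everything below is hypothetical: `OpKind`, this `Lifecycle`, and this `cancel_op` are simplified stand-ins written for illustration (a `HashMap` replaces the slab, and an `io::Result<u32>` replaces the stored CQE), so the real `cancel_op` signature and states may differ.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io;
use std::os::fd::{FromRawFd, IntoRawFd, OwnedFd};

/// Hypothetical tag for the kind of operation being cancelled.
enum OpKind {
    Open,
    Read,
}

/// Hypothetical, simplified lifecycle of a slot.
#[allow(dead_code)]
enum Lifecycle {
    Submitted,
    Completed(io::Result<u32>),
    Cancelled(OpKind),
}

/// Cancels the operation in slot `idx`; returns true if a descriptor was closed.
fn cancel_op(ops: &mut HashMap<usize, Lifecycle>, idx: usize, kind: OpKind) -> bool {
    match ops.remove(&idx) {
        // The completion landed first: the slot is gone, so a successful
        // open result must be consumed here or the descriptor leaks.
        Some(Lifecycle::Completed(result)) => match (kind, result) {
            (OpKind::Open, Ok(fd)) => {
                // SAFETY: in this model the value comes from a successful
                // open and nothing else owns the descriptor.
                drop(unsafe { OwnedFd::from_raw_fd(fd as i32) });
                true
            }
            // A read result is a byte count, not a descriptor.
            _ => false,
        },
        // Still in flight: leave a marker for the completion handler.
        Some(Lifecycle::Submitted) => {
            ops.insert(idx, Lifecycle::Cancelled(kind));
            false
        }
        // Any other state is put back untouched.
        Some(state) => {
            ops.insert(idx, state);
            false
        }
        None => false,
    }
}

fn main() -> io::Result<()> {
    let mut ops = HashMap::new();

    let raw = File::open("/dev/null")?.into_raw_fd();
    ops.insert(1, Lifecycle::Completed(Ok(raw as u32)));
    assert!(cancel_op(&mut ops, 1, OpKind::Open));
    assert!(!ops.contains_key(&1));

    ops.insert(2, Lifecycle::Submitted);
    assert!(!cancel_op(&mut ops, 2, OpKind::Open));
    assert!(matches!(ops.get(&2), Some(Lifecycle::Cancelled(OpKind::Open))));

    ops.insert(3, Lifecycle::Completed(Ok(7)));
    assert!(!cancel_op(&mut ops, 3, OpKind::Read));
    Ok(())
}
```

The third case shows why the operation kind has to be known at this point: a completed read also carries a non-negative integer, and treating it as a descriptor would close an unrelated fd.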

coderabbitai · 2026-03-23T07:57:27Z

tokio/src/runtime/io/driver/uring.rs

+                Some(Lifecycle::Cancelled(CancelData::Open(_))) => {
+                    if let Ok(fd) = CqeResult::from(cqe).result {
+                        // SAFETY: the successful CQE result provides
+                        // a non-negative integer, and the event is
+                        // related to an open operation.
+                        unsafe { OwnedFd::from_raw_fd(fd as i32) };
+                    }
                    // Op future was cancelled, so we discard the result.
-                    // We just remove the entry from the slab.
                    ops.remove(idx);
                }


⚠️ Potential issue | 🔴 Critical

Don't regress cancelled read/write ops into a panic.

tokio/src/runtime/driver/op.rs still defines CancelData::Read and CancelData::Write, so narrowing this arm to Cancelled(Open) makes those completions fall through to Some(other) and panic instead of being discarded.

Suggested fix

-                Some(Lifecycle::Cancelled(CancelData::Open(_))) => {
-                    if let Ok(fd) = CqeResult::from(cqe).result {
-                        // SAFETY: the successful CQE result provides
-                        // a non-negative integer, and the event is
-                        // related to an open operation.
-                        unsafe { OwnedFd::from_raw_fd(fd as i32) };
-                    }
+                Some(Lifecycle::Cancelled(cancel_data)) => {
+                    let is_open = matches!(cancel_data, CancelData::Open(_));
+                    if is_open {
+                        if let Ok(fd) = CqeResult::from(cqe).result {
+                            // SAFETY: the successful CQE result provides
+                            // a non-negative integer, and the event is
+                            // related to an open operation.
+                            drop(unsafe { OwnedFd::from_raw_fd(fd as i32) });
+                        }
+                    }
                     // Op future was cancelled, so we discard the result.
                     ops.remove(idx);
                 }

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct. Before the proposed change, every cancelled operation had its entry removed from the slab of operations. With the proposed change, only cancelled Open operations are handled. The suggested fix prevents leaking slab entries for every cancelled operation other than Open.

coderabbitai · 2026-03-23T07:57:27Z

tokio/tests/fs_uring.rs

+    let fd_count_before_access = fs::read_dir("/proc/self/fd").unwrap().count();
+
+    for _ in 0..128 {
+        let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel();
+
+        let path = path.clone();
+        let handle = tokio::spawn(async move {
+            poll_fn(|cx| {
+                let opt = {
+                    let mut opt = tokio::fs::OpenOptions::new();
+                    opt.read(true);
+                    opt
+                };
+
+                let fut = opt.open(&path[0]);
+
+                // If io_uring is enabled (and not falling back to the thread pool),
+                // the first poll should return Pending.
+                let _pending = Box::pin(fut).poll_unpin(cx);
+
+                tx.send(()).unwrap();
+
+                Poll::<()>::Pending
+            })
+            .await;
+        });
+
+        // Wait for the first poll
+        rx.recv().await.unwrap();
+
+        handle.abort();
+
+        let res = handle.await.unwrap_err();
+        assert!(res.is_cancelled());
+    }
+
+    let fd_count_after_cancel = fs::read_dir("/proc/self/fd").unwrap().count();
+    let leaked = fd_count_after_cancel.saturating_sub(fd_count_before_access);
+
+    assert!(leaked <= 64);


⚠️ Potential issue | 🟠 Major

This regression can pass without ever exercising the bug.

poll_unpin is ignored, so a synchronous Ready (fallback / non-io_uring path) turns the loop into a no-op, and the final leaked <= 64 check runs before completions have necessarily drained. That both weakens coverage and makes the bound noisy. Assert that the first poll is Pending, then wait for the fd count to settle back to the baseline (or a tight, explained bound) before asserting.
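
The "wait for the fd count to settle" step can be sketched as a bounded retry loop. This is only an illustration: `wait_until_settled` is a hypothetical helper, the attempt count and delay are arbitrary, and the counter is passed in as a closure so the example runs without `/proc`. In the test the closure would presumably wrap the existing `fs::read_dir("/proc/self/fd").unwrap().count()` call, and an async sleep would replace `std::thread::sleep`.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: re-reads a counter until it falls to `baseline` or
/// the attempts run out, then returns the last value observed.
fn wait_until_settled(
    mut count: impl FnMut() -> usize,
    baseline: usize,
    attempts: u32,
    delay: Duration,
) -> usize {
    let mut current = count();
    for _ in 0..attempts {
        if current <= baseline {
            break;
        }
        sleep(delay);
        current = count();
    }
    current
}

fn main() {
    // Simulated descriptor count that shrinks by one on every read,
    // as if pending completions were draining.
    let mut simulated = 6usize;
    let settled = wait_until_settled(
        || {
            simulated -= 1;
            simulated
        },
        3,
        10,
        Duration::from_millis(1),
    );
    assert_eq!(settled, 3);

    // A count that never drains is returned unchanged once the retries run
    // out, so the caller's assertion on the leak bound still fails.
    let stuck = wait_until_settled(|| 9, 3, 2, Duration::from_millis(1));
    assert_eq!(stuck, 9);
}
```

Bounding the retries keeps a genuine leak from hanging the test, while still giving completions that are already in flight time to be processed before the count is compared with the baseline.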

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct. The result of poll_unpin() is ignored on the assumption that it is Pending, but this is never asserted. Asserting it prevents the test from passing silently when the result is not Pending.

mattiapitossi added 9 commits March 21, 2026 13:18

fs: ensure fds are closed when invoking cancel op

96188ed

fix name

970f3c7

rustfmt

166374f

add SAFETY comment

1d04d3f

improve safety comment

a3f25f3

remove settings

08aa7bc

provide better naming for fd test

555c9ec

remove sleep in test

2b92733

use OwedFd's drop instead of libc close

99a243d
