Skip to content

Conversation

@artrixdotdev
Copy link
Owner

@artrixdotdev artrixdotdev commented Oct 12, 2025

Fixes tons of panics related to PeerStream and paves the way for a cancel system in the future.

Summary by CodeRabbit

  • New Features

    • Added cancellation-aware send/receive for peer messages, enabling prompt abort of in-flight operations.
  • Bug Fixes

    • Prevented a crash when parsing certain Extended messages by returning a handled error instead.
    • Improved shutdown behavior: peers now stop cleanly on unexpected disconnects (e.g., EOF or connection reset).
    • Enhanced error reporting by preserving underlying I/O errors, reducing misleading failure messages.
  • Reliability

    • Centralized message handling and refined error paths for more predictable peer behavior under failure and cancellation scenarios.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 12, 2025

Walkthrough

Introduces cancellation-aware send/receive in the protocol stream, adds tokio-util dependency, extends PeerActor error types (new MessageCancelled; ReceiveFailed now carries std::io::Error), centralizes actor receive-to-signal handling, and replaces an unwrap in Extended message parsing with error propagation.

Changes

Cohort / File(s) Summary
Dependency update
crates/libtortillas/Cargo.toml
Added dependency: tokio-util (^0.7).
Error enum changes
crates/libtortillas/src/errors.rs
PeerActorError: added MessageCancelled; ReceiveFailed payload changed from String to std::io::Error.
Cancellation-aware stream ops
crates/libtortillas/src/protocol/stream.rs
Added send_with_cancel and recv_with_cancel using CancellationToken; mapped read errors to underlying io::Error; ReceiveFailed now preserves io::Error; logging via inspect_err.
Actor recv handling
crates/libtortillas/src/peer/actor.rs
Added check_message_signal helper; next now uses it to convert recv results to Signal and stop on UnexpectedEof/ConnectionReset; added instrumentation to next.
Extended message parse handling
crates/libtortillas/src/protocol/messages.rs
Replaced unwrap with ?; on parse failure returns PeerActorError::MessageParsingFailed instead of panicking.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor C as Caller
  participant PS as PeerStream (PeerSend/PeerRecv)
  participant CT as CancellationToken
  participant NET as Network

  rect rgba(230,245,255,0.5)
  note over C,PS: send_with_cancel
  C->>PS: send_with_cancel(data, CT)
  alt CT cancelled first
    CT-->>PS: cancelled()
    PS-->>C: Err(MessageCancelled)
  else not cancelled
    PS->>NET: send(data)
    NET-->>PS: Result
    PS-->>C: Ok/Err
  end
  end

  rect rgba(230,255,230,0.5)
  note over C,PS: recv_with_cancel
  C->>PS: recv_with_cancel(CT)
  alt CT cancelled first
    CT-->>PS: cancelled()
    PS-->>C: Err(MessageCancelled)
  else not cancelled
    PS->>NET: recv()
    NET-->>PS: PeerMessages or io::Error
    PS-->>C: Ok(msg) / Err(ReceiveFailed(io::Error))
  end
  end
Loading
sequenceDiagram
  autonumber
  participant A as PeerActor
  participant S as Stream
  participant H as check_message_signal
  participant L as Logger

  A->>S: recv()
  S-->>A: Result<PeerMessages, PeerActorError>
  A->>H: check_message_signal(actor_ref, result)
  alt Ok(msg)
    H-->>A: Some(Signal::Message(msg, actor_ref))
  else Err(ReceiveFailed(io::Error: EOF/ConnReset))
    H->>L: log error
    H-->>A: Some(Signal::Stop)
  else Err(other)
    H->>L: log error
    H-->>A: None
  end
  A->>A: proceed with Signal if Some
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly captures the two primary changes—eliminating potential panic points in stream handling and introducing groundwork for cancellation support—using clear, conventional phrasing without extraneous details.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/stream-eof-panics

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot added the med prio Medium Priority label Oct 12, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/libtortillas/src/protocol/messages.rs (1)

400-400: Remove remaining panic point at line 400.

Line 400 still uses .unwrap() on serde_bencode::from_bytes, which contradicts the PR's goal of removing panic points. This creates an inconsistency with the identical parsing scenario at lines 378-381 that was properly fixed.

Apply this diff to handle the parsing error consistently:

-               let extended_message: ExtendedMessage =
-                  serde_bencode::from_bytes(&extended_message_bytes).unwrap();
+               let extended_message: ExtendedMessage =
+                  serde_bencode::from_bytes(&extended_message_bytes)
+                     .map_err(|e| PeerActorError::MessageParsingFailed {
+                        reason: e.to_string(),
+                     })?;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e23f7b3 and ec385ab.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • crates/libtortillas/Cargo.toml (1 hunks)
  • crates/libtortillas/src/errors.rs (2 hunks)
  • crates/libtortillas/src/peer/actor.rs (3 hunks)
  • crates/libtortillas/src/protocol/messages.rs (1 hunks)
  • crates/libtortillas/src/protocol/stream.rs (6 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-18T18:25:24.831Z
Learnt from: kurealnum
PR: artrixdotdev/tortillas#99
File: crates/libtortillas/src/engine/mod.rs:153-155
Timestamp: 2025-08-18T18:25:24.831Z
Learning: The peer stream handling logic in the Engine's EngineMessage::IncomingPeer handler in crates/libtortillas/src/engine/mod.rs is temporary and planned to be migrated to separate code in the future, so extensive refactoring may not be worthwhile in this location.

Applied to files:

  • crates/libtortillas/src/protocol/messages.rs
🧬 Code graph analysis (1)
crates/libtortillas/src/peer/actor.rs (1)
crates/libtortillas/src/protocol/stream.rs (1)
  • recv (70-129)
🔇 Additional comments (10)
crates/libtortillas/Cargo.toml (1)

39-39: LGTM!

The addition of tokio-util dependency correctly supports the new cancellation-aware APIs introduced in the stream traits.

crates/libtortillas/src/protocol/messages.rs (1)

378-381: LGTM!

The change from .unwrap() to proper error handling with map_err correctly removes a panic point and aligns with the PR's objective of eliminating possible panic points in stream handling.

crates/libtortillas/src/errors.rs (2)

103-105: LGTM!

The addition of MessageCancelled variant correctly supports the new cancellation-aware send/recv methods in the stream traits.


121-121: LGTM!

Changing ReceiveFailed to carry std::io::Error instead of String preserves the original error details and enables better error handling downstream, particularly in the check_message_signal helper where error kind is inspected.

crates/libtortillas/src/peer/actor.rs (3)

273-273: LGTM!

The instrument attribute improves observability by adding tracing context with the peer's address and ID, while appropriately skipping large objects like self, actor_ref, and mailbox_rx.


309-309: LGTM!

Delegating to check_message_signal centralizes error handling and signal generation, improving maintainability.


65-90: Verify error handling for non-IO error cases.

The check_message_signal helper returns None for most error variants (except UnexpectedEof and ConnectionReset), which means these errors are logged but don't generate any control signals. This behavior may silently ignore critical errors like MessageParsingFailed or MessageCancelled without terminating the actor or taking corrective action.

Please confirm this is the intended behavior, or consider whether other error variants should also map to Signal::Stop or require different handling.

crates/libtortillas/src/protocol/stream.rs (3)

48-63: LGTM!

The send_with_cancel method correctly implements cancellation support using tokio::select! to race the cancellation token against the send operation, returning MessageCancelled when cancelled.


130-144: LGTM!

The recv_with_cancel method correctly implements cancellation support using tokio::select! to race the cancellation token against the receive operation, returning MessageCancelled when cancelled.


77-77: LGTM!

The error handling updates correctly preserve the original io::Error instead of converting to String, enabling better error kind inspection in check_message_signal. The addition of inspect_err at lines 110-116 improves observability by logging the error before mapping.

Also applies to: 97-97, 110-116

artrixdotdev and others added 2 commits October 12, 2025 11:12
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@coderabbitai coderabbitai bot added the bug Something isn't working label Oct 12, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
crates/libtortillas/src/protocol/stream.rs (1)

48-49: Fix typo in documentation.

The doc comment has a typo: "cancel" should be "cancelled".

This issue was already flagged in a previous review.

🧹 Nitpick comments (1)
crates/libtortillas/src/protocol/messages.rs (1)

199-199: Consider handling serialization errors.

While serialization of our own structures is generally safer than parsing untrusted input, line 199 still contains an unwrap() that could panic if serde_bencode::to_bytes() fails:

payload.extend_from_slice(&serde_bencode::to_bytes(inner).unwrap());

Consider propagating this error as well for completeness:

-            if let Some(inner) = &**handshake_message {
-               payload.extend_from_slice(&serde_bencode::to_bytes(inner).unwrap());
-            }
+            if let Some(inner) = &**handshake_message {
+               let encoded = serde_bencode::to_bytes(inner)
+                  .map_err(|e| PeerActorError::MessageParsingFailed {
+                     reason: format!("Failed to encode extended message: {}", e),
+                  })?;
+               payload.extend_from_slice(&encoded);
+            }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec385ab and 9644bb4.

📒 Files selected for processing (2)
  • crates/libtortillas/src/protocol/messages.rs (2 hunks)
  • crates/libtortillas/src/protocol/stream.rs (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
crates/libtortillas/src/protocol/messages.rs (1)
crates/libtortillas/src/tracker/udp.rs (1)
  • from_bytes (211-343)
🔇 Additional comments (3)
crates/libtortillas/src/protocol/messages.rs (1)

378-381: Excellent panic prevention!

Replacing unwrap() with proper error propagation eliminates potential crash points when parsing Extended messages from untrusted peers. This aligns perfectly with the PR objective to remove panic points.

Also applies to: 400-404

crates/libtortillas/src/protocol/stream.rs (2)

77-77: Improved error context preservation.

Changing from PeerActorError::ReceiveFailed(e.to_string()) to PeerActorError::ReceiveFailed(e) preserves the underlying io::Error, which provides better error context and allows downstream code to inspect error kinds. The addition of .inspect_err for logging at lines 113-115 is also a nice touch for observability.

Also applies to: 97-97, 110-116


48-63: Well-implemented cancellation support.

The cancellation logic using tokio::select! is correct and idiomatic. Both send_with_cancel and recv_with_cancel:

  • Check for cancellation first via token.cancelled()
  • Return PeerActorError::MessageCancelled when cancelled
  • Delegate to existing logic otherwise

This provides a clean foundation for the future cancel system mentioned in the PR objectives.

Also applies to: 133-145

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
crates/libtortillas/src/protocol/stream.rs (1)

48-49: Fix documentation typos.

The documentation still contains grammatical errors:

  • Line 48: "a cancellation support" should be "cancellation support" (remove "a")
  • Line 49: "cancel" should be "cancelled"

Apply this diff:

-   /// Sends a message to a peer with a cancellation support, returning an
-   /// error if the operation is cancel
+   /// Sends a message to a peer with cancellation support, returning an
+   /// error if the operation is cancelled
🧹 Nitpick comments (2)
crates/libtortillas/src/protocol/stream.rs (2)

50-63: Consider using biased select for deterministic cancellation behavior.

The implementation is correct and follows standard tokio patterns for cancellation. However, tokio::select! uses fair/random selection when multiple branches are ready. Consider using biased; to check cancellation first, ensuring predictable behavior if both the send operation and cancellation complete simultaneously.

Apply this diff to make the select biased:

    async fn send_with_cancel(
       &mut self, data: PeerMessages, token: CancellationToken,
    ) -> Result<(), PeerActorError> {
       tokio::select! {
+         biased;
          _ = token.cancelled() => {
             trace!("Sending message to peer was cancelled");
             return Err(PeerActorError::MessageCancelled);
-
          },
          result = self.send(data) => {
             result
          }
       }
    }

Note: Also removed the extra blank line before the closing brace in the cancellation branch for cleaner formatting.


130-144: LGTM: recv_with_cancel implementation.

The implementation mirrors send_with_cancel and provides consistent cancellation semantics. The documentation is correct here (unlike send_with_cancel).

Consider applying the same biased select pattern mentioned for send_with_cancel:

    async fn recv_with_cancel(
       &mut self, token: CancellationToken,
    ) -> Result<PeerMessages, PeerActorError> {
       tokio::select! {
+         biased;
          _ = token.cancelled() => {
             trace!("Receiving message from peer was cancelled");
             return Err(PeerActorError::MessageCancelled);
-
          },
          result = self.recv() => {
             result
          }
       }
    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9644bb4 and 2cfcab1.

📒 Files selected for processing (1)
  • crates/libtortillas/src/protocol/stream.rs (6 hunks)
🔇 Additional comments (2)
crates/libtortillas/src/protocol/stream.rs (2)

18-18: LGTM: CancellationToken dependency added.

The addition of tokio_util::sync::CancellationToken is appropriate for implementing cancellation support. According to the crate documentation, CancellationToken is stable and well-suited for this use case.

Based on learnings.


75-78: Excellent error handling improvement.

Preserving the original std::io::Error instead of converting to String provides better error diagnostics and allows upstream code to inspect error kinds and sources. The addition of inspect_err at lines 113-115 provides helpful logging while maintaining the original error.

Also applies to: 95-98, 110-116

@artrixdotdev artrixdotdev merged commit 641f7d6 into main Oct 12, 2025
3 checks passed
@artrixdotdev artrixdotdev deleted the fix/stream-eof-panics branch October 12, 2025 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working med prio Medium Priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants