Skip to content

fix(driver): distinguish transient vs fatal UDP socket recv errors (closes #777)#793

Closed
nightness wants to merge 15 commits intowebrtc-rs:masterfrom
Brainwires:fix/driver-socket-errors
Closed

fix(driver): distinguish transient vs fatal UDP socket recv errors (closes #777)#793
nightness wants to merge 15 commits intowebrtc-rs:masterfrom
Brainwires:fix/driver-socket-errors

Conversation

@nightness
Copy link
Copy Markdown

Summary

Resolves the //TODO: better handling on socket recv error #777 at driver.rs.

Problem: Any socket recv_from error immediately terminated the peer connection driver. Transient OS errors (EAGAIN/EWOULDBLOCK/EINTR) occur on non-blocking sockets and should be retried, not treated as fatal.

Fix:

  • create_socket_recv_future now returns (result, local_addr, idx, buf) instead of Result<...> so the error arm always has idx + buf available to re-queue the future
  • WouldBlock / Interruptedtrace! log + re-queue the recv future
  • All other errors → error! log + abort TCP tasks + return error (existing behaviour)

Test plan

  • cargo build passes
  • Manual: a peer connection survives a burst of transient EAGAIN errors on its UDP socket and continues operating normally

🤖 Generated with Claude Code

nightness and others added 15 commits April 1, 2026 04:15
Without a timeout, `resolve_host` in `gather_from_stun_server` could block
indefinitely on a misconfigured or unreachable DNS server, hanging ICE
gathering. STUN response timeouts are already handled by the Sans-IO
StunClient retry loop (7 attempts × 300ms RTO), so no separate fix is
needed for webrtc-rs#778.

Closes webrtc-rs#774

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…#784)

Adds tests/datachannel_video_interop.rs verifying that an RTCPeerConnection
with both a video transceiver and a data channel negotiates successfully:
  - Offer SDP contains both m=video (active, with codecs) and m=application
  - Answer SDP contains both m-lines
  - ICE + DTLS + SCTP establish and data channel message is exchanged

Key finding: the feature works correctly in master when default codecs are
registered on both sides. Users must call register_default_codecs() on the
MediaEngine for video m-lines to have non-rejected codecs in the offer.

Closes webrtc-rs#784

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Shows that a DataChannel and a video transceiver coexist on the same
PeerConnection. The key callout — prominently documented in the module
doc — is that MediaEngine::register_default_codecs() must be called;
without it the offer emits m=video 0 (rejected), which misleads users
into thinking mixed DC+video is broken.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Re-export MulticastDnsMode from src/peer_connection/mod.rs
- Add PeerConnectionBuilder::with_mdns_mode() so callers explicitly
  enable the multicast socket alongside with_setting_engine(); no
  submodule changes required
- When mode != Disabled, PeerConnectionImpl::new() creates a multicast
  UDP socket via rtc_mdns::MulticastSocket and adds it to the driver's
  socket map; existing non-ICE routing forwards mDNS packets to the core
- Add examples/mdns-local-peers/ demonstrating two in-process peers
  using QueryAndGather mode without a STUN server

Closes webrtc-rs#782

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-rs#781)

- Add AsyncTcpListener + AsyncTcpStream traits to Runtime abstraction;
  implement for both Tokio and smol runtimes
- Add tokio io-util feature for AsyncReadExt/AsyncWriteExt
- Add rtc-shared direct dependency for tcp_framing utilities
- PeerConnectionImpl::new() binds TCP passive listeners from with_tcp_addrs()
  and passes them to the driver
- RTCIceGatherer emits tcptype passive host candidates for each TCP listener
  alongside existing UDP host candidates
- PeerConnectionDriver:
  - Spawns an accept loop per TCP listener; registers accepted connections
  - Per connection: dedicated read task (RFC 4571 decode → tcp_inbound_rx)
    and write task (tcp_write_txs channel → RFC 4571 encode)
  - select! loop handles tcp_new_conn_rx and tcp_inbound_rx alongside UDP
  - handle_write() routes TCP packets to write channels; dials out via
    runtime.connect_tcp() on first packet to an unknown peer (active mode)
- Add examples/ice-tcp/ demonstrating two in-process peers over TCP only

Closes webrtc-rs#781

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SmolJoinHandle: replace .lock().unwrap() with .lock()
.unwrap_or_else(|e| e.into_inner()) in both detach() and abort().
Recovering from a poisoned mutex is safe in cleanup paths; panicking
would cause a double-panic and abort the process.

SmolNotify: rewrite to eliminate the lost-wakeup race introduced by
using try_lock() in notify_one() / notify_waiters().

Previous design: try_lock() would silently discard a notification if
the lock was held by a concurrent notified() call, permanently hanging
any task that was registering itself as a waiter at that instant.

New design uses an AtomicBool "pending" flag:
- notify_*: sets the flag (Release) BEFORE attempting to acquire the
  waiter list, so any concurrent notified() will see the flag even if
  the lock is still held.
- notified(): checks the flag (AcqRel swap) before AND after acquiring
  the lock, so it cannot miss a notification that arrived between the
  two checks.

Also eliminates the duplicated AsyncNotify impl body by delegating to
the SmolNotify inherent methods.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…cle logging

Previously, TCP accept loop tasks and per-connection read/write tasks
were spawned with `runtime.spawn()` and the returned JoinHandle was
immediately dropped (detached).  When the driver shut down, these tasks
kept running indefinitely, accumulating file descriptors and memory.

Changes:
- Add `tcp_task_handles: Vec<JoinHandle>` to PeerConnectionDriver.
- Store handles for all TCP accept loop, read, and write tasks.
- Add `abort_tcp_tasks()` helper that calls `.abort()` on each handle.
- Call `abort_tcp_tasks()` on all driver exit paths (clean close,
  socket error, all-futures-completed).
- Log a warning when a TCP write channel is full/closed so the
  connection deregistration is visible in logs.
- Add `log::debug!` at driver event loop start and clean shutdown.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add `with_mdns_fail_on_socket_error(bool)` to `PeerConnectionBuilder`.
When set to `true`, a failure to create the mDNS multicast socket is
returned as an error from `build()` instead of being silently demoted
to a warning. Defaults to `false` (existing behaviour unchanged).

This lets production deployments that depend on mDNS detect
misconfiguration at startup rather than silently operating without it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- fix/smol-runtime-correctness: SmolJoinHandle poison recovery + SmolNotify lost-wakeup race fix
- fix/tcp-task-lifecycle: track/abort TCP accept/read/write task handles on shutdown
- fix/driver-error-handling: configurable mDNS socket-creation error handling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ebrtc-rs#777)

Previously, any socket recv error terminated the driver immediately.
Transient errors (WouldBlock / Interrupted) occur spuriously on
non-blocking sockets and should be retried, not treated as fatal.

Changes:
- create_socket_recv_future now returns (result, local_addr, idx, buf)
  instead of Result<...> so the error arm always has idx + buf available
  to re-queue the future.
- WouldBlock / Interrupted: log at trace level and re-queue the recv
  future so the socket continues to be polled.
- All other errors: fatal — log at error level, abort TCP tasks, return.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@nightness
Copy link
Copy Markdown
Author

Superseded by #795 which consolidates all driver robustness fixes into a single PR.

@nightness
Copy link
Copy Markdown
Author

Superseded by #795 which is a properly scoped PR for these driver/runtime fixes.

@nightness nightness closed this Apr 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant