fix(runtime/smol): eliminate lost-wakeup race and mutex-poison panic #790
Closed
nightness wants to merge 11 commits into webrtc-rs:master from
Conversation
Without a timeout, `resolve_host` in `gather_from_stun_server` could block indefinitely on a misconfigured or unreachable DNS server, hanging ICE gathering. STUN response timeouts are already handled by the Sans-IO StunClient retry loop (7 attempts × 300ms RTO), so no separate fix is needed for webrtc-rs#778.

Closes webrtc-rs#774

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…#784) Adds tests/datachannel_video_interop.rs verifying that an RTCPeerConnection with both a video transceiver and a data channel negotiates successfully:

- Offer SDP contains both m=video (active, with codecs) and m=application
- Answer SDP contains both m-lines
- ICE + DTLS + SCTP establish and a data channel message is exchanged

Key finding: the feature works correctly in master when default codecs are registered on both sides. Users must call register_default_codecs() on the MediaEngine for video m-lines to have non-rejected codecs in the offer.

Closes webrtc-rs#784

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Shows that a DataChannel and a video transceiver coexist on the same PeerConnection. The key callout — prominently documented in the module doc — is that MediaEngine::register_default_codecs() must be called; without it the offer emits m=video 0 (rejected), which misleads users into thinking mixed DC+video is broken. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Re-export MulticastDnsMode from src/peer_connection/mod.rs
- Add PeerConnectionBuilder::with_mdns_mode() so callers explicitly enable the multicast socket alongside with_setting_engine(); no submodule changes required
- When mode != Disabled, PeerConnectionImpl::new() creates a multicast UDP socket via rtc_mdns::MulticastSocket and adds it to the driver's socket map; existing non-ICE routing forwards mDNS packets to the core
- Add examples/mdns-local-peers/ demonstrating two in-process peers using QueryAndGather mode without a STUN server

Closes webrtc-rs#782

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-rs#781)

- Add AsyncTcpListener + AsyncTcpStream traits to the Runtime abstraction; implement for both Tokio and smol runtimes
- Add tokio io-util feature for AsyncReadExt/AsyncWriteExt
- Add rtc-shared direct dependency for tcp_framing utilities
- PeerConnectionImpl::new() binds TCP passive listeners from with_tcp_addrs() and passes them to the driver
- RTCIceGatherer emits tcptype passive host candidates for each TCP listener alongside existing UDP host candidates
- PeerConnectionDriver:
  - Spawns an accept loop per TCP listener; registers accepted connections
  - Per connection: dedicated read task (RFC 4571 decode → tcp_inbound_rx) and write task (tcp_write_txs channel → RFC 4571 encode)
  - select! loop handles tcp_new_conn_rx and tcp_inbound_rx alongside UDP
  - handle_write() routes TCP packets to write channels; dials out via runtime.connect_tcp() on first packet to an unknown peer (active mode)
- Add examples/ice-tcp/ demonstrating two in-process peers over TCP only

Closes webrtc-rs#781

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SmolJoinHandle: replace .lock().unwrap() with .lock().unwrap_or_else(|e| e.into_inner()) in both detach() and abort(). Recovering from a poisoned mutex is safe in cleanup paths; panicking would cause a double-panic and abort the process.

SmolNotify: rewrite to eliminate the lost-wakeup race introduced by using try_lock() in notify_one() / notify_waiters(). Previous design: try_lock() would silently discard a notification if the lock was held by a concurrent notified() call, permanently hanging any task that was registering itself as a waiter at that instant.

New design uses an AtomicBool "pending" flag:

- notify_*: sets the flag (Release) BEFORE attempting to acquire the waiter list, so any concurrent notified() will see the flag even if the lock is still held.
- notified(): checks the flag (AcqRel swap) before AND after acquiring the lock, so it cannot miss a notification that arrived between the two checks.

Also eliminates the duplicated AsyncNotify impl body by delegating to the SmolNotify inherent methods.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Superseded by #795, which consolidates all driver robustness fixes into a single PR.
Author
Superseded by #795, which is a properly scoped PR for these driver/runtime fixes.
Summary
Two correctness bugs in the smol runtime backend:
1. **SmolJoinHandle: mutex poison panic**

   `detach()` and `abort()` used `.lock().unwrap()` on a `std::sync::Mutex`. If another thread panicked while holding the lock, this would trigger a double panic and abort the process. Fixed with `.unwrap_or_else(|e| e.into_inner())`, which recovers the guarded data even from a poisoned mutex.
2. **SmolNotify: lost-wakeup race (permanent task hang)**

   The previous implementation used `try_lock()` in `notify_one()` / `notify_waiters()`. If the lock was held by a concurrent `notified()` call registering itself as a waiter, the notification was silently discarded, causing any task waiting on `notified()` to hang permanently.

   The new design adds an `AtomicBool` "pending" flag:

   - `notify_*` sets the flag (Release) before attempting to acquire the waiter list, so even if the lock is contended, a concurrent `notified()` will see the flag.
   - `notified()` checks the flag (`AcqRel` swap) before and after acquiring the lock, so it cannot miss a notification that arrived between the two checks.

   Also eliminates the duplicated `AsyncNotify` impl body by delegating to the `SmolNotify` inherent methods.

Test Plan
- `cargo build` (tokio, default) passes
- `cargo build --no-default-features --features runtime-smol` passes

🤖 Generated with Claude Code