Multinet: improve link auto-recovery (TCP keepalive, SO_ERROR check, backoff reset) #4
Open
mkostersitz wants to merge 1 commit into pkoning2:main
… and accepted sockets (Windows/Linux/macOS)
- Verify non-blocking connect success via SO_ERROR before marking connected
- Reset connect backoff timer on successful connect/bind to avoid long delays
- Minor: fix factory error logging to use name before instance exists

Helps links recover without router restart when a peer or path drops.
Summary
Add TCP keepalive on outbound and accepted sockets (Windows/Linux/macOS) to detect half‑open peers and trigger reconnect.
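Verify non-blocking TCP connect success using SO_ERROR after POLLOUT. POLLOUT only signals that the connect attempt has finished, so without this check a failed connect can be treated as connected and the link stalls.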
Reset connection backoff on successful connect/bind to avoid long post‑outage delays.
Minor: fix factory error logging to use the name, because the instance does not exist yet when the error is logged.
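The keepalive change can be pictured with the following minimal sketch. Only the name set_tcp_keepalive comes from this PR; the signature, the default timings (60 s idle, 10 s interval, 5 probes) and the platform branches are assumptions for illustration, not the code in decnet/host.py.

```python
import socket
import sys


def set_tcp_keepalive(sock, idle=60, interval=10, count=5):
    """Hypothetical sketch: enable TCP keepalive and tune probe timing.

    idle:     seconds of inactivity before the first probe (assumed default)
    interval: seconds between unanswered probes (assumed default)
    count:    unanswered probes before the peer is declared dead (assumed default)
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if sys.platform == "win32" and hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: (onoff, idle_ms, interval_ms); the probe count is set by the OS
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle * 1000, interval * 1000))
        return
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Linux name for the idle time before probing starts
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    elif hasattr(socket, "TCP_KEEPALIVE"):
        # macOS names the same idle-time option TCP_KEEPALIVE
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

With these assumed values a dead peer is detected after roughly idle + interval × count seconds of silence (about 110 s on Linux), after which the next socket operation fails and the reconnect path can run.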
Rationale
In field use, when a tunnel path or peer dies, links sometimes don’t recover until the router is restarted. OS keepalives and robust connect-result checks ensure dead connections are detected, so the existing state machine’s reconnect path actually runs. Resetting the backoff on success avoids prolonged recovery delays after the link returns to health.
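To show why the reset matters, here is a hypothetical exponential backoff. The PR does not show how conntmr is implemented, so the Backoff class, its doubling rule and its 1 s / 60 s bounds are assumptions and not pydecnet's timer.

```python
class Backoff:
    """Hypothetical exponential backoff for reconnect attempts (not pydecnet's conntmr)."""

    def __init__(self, initial=1, maximum=60):
        self.initial = initial
        self.maximum = maximum
        self.delay = initial

    def next_delay(self):
        """Return the delay before the next attempt, then double it up to the cap."""
        current = self.delay
        self.delay = min(self.delay * 2, self.maximum)
        return current

    def reset(self):
        """Call after a successful connect/bind so the next outage starts short again."""
        self.delay = self.initial
```

Without reset(), a link that recovered after a long outage would wait the capped delay (60 s here) before its first retry on the next failure. Calling reset() on success brings that back to 1 s.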
Implementation notes
decnet/host.py: add set_tcp_keepalive with platform‑specific tuning; apply in create_connection.
decnet/multinet.py:
  Connect mode: SO_ERROR verification in check_connection; conntmr.reset() on success.
  Listen mode: conntmr.reset() after successful bind; enable keepalive on accepted sockets.
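The connect-mode check can be sketched with the standard library as follows. The helpers start_connect and poll_connect_result are hypothetical stand-ins, not the check_connection code in decnet/multinet.py. Because select.poll is unavailable on Windows, this sketch is POSIX-only.

```python
import errno
import os
import select
import socket


def start_connect(addr):
    """Begin a non-blocking TCP connect to addr and return the socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    err = sock.connect_ex(addr)
    if err not in (0, errno.EINPROGRESS):
        # Immediate failure (e.g. network unreachable): report it right away
        sock.close()
        raise OSError(err, os.strerror(err))
    return sock


def poll_connect_result(sock, timeout_ms=0):
    """Return True once connected, False while still pending.

    Raises OSError if the connect failed. POLLOUT only means the attempt
    has finished; SO_ERROR says whether it actually succeeded.
    """
    poller = select.poll()
    poller.register(sock, select.POLLOUT)
    if not poller.poll(timeout_ms):
        return False  # no events yet: connect still in progress
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        raise OSError(err, os.strerror(err))  # e.g. ECONNREFUSED
    return True
```

A caller would mark the link connected (and reset its backoff) only when poll_connect_result returns True, and would treat OSError as a failed attempt that goes back through the reconnect path.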
Tests
Lightly regression-tested with the existing Multinet tests (connect, reconnect, and accept paths). No public API changes.
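A possible extra loopback check for the keepalive part is sketched below. It is not part of the existing Multinet tests. enable_keepalive and check_loopback_keepalive are hypothetical helpers, and socket.create_connection here is the standard-library function, not the one in decnet/host.py.

```python
import socket


def enable_keepalive(sock):
    """Hypothetical minimal stand-in for set_tcp_keepalive: only sets SO_KEEPALIVE."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


def keepalive_enabled(sock):
    """True if SO_KEEPALIVE is on for this socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


def check_loopback_keepalive():
    """Open a loopback connection and report keepalive state on (outbound, accepted)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        with socket.create_connection(listener.getsockname(), timeout=5) as out:
            enable_keepalive(out)  # outbound side
            conn, _ = listener.accept()
            with conn:
                enable_keepalive(conn)  # accepted side, as the PR describes
                return keepalive_enabled(out), keepalive_enabled(conn)
```

A test would assert that check_loopback_keepalive() returns (True, True).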
Let me know what you think :)