fix(stream): avoid removing peer when other connections remain #538

Merged
marcus-pousette merged 3 commits into dao-xyz:master from Faolain:fix/directstream-partial-disconnect
Jan 16, 2026

Conversation

@Faolain
Contributor

@Faolain Faolain commented Jan 4, 2026

I was having a lot of trouble getting consistent (non-flaky) WebRTC-only browser-to-browser connections with Peerbit. About 20-50% of the time replication worked flawlessly: I would connect to Peer A from Peer B's browser and see the data reflected in Peer B, and anything I added or modified in Peer A showed up immediately in Peer B. Most of the other times, however, it would load the data initially and then never load any updates again.

I created an anchoring test and had an AI agent add instrumentation to every connection and log point within the stack, to identify where the data was being dropped and why replication worked only some of the time. It arrived at the solution below, which makes all my tests pass, and my app now performs flawlessly.

tl;dr
What fixed the flake (core change)

  • Root cause: @peerbit/stream’s DirectStream.onPeerDisconnected() was removing a peer when any single connection closed. When peers had multiple connections (relay + WebRTC), closing one connection caused the peer to be removed entirely even though another connection was still alive. That cascaded into:
    • pubsub unsubscribe (reason: peer-unreachable)
    • shared‑log removing the replicator
    • replication freezing while UI still showed “connected”
  • Fix: In patches/@peerbit__stream@4.5.2.patch, we changed onPeerDisconnected() to skip peer removal if any other connection remains. Only remove when the disconnect event corresponds to the last active connection.
  • Why it works: The replication stack assumes “peer presence = eligible replicator.” By preventing false peer removal, we keep subscriptions alive and replication continues.
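To make the difference concrete, here is a toy model of the behavior described above. It is only a sketch: `PeerRegistry`, its methods, and the policy names are hypothetical and are not Peerbit or libp2p APIs. The `removedPeers` array stands in for the downstream cascade (pubsub unsubscribe, replicator removal).

```typescript
// Toy model only: PeerRegistry and Policy are hypothetical, not Peerbit APIs.
type Policy = "remove-on-any-close" | "remove-on-last-close";

class PeerRegistry {
  private connections = new Map<string, Set<string>>();
  private policy: Policy;
  // Stand-in for the cascade (unsubscribe, replicator removal).
  readonly removedPeers: string[] = [];

  constructor(policy: Policy) {
    this.policy = policy;
  }

  addConnection(peer: string, connectionId: string): void {
    const ids = this.connections.get(peer) ?? new Set<string>();
    ids.add(connectionId);
    this.connections.set(peer, ids);
  }

  closeConnection(peer: string, connectionId: string): void {
    const ids = this.connections.get(peer);
    // Ignore unknown peers and unknown connections.
    if (!ids || !ids.delete(connectionId)) return;
    const wasLast = ids.size === 0;
    if (this.policy === "remove-on-any-close" || wasLast) {
      this.connections.delete(peer);
      this.removedPeers.push(peer);
    }
  }

  hasPeer(peer: string): boolean {
    return this.connections.has(peer);
  }
}

// A peer reachable over relay + WebRTC loses only its relay connection.
function simulate(policy: Policy): boolean {
  const registry = new PeerRegistry(policy);
  registry.addConnection("peerA", "relay");
  registry.addConnection("peerA", "webrtc");
  registry.closeConnection("peerA", "relay");
  return registry.hasPeer("peerA");
}

console.log(simulate("remove-on-any-close")); // false: peer dropped too early
console.log(simulate("remove-on-last-close")); // true: peer kept while WebRTC is alive
```

With the first policy the peer disappears while its WebRTC connection is still open, which matches the "UI shows connected, replication frozen" symptom. With the second policy the peer is removed only after its last connection closes.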

Summary

  • Avoid removing a peer when other connections to that peer are still active.
  • Prevents pubsub unsubscribe (peer-unreachable) and replication drop when a single connection closes.

Problem

When a peer has multiple connections (e.g. relay + direct), a single connection close can trigger DirectStream peer removal. This cascades into pubsub unsubscribe and stops replication even though another connection is still alive.

Fix

If the connection manager still has other connections for the peer (or the disconnect event cannot be matched to a tracked connection), skip removing the peer. Only remove when there is a single tracked connection and it matches the disconnect.
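The rule in the paragraph above can be sketched as a small pure function. This is an illustration under assumptions, not the code from the patch: `TrackedConnection`, `Decision`, and `decidePeerRemoval` are hypothetical names, and the list of tracked connections is assumed to still include the one that is closing. The PR description does not say what happens when no connections are tracked at all; the sketch treats that case as removable, which is an assumption.

```typescript
// Hypothetical shapes for illustration; not Peerbit or libp2p types.
interface TrackedConnection {
  id: string;
}

interface Decision {
  remove: boolean;
  reason:
    | "last-connection-closed"
    | "no-tracked-connections"
    | "other-connections-remain"
    | "unmatched-disconnect";
}

function decidePeerRemoval(
  tracked: readonly TrackedConnection[],
  disconnectedId?: string, // may be missing on some disconnect events
): Decision {
  if (tracked.length === 0) {
    // Assumption: this case is not covered by the PR description.
    return { remove: true, reason: "no-tracked-connections" };
  }
  if (tracked.length > 1) {
    // Another connection (e.g. WebRTC next to a relay) is still tracked.
    return { remove: false, reason: "other-connections-remain" };
  }
  const matches =
    disconnectedId !== undefined &&
    tracked.some((connection) => connection.id === disconnectedId);
  if (!matches) {
    // The event cannot be tied to the one tracked connection: keep the peer.
    return { remove: false, reason: "unmatched-disconnect" };
  }
  return { remove: true, reason: "last-connection-closed" };
}

const relay = { id: "relay-1" };
const webrtc = { id: "webrtc-1" };

console.log(decidePeerRemoval([relay, webrtc], "relay-1").reason); // other-connections-remain
console.log(decidePeerRemoval([webrtc]).reason); // unmatched-disconnect
console.log(decidePeerRemoval([webrtc], "webrtc-1").reason); // last-connection-closed
```

Keeping the peer on an unmatched event errs on the side of a stale entry rather than a false removal, which is the trade-off the PR describes.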

Notes

  • This aligns with libp2p issue #2369 where disconnect events can be emitted without a connection id.
  • Verified downstream in a browser app where live updates were flaky; this change stabilizes replication.

Testing

  • Not run in this repo. (Validated in downstream app via E2E: library live-update spec and pubsub echo probe.)

Edit: I am happy to share the code I've been working on to demonstrate that it works. You probably know the codebase better when it comes to writing a test for this, so I will of course allow edits by maintainers on this PR :)

@Faolain
Contributor Author

Faolain commented Jan 16, 2026

cc: @marcus-pousette when you get a chance to review 🙏

@marcus-pousette
Member

marcus-pousette commented Jan 16, 2026

Hey! Sorry for the late reply; I have been quite busy over the past weeks. Super happy you found this. It seems to be a real bug. I have been struggling back and forth with libp2p and how it manages connections, and it has been quite a journey to get it working well. From what I can see, your change makes connection handling better, but let's also see whether it has side effects in the future that the test suite does not cover.

I will merge this and do a release!

Member

@marcus-pousette marcus-pousette left a comment

Thank you! I ran the tests and it worked.

@marcus-pousette marcus-pousette merged commit f0943dc into dao-xyz:master Jan 16, 2026
@marcus-pousette
Member

Bootstrap servers might still be affected by this bug; I will update them as soon as possible. If you are running your own relay server, you can try it out immediately.
