feat!: fanout tree protocol + large-network sims + interactive sandbox #582

Open: marcus-pousette wants to merge 85 commits into master from research/pubsub-large-network-testing
@marcus-pousette (Member) commented Feb 4, 2026

Why

Peerbit needs a pubsub mode that stays economical as the audience grows.

At large scale, subscription dissemination and redundant forwarding can consume more bandwidth than the payload itself. One common symptom we have seen is "subscribe gossip exploding".

Long-term goal: support workloads like 1 publisher -> 100k to 1M subscribers with:

  • bounded per-node upload
  • measurable latency and delivery
  • graceful behavior under churn and loss
  • no global membership lists in the steady-state data plane

What protocol are we building?

An experimental FanoutTree protocol on top of @peerbit/stream.

Today it targets a single-writer, sequence-numbered channel identified by (root, topic). The data plane aims for near-tree cost while reliability comes from bounded, local repair.

Control plane (join + discovery)

  • Joiners find candidate parents via a small set of bootstrap tracker nodes (rendezvous).
  • Relays announce capacity (TRACKER_ANNOUNCE) and joiners query candidates (TRACKER_QUERY/TRACKER_REPLY).
  • Join handshake is explicit (JOIN_REQ / JOIN_ACCEPT / JOIN_REJECT with optional redirects).
  • Admission is capacity-aware and can use an economic signal (bidPerByte) for selection and optional kicking when overloaded.
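To make the admission rule above concrete, here is a minimal sketch of capacity-aware admission with bid-based kicking. It is not the FanoutTree implementation: `RelayAdmission`, `JoinRequest`, `JoinDecision`, `maxChildren` and `allowKick` are hypothetical names chosen for illustration, and only `bidPerByte` and the accept/reject/redirect outcomes come from the description. The real message fields are specified in docs/fanout-tree-protocol.md.

```typescript
// Hypothetical shapes for illustration; not the real JOIN_REQ wire format.
interface JoinRequest {
  id: string;
  bidPerByte: number;
}

type JoinDecision =
  | { kind: "accept"; kicked?: string }
  | { kind: "reject"; redirects: string[] };

class RelayAdmission {
  private readonly children = new Map<string, JoinRequest>();
  private readonly maxChildren: number;
  private readonly allowKick: boolean;

  constructor(maxChildren: number, allowKick: boolean) {
    this.maxChildren = maxChildren;
    this.allowKick = allowKick;
  }

  handleJoin(req: JoinRequest): JoinDecision {
    // Free slot (or a re-join from an existing child): accept directly.
    if (this.children.has(req.id) || this.children.size < this.maxChildren) {
      this.children.set(req.id, req);
      return { kind: "accept" };
    }
    // Full: find the current child with the lowest bid.
    let lowest: JoinRequest | undefined;
    for (const child of this.children.values()) {
      if (lowest === undefined || child.bidPerByte < lowest.bidPerByte) {
        lowest = child;
      }
    }
    // Optional kicking: replace the lowest bidder only if the joiner bids strictly more.
    if (this.allowKick && lowest !== undefined && req.bidPerByte > lowest.bidPerByte) {
      this.children.delete(lowest.id);
      this.children.set(req.id, req);
      return { kind: "accept", kicked: lowest.id };
    }
    // Otherwise reject and redirect the joiner to our children as candidate parents.
    return { kind: "reject", redirects: [...this.children.keys()] };
  }
}
```

Redirecting to the relay's own children is one plausible choice, since it sends the joiner one level deeper into the tree; the actual redirect policy may differ.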

Data plane (bounded forwarding + bounded repair)

  • Delivery is tree push with bounded fanout.
  • Reliability is regained with local pull repair (parent cache window) and an optional neighbor-assisted repair mesh (IHAVE + FETCH_REQ).
  • Payload forwarding preserves the original signed bytes (no per-hop re-sign) and uses stable message ids for stream-level dedup.
  • Upload caps are enforced best-effort via a token bucket at message boundaries. Under overload we prefer controlled dropping and re-parenting over unbounded queues.
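The upload cap described above can be sketched as a token bucket that is checked once per message and never splits a message. This is an illustrative sketch under assumptions, not the code in this PR: the `TokenBucket` class, its constructor parameters and the explicit `nowMs` clock argument are all hypothetical.

```typescript
// Hypothetical token bucket; tokens are bytes, refilled continuously over time.
class TokenBucket {
  private tokens: number;
  private lastMs: number;
  private readonly ratePerMs: number;
  private readonly capacity: number;

  constructor(bytesPerSecond: number, burstBytes: number, nowMs: number) {
    this.ratePerMs = bytesPerSecond / 1000;
    this.capacity = burstBytes;
    this.tokens = burstBytes; // start with a full bucket
    this.lastMs = nowMs;
  }

  /** Returns true if the whole message may be sent now; the caller drops or defers it otherwise. */
  tryConsume(messageBytes: number, nowMs: number): boolean {
    const elapsed = Math.max(0, nowMs - this.lastMs);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.ratePerMs);
    this.lastMs = nowMs;
    if (messageBytes > this.tokens) return false;
    this.tokens -= messageBytes;
    return true;
  }
}

// Usage: 1000 B/s sustained, 1500 B burst.
const bucket = new TokenBucket(1000, 1500, 0);
const sentFirst = bucket.tryConsume(1000, 0); // fits in the initial burst
const sentSecond = bucket.tryConsume(1000, 0); // only 500 tokens left, so refused
```

In this sketch a message larger than `burstBytes` is always refused. A real relay would need a policy for that case, and on refusal it would drop or re-parent rather than queue without bound, as the bullet above states.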

Wire format notes live in docs/fanout-tree-protocol.md and the broader engineering spec lives in docs/scalable-fanout.md.

Design inspiration (references)

What this PR adds

Large-network testing and sims (local + deterministic)

  • In-memory libp2p transport shim to run 1000s of peers without sockets, and without crypto verification dominating runtime.
  • Sims for:
    • topic-style pubsub stress (pubsub-topic-sim)
    • tree/fanout delivery (fanout-tree-sim)
    • churn, loss, timeouts, and CI-friendly assertions
  • Nightly workflow to run heavier sims and upload artifacts.

Browser sandbox

  • A docs/blog interactive sandbox that runs the real TopicControlPlane (/lazysub/0.0.1) and @peerbit/stream logic over the in-memory shim.
  • This is primarily for visualizing control-plane behavior and transport overhead at small N.

Peerbit integration

  • Exposes peer.services.fanout (FanoutTree) alongside peer.services.pubsub (TopicControlPlane), sharing the same TopicRootControlPlane.
  • SharedLog target: "all" uses the fanout data plane when configured; it intentionally does not fall back to legacy RPC send.

Transport scalability hardening

  • @peerbit/stream route caching is now explicitly bounded (LRU + truncation) to avoid OOM in large local sims; bounds are configurable via DirectStreamOptions (routeCacheMaxFromEntries, routeCacheMaxTargetsPerFrom, routeCacheMaxRelaysPerTarget).
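For readers unfamiliar with the "LRU + truncation" pattern, the following sketch shows the general idea using a `Map`, which iterates in insertion order. It is not the @peerbit/stream implementation: `LruMap` and `truncate` are hypothetical helpers, and the mapping of the limits onto the real `DirectStreamOptions` fields is an assumption.

```typescript
// Hypothetical LRU map: the least recently used key is always first in iteration order.
class LruMap<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get size(): number {
    return this.entries.size;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    // Evict from the front (least recently used) until within bounds.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }
}

// Truncation: keep at most `max` items of a per-key list.
function truncate<T>(items: T[], max: number): T[] {
  return items.length > max ? items.slice(0, max) : items;
}

// Usage: at most 2 "from" entries, at most 3 targets per entry (limits are illustrative).
const routes = new LruMap<string, string[]>(2);
routes.set("peerA", truncate(["t1", "t2", "t3", "t4"], 3));
routes.set("peerB", ["t5"]);
routes.get("peerA"); // touch peerA so peerB becomes the eviction candidate
routes.set("peerC", ["t6"]); // evicts peerB
```

Both bounds together cap memory at roughly `maxEntries × max` items, which is what keeps large local sims from growing without limit.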

Tracking issues

How to review

Suggested order:

  1. docs/scalable-fanout.md
  2. docs/fanout-tree-protocol.md
  3. packages/transport/pubsub/src/fanout-tree.ts (+ fanout-channel.ts)
  4. Sims: packages/transport/pubsub/benchmark/*
  5. Demo UI: apps/peerbit-org/src/ui/FanoutProtocolSandbox.tsx

How to verify locally

  • pnpm run build
  • pnpm run test:ci:part-1
  • pnpm run test:ci:part-2
  • pnpm run test:ci:part-3
  • pnpm run test:ci:part-4
  • pnpm run test:ci:part-5

Notes / non-goals for now

  • Protocol multicodec is /peerbit/fanout-tree/0.5.0 and assumes coordinated upgrades.
  • Settlement / on-chain economics are intentionally out of scope. This PR only adds hooks (bids + accounting signals) and a simulation harness.

@marcus-pousette changed the title from "pubsub: large-network sims + interactive protocol sandbox" to "pubsub: FanoutTree protocol + large-network sims + interactive sandbox" on Feb 4, 2026

@marcus-pousette (Member, Author) commented

Added overlay-formation objective metrics to fanout-tree-sim (issue #585):

  • attachMs distribution (time-to-attach since join start)
  • formationTree* (depth/children/orphans snapshot right after join)
  • formationPaths* (underlay shortest-path distances + overlay/underlay stretch)
  • formationScore (+ optional CLI assert --assertMaxFormationScore)

These are now asserted in CI via fanout-tree-sim.spec.ts to catch formation regressions early.
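As a rough illustration of the overlay/underlay stretch idea behind `formationPaths*`, the sketch below computes one plausible definition: the underlay hop cost of following tree edges from a node up to the root, divided by the direct underlay shortest-path distance between them. The exact definition used by fanout-tree-sim may differ, and `bfsDistances`, `overlayStretch`, `Graph` and the example topology are hypothetical.

```typescript
// Underlay adjacency list (assumed undirected: each edge is listed in both directions).
type Graph = Map<string, string[]>;

// Hop distances from `source` to every reachable node, via breadth-first search.
function bfsDistances(graph: Graph, source: string): Map<string, number> {
  const dist = new Map<string, number>([[source, 0]]);
  const queue: string[] = [source];
  for (let i = 0; i < queue.length; i++) {
    const node = queue[i] as string;
    const d = dist.get(node) as number;
    for (const next of graph.get(node) ?? []) {
      if (!dist.has(next)) {
        dist.set(next, d + 1);
        queue.push(next);
      }
    }
  }
  return dist;
}

// Stretch of `node`: overlay path cost to root divided by direct underlay distance.
// Returns Infinity for orphans, cycles, or nodes unreachable in the underlay.
function overlayStretch(
  underlay: Graph,
  parentOf: Map<string, string>,
  root: string,
  node: string,
): number {
  if (node === root) return 1;
  let cost = 0;
  let current = node;
  for (let hops = 0; current !== root; hops++) {
    const parent = parentOf.get(current);
    if (parent === undefined || hops > parentOf.size) return Infinity;
    const hop = bfsDistances(underlay, current).get(parent);
    if (hop === undefined) return Infinity;
    cost += hop;
    current = parent;
  }
  const direct = bfsDistances(underlay, root).get(node);
  return direct === undefined ? Infinity : cost / direct;
}

// Example: a triangle underlay r-a, r-b, a-b with the tree r -> a -> b.
const underlay: Graph = new Map([
  ["r", ["a", "b"]],
  ["a", ["r", "b"]],
  ["b", ["r", "a"]],
]);
const parentOf = new Map([
  ["a", "r"],
  ["b", "a"],
]);
const stretchA = overlayStretch(underlay, parentOf, "r", "a"); // 1 hop / 1 hop
const stretchB = overlayStretch(underlay, parentOf, "r", "b"); // 2 hops / 1 hop
```

A stretch of 1 means the tree path is as short as the underlay allows; larger values indicate that the overlay takes a detour relative to the underlying network.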

@marcus-pousette (Member, Author) commented

Pushed a few follow-up commits to tighten the PR before merge:

  • site/docs: clarify ‘Real subscribe’ as control-plane gossip + explain pre-publish traffic
  • peerbit-server: allow API bootstrap to accept an explicit address list (avoids port conflicts; improves tests)
  • test: reduce flakiness (stream dial timeout, e2e lifecycle timeout, bootstrap test skip outside node)

@marcus-pousette (Member, Author) commented

Follow-ups:

  • Fix CI lint failure (wrap BOOTSTRAP_PATH POST case block to satisfy no-case-declarations).
  • Docs: align the blog’s embedded sandbox defaults + clarify that subscribe gossip is visible when Flow capture includes setup.

@marcus-pousette changed the title from "pubsub: FanoutTree protocol + large-network sims + interactive sandbox" to "feat!: fanout tree protocol + large-network sims + interactive sandbox" on Feb 5, 2026
Remove legacy rpc fallback for target:"all" append path and make fanout failures explicit. Forward fanout options through Documents->SharedLog and update tests/spec notes for root-only fanout publish/control-plane behavior.
Bound route-token cache with TTL+eviction, add route announce/query/reply control messages, expose fanout helpers on Peerbit instead of DirectSub wiring, and add route/cache metrics in sims + protocol docs.
Increase timing tolerance for known flaky scenarios and bound late-drop wait to avoid callback timing hangs in CI.
Scale scenario size and widen convergence window for min-replica search test to avoid CI variance while preserving intent.
@marcus-pousette (Member, Author) commented

CI is green on the latest head (44dfae5ce) and this PR now lands the routed FanoutTree data path + cache hardening.

What is now covered in this PR:

  • routed unicast control-plane in FanoutTree (ROUTE_ANNOUNCE, ROUTE_QUERY, ROUTE_REPLY, proxying)
  • bounded route cache policy (max entries + TTL + eviction/expiry metrics)
  • shared-log target-all path through fanout channel
  • fanout channel helpers on Peerbit and fanout bootstrap alignment
  • de-flake fixes for document tests that were failing in CI

Follow-up issues for remaining long-term work:

BREAKING CHANGE: FanoutTree no longer uses periodic route announcements; root route resolution is now demand-driven via subtree route queries.
Deflake integration tests and refresh docs/examples