Labels: enhancement (New feature or request)
## Goal
Ship a pubsub fanout solution that can support very large audiences (target: 1 publisher → 1,000,000 subscribers) with bounded per-node upload, low latency, and measurable reliability — without requiring global membership knowledge.
Tracking PR: #582
## Motivation
Current pubsub subscriber discovery can “explode” because the control plane scales poorly (subscription gossip amplification). At large scale, the system must:
- avoid `to=[all subscribers]`
- avoid global ACKs
- avoid any per-message work that grows with total subscribers
## Status (living tracker)

### Implemented (WIP, not merged)
- Local sims to stress the real `@peerbit/stream` data plane:
  - `packages/transport/pubsub/benchmark/pubsub-topic-sim.ts`
  - `packages/transport/pubsub/benchmark/pubsub-tree-sim.ts`
- Experimental production building block: `FanoutTree` + `FanoutChannel`
  - Tree push + pull repair window
  - Stable message ids for stream-level dedup (`FOUT` + `seq` + `channelKeyPrefix`)
- Join/bootstrapping via bootstrap servers (tracker)
  - Relays announce capacity to bootstrap nodes (`TRACKER_ANNOUNCE`)
  - Joiners query bootstrap nodes for candidate parents (`TRACKER_QUERY`/`TRACKER_REPLY`), then dial + normal `JOIN_REQ`
  - `Peerbit.bootstrap()` also configures `peer.services.fanout.setBootstraps(...)`
  - Fix: avoid advertising invalid `/p2p/<peerId>` multiaddrs for simulated peers
- End-to-end `FanoutTree` sim/bench (`fanout-tree-sim`) with `--timeoutMs` + `--assert*` gating (#578)
- CI-friendly sims + nightly scale runs:
  - `packages/transport/pubsub/test/fanout-tree-sim.spec.ts`
  - `.github/workflows/nightly-sims.yml`
- Real-protocol interactive sandbox + article (browser):
  - `apps/peerbit-org/src/ui/FanoutProtocolSandbox.tsx`
  - `docs/blog/2026-02-02-interactive-fanout-visualizer.md`
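To make the stream-level dedup concrete, here is a minimal sketch of stable ids plus a bounded seen-window. Note the names (`makeMessageId`, `DedupWindow`) are illustrative, not the shipped `FanoutChannel` API:

```typescript
// Sketch of stable message ids plus a bounded dedup window.
// ASSUMPTION: makeMessageId and DedupWindow are illustrative names,
// not the actual FanoutChannel implementation.

function makeMessageId(channelKeyPrefix: string, seq: number): string {
  // Fixed-width hex seq keeps ids within a channel lexicographically sortable.
  return `FOUT/${channelKeyPrefix}/${seq.toString(16).padStart(12, "0")}`;
}

// Remembers the last `capacity` ids so a relay can drop duplicates
// with bounded state per stream, independent of total subscriber count.
class DedupWindow {
  private seen = new Set<string>();
  private order: string[] = [];
  constructor(private capacity = 1024) {}

  /** Returns true if the id is new (message should be forwarded). */
  admit(id: string): boolean {
    if (this.seen.has(id)) return false;
    this.seen.add(id);
    this.order.push(id);
    if (this.order.length > this.capacity) {
      const evicted = this.order.shift()!;
      this.seen.delete(evicted);
    }
    return true;
  }
}

// Demo
const win = new DedupWindow(4);
const id = makeMessageId("chan-a", 7);
console.log(win.admit(id)); // true: first sight, forward
console.log(win.admit(id)); // false: duplicate, drop
```

The bounded window is what keeps dedup state O(capacity) per stream rather than O(total messages).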
### Next (proposed order)

- Overlay formation + maintenance as a first-class problem (metrics + scoring + re-parenting): #585
- Enforce upload caps (token bucket) + overload policy + re-parent triggers: #579
- Bounded repair improvements (neighbor-assisted / lazy summaries): #580
- Tracker hardening (limits, scoring, rate limiting), and decide whether the tracker should be a dedicated tiny service: #581
- Relay monetization groundwork (policy semantics + metering hooks for `FanoutTree`/`DirectStream`): #583
## Requirements

### Functional
- 1 publisher can publish at ~30 msg/s (configurable), and subscribers receive messages with bounded latency.
- Delivery works without global knowledge of subscribers.
- Nodes can configure upload limits and will not exceed them (best-effort within simulation and later production).
- Nodes can express relay preferences (e.g. “only relay if compensated / bid-based selection”).
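The upload-limit requirement could be enforced with the token bucket proposed in #579. A hedged sketch (the `UploadBucket` name and parameters are illustrative, not a decided API):

```typescript
// Sketch of an upload cap as a token bucket.
// ASSUMPTION: illustrative of the policy proposed in #579, not its implementation.

class UploadBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private ratePerSec: number, // sustained budget, bytes/sec
    private burst: number,      // max burst, bytes
    now = 0                     // injected clock (ms) for determinism in sims
  ) {
    this.tokens = burst;
    this.lastRefill = now;
  }

  /** Refill from elapsed time, then try to spend `bytes`. */
  trySend(bytes: number, now: number): boolean {
    const elapsedSec = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.burst, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefill = now;
    if (bytes > this.tokens) return false; // caller backpressures, drops, or re-parents
    this.tokens -= bytes;
    return true;
  }
}

// Demo: 10 KB/s cap with a 5 KB burst.
const bucket = new UploadBucket(10_000, 5_000, 0);
console.log(bucket.trySend(4_000, 0));   // true: within burst
console.log(bucket.trySend(4_000, 0));   // false: exceeds remaining tokens
console.log(bucket.trySend(4_000, 500)); // true: refilled after 500 ms
```

Injecting the clock (rather than calling `Date.now()` internally) keeps the cap testable in the deterministic sim harness.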
### Reliability
- Define explicit delivery goals per workload, e.g.:
- “live”: > 99% delivered within deadline under mild churn/loss
- “reliable”: > 99.9% delivered eventually with bounded overhead
- Repair must be bounded and local (neighbors/parent only).
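A sketch of what bounded, parent-local repair could look like: a child tracks received sequence numbers and requests gaps only inside a fixed window, abandoning older losses. The `RepairTracker` name and window policy are illustrative, not the current `FanoutTree` behavior:

```typescript
// Sketch of bounded pull repair: gaps are requested only from the parent
// and only inside a fixed window, so repair cost stays local and bounded.
// ASSUMPTION: RepairTracker is an illustrative name, not the production API.

class RepairTracker {
  private received = new Set<number>();
  constructor(private windowSize = 64) {}

  markReceived(seq: number): void {
    this.received.add(seq);
  }

  /**
   * Sequence numbers to request from the parent, limited to the last
   * `windowSize` seqs before `latestSeq`. Older gaps are abandoned, which
   * bounds repair work regardless of total loss (matching "live" semantics;
   * a "reliable" mode would need a larger or persistent window).
   */
  missing(latestSeq: number): number[] {
    const from = Math.max(0, latestSeq - this.windowSize + 1);
    const gaps: number[] = [];
    for (let s = from; s <= latestSeq; s++) {
      if (!this.received.has(s)) gaps.push(s);
    }
    return gaps;
  }
}

// Demo: received 0..9 except 3 and 7, with a window of 8.
const tracker = new RepairTracker(8);
for (const s of [0, 1, 2, 4, 5, 6, 8, 9]) tracker.markReceived(s);
console.log(tracker.missing(9)); // [3, 7]
```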
### Economics / incentives
- Simulate “relay earnings” based on forwarded bytes.
- Define a future-proof interface for bids/quotes (even if settlement is out-of-scope initially).
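One possible shape for the bids/quotes interface plus byte metering, with settlement deliberately out of scope. All names here (`RelayQuote`, `EarningsMeter`) are hypothetical, to be replaced by whatever #583 lands on:

```typescript
// Sketch of a future-proof quote interface + earnings metering.
// ASSUMPTION: all names are illustrative; settlement is out of scope,
// so quotes are plain data and "earnings" are simulated units.

interface RelayQuote {
  relayId: string;
  pricePerMB: number;      // abstract unit; settlement TBD
  uploadBudgetBps: number; // advertised capacity, bytes/sec
}

interface EarningsMeter {
  record(relayId: string, forwardedBytes: number): void;
  earnings(relayId: string, quote: RelayQuote): number;
}

class SimpleMeter implements EarningsMeter {
  private bytes = new Map<string, number>();

  record(relayId: string, forwardedBytes: number): void {
    this.bytes.set(relayId, (this.bytes.get(relayId) ?? 0) + forwardedBytes);
  }

  earnings(relayId: string, quote: RelayQuote): number {
    return ((this.bytes.get(relayId) ?? 0) / 1_000_000) * quote.pricePerMB;
  }
}

// Demo: 3 MB forwarded at 0.5 units/MB.
const meter = new SimpleMeter();
meter.record("relay-1", 2_000_000);
meter.record("relay-1", 1_000_000);
const quote: RelayQuote = { relayId: "relay-1", pricePerMB: 0.5, uploadBudgetBps: 1_000_000 };
console.log(meter.earnings("relay-1", quote)); // 1.5
```

Keeping the meter behind an interface means the sim's "earnings distribution" metric and any future real settlement can share the same hook.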
### Engineering
- Provide a deterministic, local simulation harness to test 1k–10k nodes on one machine:
- measure delivery ratio, p50/p95/p99 latency, bandwidth overhead, queue/backpressure, and “earnings” distribution.
- Add CI-friendly “small sim” tests that assert invariant thresholds.
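For the latency metrics, nearest-rank percentiles are one simple, deterministic choice; a sketch (function names are illustrative, not tied to the harness):

```typescript
// Sketch of deterministic p50/p95/p99 reporting for the sim harness,
// using nearest-rank percentiles. Names are illustrative.

function percentile(sortedMs: number[], p: number): number {
  // Nearest-rank method: p in (0, 100]; input must be sorted ascending.
  const rank = Math.ceil((p / 100) * sortedMs.length);
  return sortedMs[Math.max(0, rank - 1)];
}

function latencyReport(latenciesMs: number[]) {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}

// Demo: 100 samples, 1..100 ms.
const samples = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(latencyReport(samples)); // { p50: 50, p95: 95, p99: 99 }
```

Nearest-rank (as opposed to interpolated percentiles) always returns an observed sample, which makes CI threshold assertions reproducible across runs and runtimes.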
## Proposed approach (high-level)
Adopt a Plumtree-inspired architecture:
- Tree push as the steady-state data plane (economical bandwidth).
- Local pull repair (and/or gossip summaries) as the reliability layer.
- Capacity-aware admission control: each relay accepts children within an upload budget and may prefer higher bids.
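Capacity-aware admission could look roughly like this: accept children while the per-child upload cost fits the budget, and evict the lowest bidder when a higher bid arrives. An illustrative sketch (not the `FanoutTree` implementation; `AdmissionControl` and its fields are hypothetical):

```typescript
// Sketch of capacity-aware admission with bid-based preemption.
// ASSUMPTION: names and the flat per-child cost model are illustrative.

interface JoinRequest {
  peerId: string;
  bid: number; // 0 for free-tier joiners
}

class AdmissionControl {
  private children: JoinRequest[] = [];

  constructor(
    private uploadBudgetBps: number,
    private perChildCostBps: number // e.g. msgRate * msgSize per child
  ) {}

  private get capacity(): number {
    return Math.floor(this.uploadBudgetBps / this.perChildCostBps);
  }

  /** Accept if there is room; otherwise evict the lowest bidder if outbid. */
  admit(req: JoinRequest): boolean {
    if (this.children.length < this.capacity) {
      this.children.push(req);
      return true;
    }
    if (this.children.length === 0) return false; // zero-capacity relay
    // Oversubscribed: locate the lowest current bid.
    let lowestIdx = 0;
    for (let i = 1; i < this.children.length; i++) {
      if (this.children[i].bid < this.children[lowestIdx].bid) lowestIdx = i;
    }
    if (req.bid > this.children[lowestIdx].bid) {
      this.children.splice(lowestIdx, 1); // evicted child must re-parent
      this.children.push(req);
      return true;
    }
    return false;
  }
}

// Demo: budget fits 2 children at 30 msg/s * 1 KB = 30 KB/s each.
const ac = new AdmissionControl(60_000, 30_000);
console.log(ac.admit({ peerId: "a", bid: 0 })); // true
console.log(ac.admit({ peerId: "b", bid: 1 })); // true
console.log(ac.admit({ peerId: "c", bid: 0 })); // false: full, not outbidding
console.log(ac.admit({ peerId: "d", bid: 2 })); // true: evicts lowest bidder "a"
```

Strict `>` on the bid comparison avoids churn from equal-bid joiners repeatedly displacing each other.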
## Acceptance criteria (simulation)
For a configurable workload (e.g. 2k nodes, 30 msg/s, 10s, 1KB):
- Connected subscribers ≥ 99% (or defined threshold).
- Delivered ≥ 99% (or defined threshold).
- Overhead factor ≤ X (defined per reliability mode).
- No node exceeds upload cap by more than Y% (or best-effort with explicit backpressure/drop policy).
## Repo notes

A living spec is tracked in:
- `docs/scalable-fanout.md`
- `docs/fanout-tree-protocol.md`