Skip to content

fix(metrics): avoid consuming protocol frames before app readers#3394

Merged
dozyio merged 12 commits intolibp2p:mainfrom
lodekeeper:fix/prometheus-metrics-identify-race
Mar 4, 2026
Merged

fix(metrics): avoid consuming protocol frames before app readers#3394
dozyio merged 12 commits intolibp2p:mainfrom
lodekeeper:fix/prometheus-metrics-identify-race

Conversation

@lodekeeper
Copy link
Contributor

Summary

Fixes a protocol-stream observability race in @libp2p/prometheus-metrics where tracking could consume inbound frames before app/protocol readers attach.

Fixes #3393

Root cause

_track() currently calls stream.addEventListener('message', ...).

For AbstractMessageStream, any message listener can trigger dispatch of buffered inbound bytes. If metrics attaches first, protocol handlers that attach later may miss the first frame.

Changes

1) Avoid adding message listener in _track

Instead of adding an event listener, wrap stream.dispatchEvent:

  • when evt.type === 'message', increment received bytes metric
  • then forward to original dispatchEvent

This preserves received-byte accounting but does not increase listenerCount('message'), so metrics no longer affects stream buffering/dispatch semantics.

2) Regression test

Added test in packages/metrics-prometheus/test/streams.spec.ts:

  • tracks a stream before app listener is attached
  • sends data first
  • attaches app iterator later
  • asserts first frame is still readable
  • asserts metrics still count received bytes

Validation done downstream

In Lodestar local beacon runs, this change eliminated observed short-window Unknown peer spikes linked to identify misses (while keeping metrics enabled).

@nflaig
Copy link

nflaig commented Mar 3, 2026

I've deployed this branch to one of our node groups and it does resolve our issue and we no longer are seeing that many unknown peers

image

@lodekeeper
Copy link
Contributor Author

Thanks for validating this in production 🙌 Great signal that the fix addresses the unknown-peer spike in real deployments. I’ll keep this branch rebased and ready while we wait for maintainer review.

@wemeetagain
Copy link
Member

@lodekeeper please fix up the branch so CI can pass

@lodekeeper
Copy link
Contributor Author

lodekeeper commented Mar 3, 2026

Thanks for the ping — fixed and pushed now: bfb080b (lint fixes in streams.spec.ts for the new early-data test). CI has re-triggered on this commit; I’ll keep an eye on it and follow up if anything else is needed.

Copy link
Collaborator

@tabcat tabcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lodekeeper hi thanks for opening this. would be great if you can make similar changes to packages/metrics-opentelemetry and packages/metrics-simple. but only if you have the time!

@tabcat tabcat requested a review from dozyio March 3, 2026 17:03
@lodekeeper
Copy link
Contributor Author

lodekeeper commented Mar 3, 2026

@tabcat thanks — addressed your review note. I applied the same dispatchEvent-based receive tracking change to:

  • packages/metrics-opentelemetry/src/index.ts
  • packages/metrics-simple/src/index.ts

Pushed in 4dbf55d (branch head now includes this + the earlier type-assertion cleanup).

@dozyio
Copy link
Collaborator

dozyio commented Mar 4, 2026

@lodekeeper Could you update the tests for metrics-simple and metrics-opentelemetry so that they keep parity with prometheus please?

@tabcat
Copy link
Collaborator

tabcat commented Mar 4, 2026

@dozyio there arent similar test files for those modules so fine merging without those.

@lodekeeper
Copy link
Contributor Author

lodekeeper commented Mar 4, 2026

@dozyio done 👍 I added parity tests for both backends so they mirror the prometheus behavior updates:

  • packages/metrics-simple/test/index.spec.ts
  • packages/metrics-opentelemetry/test/index.spec.ts

Pushed in commit 7c6a553.

@lodekeeper
Copy link
Contributor Author

Thanks @tabcat — acknowledged. I already added parity tests for both backends in 7c6a553, so we can merge with those in place (or I can drop them if you prefer minimal scope).

@dozyio
Copy link
Collaborator

dozyio commented Mar 4, 2026

@dozyio done 👍 I added parity tests for both backends so they mirror the prometheus behavior updates:

  • packages/metrics-simple/test/index.spec.ts
  • packages/metrics-opentelemetry/test/index.spec.ts

Pushed in commit 7c6a553.

You will need to add @libp2p/utils to the devDependencies for metrics-simple and metrics-opentelemetry

@lodekeeper
Copy link
Contributor Author

lodekeeper commented Mar 4, 2026

Good catch — done. I added @libp2p/utils to devDependencies for both packages:

  • packages/metrics-simple/package.json
  • packages/metrics-opentelemetry/package.json

Pushed in commit cbaadfe.

@tabcat
Copy link
Collaborator

tabcat commented Mar 4, 2026

@lodekeeper just remove that last commit cbaadfe. no need to add tests for the other metrics packages, just keep the fixes there and looks good.

@lodekeeper
Copy link
Contributor Author

Done — I dropped the parity-test additions and corresponding devDependency change.\n\nI reverted commits 7c6a553 and cbaadfe in bcb1710, keeping only the core fixes on the branch.

@dozyio dozyio merged commit 4db427d into libp2p:main Mar 4, 2026
34 checks passed
@tabcat tabcat mentioned this pull request Mar 4, 2026
@dozyio
Copy link
Collaborator

dozyio commented Mar 4, 2026

Thanks @lodekeeper for the PR - merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

metrics-prometheus trackProtocolStream can consume protocol frames before app readers attach

5 participants