
Fix/prometheus metrics registration #186

Merged
tmgrask merged 9 commits into Psiphon-Inc:main from amirhnajafiz:fix/prometheus-metrics-registration
Feb 11, 2026

Conversation

@amirhnajafiz
Contributor

Hi everyone,

I identified several issues related to the Prometheus server configuration and metric registration logic. The following changes have been implemented to improve stability, correctness, and operational safety.

ChangeLog

  • Replaced Prometheus MustRegister with Register, which returns an error instead of panicking, so that an already-registered metric can be detected and reused without crashing the service.
  • Configured explicit read, write, and idle timeouts for the Prometheus HTTP server to improve robustness under load and prevent resource exhaustion.
  • Corrected error handling in the Prometheus-related code paths to ensure failures are properly detected and surfaced.

Rationale

  • Prevent potential runtime panics and unexpected service failures caused by unsafe metric registration.
  • Improve the reliability and resilience of the Prometheus server under real-world conditions.
  • Ensure the monitoring stack remains efficient and does not negatively impact user systems or overall application performance.

Collaborator

@tmgrask tmgrask left a comment


Thanks for preparing this — I think this is a good thing to address, but it needs a couple of changes before merge.

  1. Registry mismatch (breaks /metrics)
  • New() creates a custom registry (m.registry) and the endpoint serves that registry via promhttp.HandlerFor(m.registry, ...).
  • The new helpers in registration.go call prometheus.Register(...), which registers on the global default registry, not m.registry.
  • Result: the served endpoint is empty (I reproduced this locally).
    Please keep the existing custom-registry design and register all metrics/collectors on m.registry (not the global default registry).
  2. Inverted server error condition
  • In StartServer(), the goroutine currently logs only when errors.Is(err, http.ErrServerClosed) is true.
  • That is the expected shutdown path; real errors are currently suppressed.
    Please change to:
    if err := m.server.Serve(listener); err != nil && !errors.Is(err, http.ErrServerClosed) { ... }
  3. Please add a regression test for registry wiring
  • Add a test that verifies metrics created in New() are present in the same registry being served (m.registry), so /metrics is not empty.
  • This will prevent future regressions between custom-registry serving and registration path.

@amirhnajafiz
Contributor Author

@tmgrask Got them all. Sorry about that: I had seen that all metrics (except for the standard ones) were being registered in the Prometheus global registry, which is what led me to use it.

Anyway, I could also refactor the New function in metrics.go to avoid passing the registry as a parameter to each of the helper functions, but I noticed you have a strict policy on refactors :D

So let me know if anything else needs to be changed or improved.

Cheers.

Collaborator

@tmgrask tmgrask left a comment


Thanks for the quick turnaround and for addressing the feedback.

The latest changes look good to me. Passing the registry around explicitly is a bit more verbose, but it’s clear and low-risk.

> I noticed you have a strict policy on refactors :D

Thanks for reading the guidelines. I mostly wrote about that to try to minimize the number of too-big-to-review PRs; I wouldn't have been upset if you had refactored the way we're accessing the registry :)

This is good as is IMO!

@tmgrask tmgrask merged commit 2cb2057 into Psiphon-Inc:main Feb 11, 2026
4 checks passed
@amirhnajafiz amirhnajafiz deleted the fix/prometheus-metrics-registration branch February 11, 2026 21:17