
[Telemetry] Standardize and add metrics #606

@okdas

Description


Objective

It appears that most of the metrics we've worked on in the past are not being populated by the Prometheus exporter. That means either we no longer need to expose that information for monitoring and troubleshooting, or there is a mistake that prevents that data from being collected.

Either way, now that we have working DevNets it is a good time to revisit metrics to see what we can monitor and improve.

Goals

  • Provide meaningful metrics so node runners, including in-house operations, can gain insight into how the software is operating
  • Standardize how we add metrics for telemetry
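One way to make "standardize" concrete is to check proposed metric names against an agreed convention before they are registered. The sketch below is hypothetical: `ValidateMetricName`, `MetricKind`, and the `node` namespace are illustrative names, not existing code. The rules follow the general Prometheus naming guidelines (snake_case, an application prefix, counters ending in `_total`).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// MetricKind distinguishes the metric types we expect to use (hypothetical).
type MetricKind int

const (
	Counter MetricKind = iota
	Gauge
	Histogram
)

// namePattern matches lowercase snake_case with at least two segments.
var namePattern = regexp.MustCompile(`^[a-z][a-z0-9]*(_[a-z0-9]+)+$`)

// ValidateMetricName checks a proposed name against an assumed team convention:
//   - lowercase snake_case
//   - prefixed with the application namespace (e.g. "node_")
//   - counters end in "_total"; other kinds do not
func ValidateMetricName(namespace, name string, kind MetricKind) error {
	if !namePattern.MatchString(name) {
		return fmt.Errorf("%q is not lowercase snake_case", name)
	}
	if !strings.HasPrefix(name, namespace+"_") {
		return fmt.Errorf("%q is missing the %q namespace prefix", name, namespace)
	}
	hasTotal := strings.HasSuffix(name, "_total")
	if kind == Counter && !hasTotal {
		return fmt.Errorf("counter %q should end in _total", name)
	}
	if kind != Counter && hasTotal {
		return fmt.Errorf("non-counter %q should not end in _total", name)
	}
	return nil
}

func main() {
	fmt.Println(ValidateMetricName("node", "node_p2p_messages_received_total", Counter)) // <nil>
	fmt.Println(ValidateMetricName("node", "node_p2p_peers", Gauge))                     // <nil>
	fmt.Println(ValidateMetricName("node", "PeerCount", Gauge))                          // "PeerCount" is not lowercase snake_case
}
```

A check like this could run in a unit test over every registered metric, so naming drift is caught in review rather than on a dashboard.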

Deliverable

  • Observe DevNet and identify 5-10 metrics that need to be tracked via telemetry
  • Create one or more downstream tickets to document and implement the metrics above

TBD, but we will likely need metrics covering P2P usage (number of peers, messages, errors), persistence, and RPC (number of requests, typical HTTP server metrics, etc.).

Non-goals / Non-deliverables

  • This issue does not cover analytics or traces

General issue deliverables

  • Update the appropriate CHANGELOG(s)
  • Update any relevant local/global README(s)
  • Update relevant source code tree explanations
  • Add or update any relevant or supporting mermaid diagrams

Testing Methodology

  • Task specific tests or benchmarks: make ...
  • New tests or benchmarks: make ...
  • All tests: make test_all
  • LocalNet: verify a LocalNet is still functioning correctly by following the instructions at docs/development/README.md

Creator: @okdas

Metadata


Assignees

Labels

  • infra: Core infrastructure - not protocol related
  • telemetry: everything related to collection telemetry
  • triage: It requires some decision-making at team level (it can't be worked on as it stands)

Type

No type

Projects

Status

In Progress

Relationships

None yet

Development

No branches or pull requests
