Trident Metrics enhancement #1088

@summertony15

Describe the solution you'd like
I would like to include the backend_state as a label in the trident_backend_info metric.
Currently, the only metric that exposes backend_state is trident_backend_count, and it is aggregated by backend_type only.
As a result, no label lets me correlate a backend's state with a specific backend_name.
If trident_backend_info could expose a backend_state label (for example: online, failed), it would be possible to identify the exact backend that is in a non-healthy state directly from Prometheus and alerting rules.
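
To make the request concrete, here is a minimal sketch of what a scrape could look like if trident_backend_info carried a backend_state label, and how the affected backend_name could be picked out of it. The sample lines, backend names, and UUIDs are made up for illustration, and the backend_state label on this metric is the proposal itself, not something Trident emits today.

```python
import re

# Hypothetical scrape output: the backend names, UUIDs, and the backend_state
# label on trident_backend_info are illustrative, not current Trident output.
SAMPLE = '''\
trident_backend_info{backend_name="nas-a",backend_type="ontap-nas",backend_uuid="uuid-1",backend_state="online"} 1
trident_backend_info{backend_name="nas-b",backend_type="ontap-nas",backend_uuid="uuid-2",backend_state="failed"} 1
trident_backend_info{backend_name="san-a",backend_type="ontap-san",backend_uuid="uuid-3",backend_state="online"} 1
'''

# Matches label="value" pairs inside the braces of an exposition line.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def unhealthy_backends(exposition, metric="trident_backend_info"):
    """Return the backend_name of every series whose backend_state is not 'online'."""
    names = []
    for line in exposition.splitlines():
        if not line.startswith(metric + "{"):
            continue  # skip comments and other metrics
        labels = dict(LABEL_RE.findall(line))
        # A series without the label is treated as online, so the metric as it
        # exists today would simply report nothing.
        if labels.get("backend_state", "online") != "online":
            names.append(labels.get("backend_name", "<unknown>"))
    return names


print(unhealthy_backends(SAMPLE))  # prints ['nas-b']
```

With such a label in place, the same filter could presumably be written directly in an alerting rule as a selector like trident_backend_info{backend_state!="online"}, which would return one series per affected backend_name.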

Describe alternatives you've considered
• Using trident_backend_count to detect non-healthy backends, but this only shows counts per backend_type and does not map back to individual backend_name, so it is not actionable.
• Relying on tridentctl get backend or logs to check backend status, but this requires manual or out-of-band checks and does not integrate well with centralized monitoring/alerting systems such as Prometheus and Grafana.

Additional context
When a backend goes into a failed state, the metrics only show that there is at least one failed backend of a given backend_type, but there is no way to see which backend_name is affected from metrics alone.
The trident_backend_info metric already includes labels such as backend_name, backend_type, and backend_uuid; adding a backend_state label to this metric would allow building precise alerts and dashboards that highlight the specific backend that is down.
