Skip to content

update prometheus metric labels to constants#4797

Open
tylerauerbeck wants to merge 2 commits intoarmadaproject:masterfrom
tylerauerbeck:const-prom-labels
Open

update prometheus metric labels to constants#4797
tylerauerbeck wants to merge 2 commits intoarmadaproject:masterfrom
tylerauerbeck:const-prom-labels

Conversation

@tylerauerbeck
Copy link
Copy Markdown

@tylerauerbeck tylerauerbeck commented Mar 25, 2026

What type of PR is this?

What this PR does / why we need it

Updates prometheus labels to be constants instead of strings. This improves code quality and makes it less likely for future develops to mistakenly create new metrics in the future by introducing typos.

Which issue(s) this PR fixes

Fixes #4777

Special notes for your reviewer

Signed-off-by: Tyler Auerbeck <tylerauerbeck@users.noreply.github.com>
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 25, 2026

Greptile Summary

This PR refactors internal/common/metrics/scheduler_metrics.go by introducing 16 package-level constants for all Prometheus label name strings (e.g., labelPool, labelCluster, labelQueue) and replacing every raw string literal in the descriptor definitions with the corresponding constant. This improves consistency and reduces the risk of future typo-driven label mismatches.

Key observations:

  • All previously flagged incomplete conversions (QueueAllocatedDesc, QueueUsedDesc, QueueLeasedPodCountDesc still using the raw "cluster" string) have been addressed in the current HEAD and all 16 constants are applied consistently across every descriptor.
  • The new constants are unexported (lowercase). While this is sufficient for current single-file usage, it means other files in the metrics package cannot reference them without repeating raw string literals — potentially reintroducing the problem the PR aims to prevent.
  • A pre-existing issue was noticed (not introduced by this PR): AllDescs contains duplicate entries for MedianQueueDurationDesc and QueuePriorityDesc, and is missing entries for QueueDistinctSchedulingKeysDesc, NodeJobPhaseCounterDesc, and ClusterCordonedStatusDesc. Since this file is being actively modified, it would be a good opportunity to fix these as well.

Confidence Score: 4/5

  • Safe to merge — purely a refactor with no runtime behavior changes; all constants map 1:1 to their original string values.
  • The PR correctly replaces every string literal with its corresponding constant and addresses all previously flagged issues. No logic changes are made. A minor style concern exists around constant visibility (unexported), and a pre-existing AllDescs hygiene issue is noted but not introduced by this PR.
  • No files require special attention for correctness; AllDescs in scheduler_metrics.go has pre-existing duplicate/missing entries worth cleaning up.

Important Files Changed

Filename Overview
internal/common/metrics/scheduler_metrics.go Refactors all Prometheus descriptor label strings into package-level constants. All previously flagged occurrences (QueueAllocatedDesc, QueueUsedDesc, QueueLeasedPodCountDesc) are now correctly using labelCluster. Constants are unexported (lowercase), limiting reuse to within the metrics package. No logic changes.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Label Constants Block<br/>labelPool, labelQueue, labelCluster, ...] -->|used in| B[Prometheus Desc Variables<br/>QueueAllocatedDesc, ClusterCapacityDesc, ...]
    B -->|registered via| C[AllDescs slice]
    C -->|iterated in| D[Describe func]
    B -->|referenced by| E[New* metric constructor funcs<br/>e.g. NewQueueAllocated, NewClusterCapacity]
    E -->|emit| F[Prometheus Metrics]
Loading

Comments Outside Diff (1)

  1. internal/common/metrics/scheduler_metrics.go, line 331-362 (link)

    P2 AllDescs has duplicate and missing descriptor entries

    AllDescs is used by the Describe() function to register all metric descriptors. The current slice has two issues that could cause silent metric registration problems (though these are pre-existing and not introduced by this PR):

    1. MedianQueueDurationDesc appears twice (lines 343–344), which will cause Prometheus to register it twice during Describe().
    2. QueuePriorityDesc also appears twice (lines 334 and 358).
    3. Several descriptors are entirely absent: QueueDistinctSchedulingKeysDesc and NodeJobPhaseCounterDesc (and their new cluster/cordoned siblings like ClusterCordonedStatusDesc) are not included, so they won't be advertised during Describe().

    Since this PR is touching this file, it would be a good opportunity to clean these up.

Reviews (3): Last reviewed commit: "update missed cluster labels" | Re-trigger Greptile

Signed-off-by: Tyler Auerbeck <tylerauerbeck@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extract Prometheus metric label names into constants

1 participant