
@sk09-dev

This change adds support in do-agent to properly collect and aggregate Dedicated Inference (DI) metrics for Insights.

This PR:

  • Adds DI metrics to the whitelist so they are accepted by do-agent
  • Defines aggregation rules for DI metrics to control label cardinality
  • Preserves Prometheus histogram semantics by keeping the le label on _bucket metrics
  • Drops high-cardinality labels such as job, pod, and OpenTelemetry (OTel) scope labels where appropriate
  • Ensures _bucket, _sum, and _count metrics remain consistent and query-safe

The goal is to make DI metrics safe, efficient, and compatible with Insights ingestion.
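The whitelist and aggregation rules above can be sketched in Go. This is a minimal illustration, not do-agent's implementation: the Sample type, the di_request_duration_seconds metric name, the otel_scope_ label prefix, and summation as the aggregation function are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is a simplified scraped series (hypothetical; not do-agent's type).
type Sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// allowedDI is a hypothetical whitelist of DI metric base names.
var allowedDI = map[string]bool{"di_request_duration_seconds": true}

// droppedLabels are high-cardinality labels removed before aggregation.
var droppedLabels = map[string]bool{"job": true, "pod": true}

// baseName strips a histogram suffix so _bucket, _sum and _count share one
// whitelist entry.
func baseName(name string) string {
	for _, suffix := range []string{"_bucket", "_sum", "_count"} {
		if strings.HasSuffix(name, suffix) {
			return strings.TrimSuffix(name, suffix)
		}
	}
	return name
}

// keepLabel reports whether a label survives aggregation. le is kept only on
// _bucket series; job, pod and otel_scope_* labels are always dropped.
func keepLabel(metric, label string) bool {
	if label == "le" {
		return strings.HasSuffix(metric, "_bucket")
	}
	return !droppedLabels[label] && !strings.HasPrefix(label, "otel_scope_")
}

// seriesKey renders the name plus sorted surviving labels as a map key.
func seriesKey(name string, labels map[string]string) string {
	parts := make([]string, 0, len(labels))
	for k, v := range labels {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts)
	return name + "{" + strings.Join(parts, ",") + "}"
}

// Aggregate skips non-whitelisted metrics, strips dropped labels, and sums
// series that collapse onto the same reduced label set.
func Aggregate(in []Sample) map[string]float64 {
	out := make(map[string]float64)
	for _, s := range in {
		if !allowedDI[baseName(s.Name)] {
			continue
		}
		kept := make(map[string]string)
		for k, v := range s.Labels {
			if keepLabel(s.Name, k) {
				kept[k] = v
			}
		}
		out[seriesKey(s.Name, kept)] += s.Value
	}
	return out
}

func main() {
	agg := Aggregate([]Sample{
		{"di_request_duration_seconds_bucket", map[string]string{"le": "0.5", "pod": "a", "job": "di"}, 3},
		{"di_request_duration_seconds_bucket", map[string]string{"le": "0.5", "pod": "b", "job": "di"}, 4},
		{"di_request_duration_seconds_count", map[string]string{"pod": "a", "otel_scope_name": "x"}, 5},
		{"di_request_duration_seconds_count", map[string]string{"pod": "b"}, 6},
		{"unrelated_metric", map[string]string{"pod": "a"}, 1},
	})
	keys := make([]string, 0, len(agg))
	for k := range agg {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, agg[k])
	}
}
```

Running it prints di_request_duration_seconds_bucket{le=0.5} 7 and di_request_duration_seconds_count{} 11: the per-pod series collapse into one, le survives only on the bucket series, and the non-whitelisted metric is skipped. Summing is a reasonable choice here because _bucket, _sum and _count are all counters, and summing per le value keeps buckets cumulative as long as every pod exposes the same bucket boundaries.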

Testing performed

Ran do-agent with a local DI Prometheus-style metrics endpoint:

go run ./cmd/do-agent \
  --di-metrics-address http://127.0.0.1:9109/metrics \
  --stdout-only
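A local endpoint like the one used above can be stood up with a few lines of Go. This is a hypothetical stand-in for a real DI server: the metric name, label values and counts are invented, and only the address 127.0.0.1:9109/metrics comes from the command above. It serves a single histogram in the Prometheus text exposition format.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// sampleExposition is hypothetical DI histogram data in the Prometheus text format.
const sampleExposition = `# HELP di_request_duration_seconds Hypothetical DI request latency.
# TYPE di_request_duration_seconds histogram
di_request_duration_seconds_bucket{job="di",pod="di-0",le="0.5"} 3
di_request_duration_seconds_bucket{job="di",pod="di-0",le="1"} 5
di_request_duration_seconds_bucket{job="di",pod="di-0",le="+Inf"} 6
di_request_duration_seconds_sum{job="di",pod="di-0"} 4.2
di_request_duration_seconds_count{job="di",pod="di-0"} 6
`

// metricsHandler writes the static exposition with the text-format content type.
func metricsHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
	fmt.Fprint(w, sampleExposition)
}

func main() {
	http.HandleFunc("/metrics", metricsHandler)
	log.Fatal(http.ListenAndServe("127.0.0.1:9109", nil))
}
```

The job and pod labels are included on purpose so that their removal is visible in the agent's aggregated output.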

  • Verified that DI metrics are:
      • Successfully scraped
      • Present in agent output
      • Correctly aggregated
  • Confirmed that:
      • Histogram metrics (*_bucket, *_sum, *_count) are emitted correctly
      • The le label is preserved on bucket metrics
      • High-cardinality labels (job, pod) are removed from aggregated DI metrics
  • Manually inspected the output to confirm that no labels or metrics were dropped unintentionally
  • No integration or production environment changes were required for this PR.
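The histogram checks listed above were done by inspection; a sketch like the following could automate them. It is an assumption about one possible approach, not part of this PR: it checks that cumulative bucket counts never decrease as le grows, that an le="+Inf" bucket exists, and that its count equals the _count value. The Bucket type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strconv"
)

// Bucket is one cumulative histogram bucket after aggregation.
type Bucket struct {
	LE    float64 // upper bound parsed from the le label
	Count float64 // cumulative observation count
}

// ParseBucket converts an le label value and count into a Bucket.
// strconv.ParseFloat accepts "+Inf", which Prometheus uses for the last bucket.
func ParseBucket(le string, count float64) (Bucket, error) {
	bound, err := strconv.ParseFloat(le, 64)
	if err != nil {
		return Bucket{}, fmt.Errorf("bad le %q: %w", le, err)
	}
	return Bucket{LE: bound, Count: count}, nil
}

// CheckHistogram verifies that bucket counts are non-decreasing in le, that a
// +Inf bucket exists, and that it equals the _count value.
func CheckHistogram(buckets []Bucket, count float64) error {
	if len(buckets) == 0 {
		return fmt.Errorf("no _bucket series")
	}
	sorted := append([]Bucket(nil), buckets...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].LE < sorted[j].LE })
	for i := 1; i < len(sorted); i++ {
		if sorted[i].Count < sorted[i-1].Count {
			return fmt.Errorf("le=%g count %g < le=%g count %g",
				sorted[i].LE, sorted[i].Count, sorted[i-1].LE, sorted[i-1].Count)
		}
	}
	last := sorted[len(sorted)-1]
	if !math.IsInf(last.LE, 1) {
		return fmt.Errorf("missing le=\"+Inf\" bucket")
	}
	if last.Count != count {
		return fmt.Errorf("+Inf bucket %g != _count %g", last.Count, count)
	}
	return nil
}

func main() {
	var buckets []Bucket
	for le, c := range map[string]float64{"0.5": 3, "1": 5, "+Inf": 6} {
		b, err := ParseBucket(le, c)
		if err != nil {
			fmt.Println(err)
			return
		}
		buckets = append(buckets, b)
	}
	fmt.Println(CheckHistogram(buckets, 6)) // prints <nil>
}
```

Running the check on aggregated output, per reduced label set, would catch the failure mode that dropping le from buckets (or summing buckets with mismatched boundaries) tends to produce.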

@sk09-dev requested a review from a team as a code owner on January 13, 2026 03:01
