When running Mimir at scale, you inevitably wind up accumulating metrics that are completely unused. No dashboard references them and they're not used for alerting. These metrics however still consume resources both in terms of storage and ingest-capacity. In Grafana Cloud, you have Adaptive Metrics to surface this problem, but nothing similar exists for on-premises deployments. This tool identifies unused metrics in Mimir by cross-referencing what's actually stored with what's being used in dashboards and alerts. It exports the results as Prometheus metrics. This is intended as a self-hosted alternative to Adaptive Metrics for Enterprise and open-source Mimir deployments.
On each analysis cycle (once per day by default), the tool:
- Discovers tenants by querying the Mimir store-gateway for the list of active tenants.
- Analyzes dashboard usage by running
mimirtool analyze grafanaagainst your Grafana instance to determine which metrics are referenced in dashboards. - Optionally analyzes alert usage by fetching provisioned alert rules from the Grafana API and checking which metrics appear in their expressions. This assumes the tenant ID is part of the datasource name to work, and is thus toggleable.
- Fetches top metrics by cardinality for each tenant using Mimir's cardinality API (
/prometheus/api/v1/cardinality/label_values), retrieving the top 100 metric names. - Cross-references the top metrics against dashboard and alert usage. Each metric is classified as either active or inactive and exported as a Prometheus gauge.
The output is a standard Prometheus gauge (metric_active) that you can visualize. Here's an example of what that looks like in Grafana:
The exported metric looks like:
metric_active{metric="some_metric_name", tenant="some-tenant"} 1
metric_active{metric="another_metric_name", tenant="some-tenant"} 0
A value of 1 means the metric is referenced in at least one dashboard or alert rule.
The tool is configured through a YAML file:
grafana:
url: "https://grafana.example.com"
tokenFrom: "GRAFANA_TOKEN" # read the token from this environment variable
# token: "glsa_..." # or specify it directly (not recommended)
# insecure: false # skip TLS verification (default: false)
mimir:
querierUrl: "http://mimir-querier:8080"
storeGatewayUrl: "http://mimir-store-gateway:8080"
http:
host: "0.0.0.0"
port: 8080| Flag | Default | Description |
|---|---|---|
--config, -c |
(required) | Path to the YAML configuration file |
--output-dir, -o |
. |
Directory for intermediate files produced by mimirtool |
--interval, -i |
86400 |
Seconds between analysis cycles (default is 24 hours) |
--disable-alert-correlation |
false |
Skip alert rule analysis entirely |
For example, to run every 6 hours with alert correlation disabled:
cargo run -- --config config.yaml --interval 21600 --disable-alert-correlationA minimal installation looks like this:
helm install mimir-cardinality-analyzer oci://quay.io/duk4s/mimir-cardinality-analyzer-helm \
--set grafana.url="https://grafana.example.com" \
--set grafana.tokenFrom="GRAFANA_TOKEN" \
--set mimir.querierUrl="http://mimir-querier:8080" \
--set mimir.storeGatewayUrl="http://mimir-store-gateway:8080" \
--set extraEnv[0].name="GRAFANA_TOKEN" \
--set extraEnv[0].valueFrom.secretKeyRef.name="grafana-secret" \
--set extraEnv[0].valueFrom.secretKeyRef.key="token"See values.yaml for a full set of configurable values.
The included dashboards are modified versions of the CERN dashboards. They require two datasources in Grafana:
-
Infinity (
yesoreyeram-infinity-datasource) — pointed at your Mimir querier. The dashboards use this to query Mimir's cardinality API directly (e.g./prometheus/api/v1/cardinality/label_values). You'll need to install the Infinity plugin and create a datasource with the base URL set to your Mimir querier. -
Prometheus — pointed at wherever you're scraping the analyzer's
/metricsendpoint. This is used to query themetric_activegauge that the tool exports, which is what powers the "used vs. unused" view.
When importing the dashboards, Grafana will prompt you to map both datasource variables to the ones you've configured.
| Metric | Type | Labels | Description |
|---|---|---|---|
metric_active |
Gauge | metric, tenant |
1 if the metric is referenced in a dashboard or alert, 0 otherwise |
analysis_errors_total |
Counter | task (cycle, tenant), tenant (only when task=tenant) |
Count of analysis failures, per cycle or per tenant |
analysis_cycles_total |
Counter | status (success, failure) |
Count of completed analysis loop iterations |
tenants_discovered_total |
Gauge | — | Number of tenants found during the latest discovery |
last_successful_analysis_timestamp |
Gauge | — | Unix timestamp of the last successful analysis cycle |
| Metric | Type | Labels | Description |
|---|---|---|---|
external_request_duration_seconds |
Histogram | target (store-gateway, querier, grafana) |
Latency of outbound HTTP requests |
external_request_failures_total |
Counter | target |
Count of failed outbound HTTP requests |
mimirtool_executions_total |
Counter | command (analyze_grafana, analyze_prometheus), status (success, failure) |
Count of mimirtool subprocess invocations |
mimirtool_duration_seconds |
Histogram | command |
Duration of mimirtool subprocess executions |
| Metric | Type | Labels | Description |
|---|---|---|---|
http_requests_total |
Counter | endpoint |
Count of HTTP requests per endpoint |
http_request_duration_seconds |
Histogram | endpoint |
Latency of HTTP request handling per endpoint |
| Metric | Type | Labels | Description |
|---|---|---|---|
process_start_time_seconds |
Gauge | — | Process start time as Unix epoch seconds |
build_info |
Gauge | version |
Always 1; the version label carries the application version |
- Only the top 100 metrics by cardinality are analyzed per tenant. Metrics outside that window are not evaluated.
- Alert rule matching relies on datasource names containing the tenant identifier, which assumes a naming convention in your Grafana datasource setup. If this doesn't match your setup, use
--disable-alert-correlationto skip it.
