Feature: define metrics groups in a config file

I need Avalanche to emit a more complex mix of metrics for my tests than it currently supports: a core set of stable series, plus a smaller fraction of high-churn series, some intermittent series, and so on.

This is not currently feasible without running several distinct Avalanche instances with different configurations. That setup differs from a single Avalanche producing the same metrics in a number of ways: each instance has a distinct `(job, instance)` pair and a distinct label rewrite/drop cache in Prometheus, the scrapes happen at different times and can be parallelized, the scrape body sizes are not added up, and more. As a result I can't get useful aggregate statistics and compare them to my real workloads, so I can't confidently say whether the Avalanche workloads reasonably resemble the real ones.

If there's no strong disagreement about this I might cook up a patch.

## Proposed implementation

I'm thinking of writing up a refactor/patch to Avalanche to give it an optional metrics config file. This config file would contain a list of metrics groups to emit, where each metric group has a metric type (`gauge`, `counter`, etc) and will support the various configuration flags and modes Avalanche has for that metric type.

For example, I would like to be able to write something like:

```
metrics:
  - group: steady_gauges
    type: gauge
    name_prefix: steady_g
    # for more complex naming needs, a Go template can be expanded for the name instead;
    # the default template if not specified would be (quoted, since a YAML value cannot start with a bare "{"):
    #name_template: "{{.namePrefix}}_{{.metricLengthPadding}}_{{.metricCycle}}_{{.metricId}}"
    metric_count: 15
    series_count: 1
    value_interval: 15
    # 0 is implied if omitted
    series_interval: 0
    # 0 is implied if omitted
    metric_interval: 0
    description_suffix: >
      Steady state core metrics with no churn and reasonable series cardinality.
  - group: churning_gauges
    type: gauge
    name_prefix: churn_g
    metric_count: 10
    series_count: 5
    value_interval: 5
    series_interval: 30
    description_suffix: >
      Metrics with rapid series label churn, simulating poorly defined metrics that have rapidly changing labels used to carry
      data-values.
  - group: high_cardinality_gauges
    series_change_mode: spike
    type: gauge
    name_prefix: highc_g
    metric_count: 3
    series_count: 50
    value_interval: 30
    spike_multiplier: 2.5
    description_suffix: >
      A few low-churn metrics with very high series cardinality per metric, and spiky output.
# [... and so on ... ]
```

Each stanza would create a metrics [Collector](https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#Collector) that's added to the Registry, all running independently.

Sure, it's verbose, but the expectation is that it'd generally be generated by tooling that characterises and classifies a workload's metrics. And the basic CLI method would remain the primary Avalanche interface for common cases.
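To illustrate the `name_template` idea from the example config, here is a minimal sketch of expanding such a template with Go's standard `text/template` package. The `renderName` helper and the sample values are hypothetical. Only the placeholder names come from the example above. Note that lowercase placeholders like `.namePrefix` require a map as the template data, because `text/template` cannot read unexported struct fields.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderName expands a metric name template against a map of variables.
// Hypothetical helper: not part of Avalanche today.
func renderName(tmpl string, vars map[string]any) (string, error) {
	// missingkey=error makes a typo in a placeholder fail loudly instead of
	// silently rendering "<no value>" into the metric name.
	t, err := template.New("name").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", fmt.Errorf("parse name template: %w", err)
	}
	var sb strings.Builder
	if err := t.Execute(&sb, vars); err != nil {
		return "", fmt.Errorf("execute name template: %w", err)
	}
	return sb.String(), nil
}

func main() {
	// Sample values are made up for illustration.
	name, err := renderName(
		"{{.namePrefix}}_{{.metricLengthPadding}}_{{.metricCycle}}_{{.metricId}}",
		map[string]any{
			"namePrefix":          "steady_g",
			"metricLengthPadding": "pad",
			"metricCycle":         0,
			"metricId":            3,
		},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```

The rendered result would still need to be validated as a legal Prometheus metric name before use. That check is left out of this sketch.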

After the change, the CLI configuration would be expanded into a list of these configurations internally, then executed. E.g. this:

```
avalanche \
  --gauge-metric-count=30 \
  --counter-metric-count=20 \
  --histogram-metric-count=0 \
  --histogram-metric-bucket-count=10 \
  --native-histogram-metric-count=0 \
  --summary-metric-count=0 \
  --summary-metric-objective-count=0 \
  --series-count=2 \
  --value-interval=5 \
  --series-interval=10 \
  --metric-interval=0 \
  --port=9001
```

would at runtime be translated internally into the Go struct equivalent of this YAML config representation:

```
metrics:
  - group: avalanche_gauges
    type: gauge
    name_template: avalanche_{{.metricType}}_metric_{{.metricLengthPadding}}_{{.metricCycle}}_{{.metricId}}
    metric_count: 30
    series_count: 2
    value_interval: 5
    series_interval: 10
    metric_interval: 0
  - group: avalanche_counters
    type: counter
    # this will be the default template if omitted
    #name_template: avalanche_{{.metricType}}_metric_{{.metricLengthPadding}}_{{.metricCycle}}_{{.metricId}}
    metric_count: 20
    series_count: 2
    value_interval: 5
    series_interval: 10
    metric_interval: 0
```
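The CLI-to-config expansion described above could look roughly like the following sketch. The `MetricGroup` and `cliFlags` types and the `expandFlags` function are hypothetical names for illustration, not existing Avalanche code. It covers only the flags needed for the example and assumes that a type with a zero metric count produces no group, as in the YAML above. Type-specific options such as histogram bucket counts are omitted for brevity.

```go
package main

import "fmt"

// MetricGroup is a hypothetical Go form of one stanza in the config file.
type MetricGroup struct {
	Group          string
	Type           string
	NameTemplate   string
	MetricCount    int
	SeriesCount    int
	ValueInterval  int
	SeriesInterval int
	MetricInterval int
}

// cliFlags holds a subset of the values parsed from the existing CLI flags.
type cliFlags struct {
	GaugeMetricCount     int
	CounterMetricCount   int
	HistogramMetricCount int
	SummaryMetricCount   int
	SeriesCount          int
	ValueInterval        int
	SeriesInterval       int
	MetricInterval       int
}

const defaultNameTemplate = "avalanche_{{.metricType}}_metric_{{.metricLengthPadding}}_{{.metricCycle}}_{{.metricId}}"

// expandFlags turns the flat CLI settings into one group per metric type,
// skipping types whose metric count is zero.
func expandFlags(f cliFlags) []MetricGroup {
	perType := []struct {
		typ, group string
		count      int
	}{
		{"gauge", "avalanche_gauges", f.GaugeMetricCount},
		{"counter", "avalanche_counters", f.CounterMetricCount},
		{"histogram", "avalanche_histograms", f.HistogramMetricCount},
		{"summary", "avalanche_summaries", f.SummaryMetricCount},
	}
	var groups []MetricGroup
	for _, t := range perType {
		if t.count <= 0 {
			continue
		}
		groups = append(groups, MetricGroup{
			Group:          t.group,
			Type:           t.typ,
			NameTemplate:   defaultNameTemplate,
			MetricCount:    t.count,
			SeriesCount:    f.SeriesCount,
			ValueInterval:  f.ValueInterval,
			SeriesInterval: f.SeriesInterval,
			MetricInterval: f.MetricInterval,
		})
	}
	return groups
}

func main() {
	groups := expandFlags(cliFlags{
		GaugeMetricCount:   30,
		CounterMetricCount: 20,
		SeriesCount:        2,
		ValueInterval:      5,
		SeriesInterval:     10,
	})
	for _, g := range groups {
		fmt.Printf("%+v\n", g)
	}
}
```

With this shape, the config file loader and the CLI path would both produce a `[]MetricGroup`, so the rest of the program would only need to handle one representation.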

## Alternatives considered

It might be possible to run a proxy in front of a number of Avalanche instances to combine their responses if Avalanche is patched to support a configurable metric name prefix. But it'll be clumsy at best, difficult to configure and hard to maintain.

I've also looked at making a wrapper binary and combining the results using a `prometheus.Gatherer`, but too much of the logic is currently tied up in `cmd/avalanche/avalanche.go` where it's not exposed or reusable, e.g. the creation of the `prometheus.Registry`. It'd need rearranging to expose the machinery separately to the configuration logic anyway.

## Related

I suspect this might benefit from https://github.com/prometheus-community/avalanche/issues/73 too.
