Feature: define metrics groups in a config file #149

@ringerc

Description

For my testing I need Avalanche to send a more complex set of metrics than it presently supports: a core set of stable series, plus a smaller fraction of high-churn series, some intermittent series, etc.

This is not presently feasible without running a bunch of distinct Avalanche instances with different configurations. And that does not behave the same as a single Avalanche producing the same metrics, in a number of ways: each instance has a distinct (job, instance) pair, a distinct label rewrite/drop cache in Prometheus, its own scrape time (so scrapes can be parallelized), and so on. Their scrape body sizes are not added up either. As a result I can't get useful aggregate statistics and compare them to my real workloads, so I can't confidently say whether the Avalanche workloads reasonably resemble the real ones.

If there's no strong disagreement about this I might cook up a patch.

Proposed implementation

I'm thinking of writing up a refactor/patch to Avalanche to give it an optional metrics config file. This config file would contain a list of metrics groups to emit, where each metric group has a metric type (gauge, counter, etc) and will support the various configuration flags and modes Avalanche has for that metric type.

For example, I would like to be able to write something like:

metrics:
  - group: steady_gauges
    type: gauge
    name_prefix: steady_g
    # for more complex naming needs, a Go template can be expanded for the name instead;
    # the default template if not specified would be (quoted, since a YAML value starting with "{" is parsed as a flow mapping):
    #name_template: "{{.namePrefix}}_{{.metricLengthPadding}}_{{.metricCycle}}_{{.metricId}}"
    metric_count: 15
    series_count: 1
    value_interval: 15
    # 0 is implied if omitted
    series_interval: 0
    # 0 is implied if omitted
    metric_interval: 0
    description_suffix: >
      Steady state core metrics with no churn and reasonable series cardinality.
  - group: churning_gauges
    type: gauge
    name_prefix: churn_g
    metric_count: 10
    series_count: 5
    value_interval: 5
    series_interval: 30
    description_suffix: >
      Metrics with rapid series label churn, simulating poorly defined metrics that have rapidly changing labels used to carry
      data-values.
  - group: high_cardinality_gauges
    series_change_mode: spike
    type: gauge
    name_prefix: highc_g
    metric_count: 3
    series_count: 50
    value_interval: 30
    spike_multiplier: 2.5
    description_suffix: >
      A few low-churn metrics with very high series cardinality per metric, and spiky output.
# [... and so on ... ]

Each stanza would create a metrics Collector that's added to the Registry, all running independently.
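
To make the stanza shape and the `name_template` idea concrete, here is a minimal sketch in Go using only the standard library. It is not Avalanche's actual code: `MetricGroup`, `ExpandName` and the `padding`, `cycle` and `id` arguments are hypothetical names chosen for illustration, and the template keys are taken from the example config above. Creating a Collector per group and adding it to the Registry is left out, since that depends on the Prometheus client library.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// MetricGroup is a hypothetical Go form of one stanza in the proposed
// config file; the fields mirror a subset of the YAML keys shown above.
type MetricGroup struct {
	Group        string
	Type         string
	NamePrefix   string
	NameTemplate string // empty means "use the default template"
	MetricCount  int
	SeriesCount  int
}

// defaultNameTemplate is the default suggested in the proposal.
const defaultNameTemplate = "{{.namePrefix}}_{{.metricLengthPadding}}_{{.metricCycle}}_{{.metricId}}"

// ExpandName renders the name of one metric in a group. The padding, cycle
// and id arguments stand in for values Avalanche would compute itself.
func ExpandName(g MetricGroup, padding string, cycle, id int) (string, error) {
	text := g.NameTemplate
	if text == "" {
		text = defaultNameTemplate
	}
	// missingkey=error makes a template that references an unknown key fail
	// instead of silently rendering "<no value>".
	tmpl, err := template.New(g.Group).Option("missingkey=error").Parse(text)
	if err != nil {
		return "", fmt.Errorf("group %q: parse name template: %w", g.Group, err)
	}
	data := map[string]any{
		"namePrefix":          g.NamePrefix,
		"metricType":          g.Type,
		"metricLengthPadding": padding,
		"metricCycle":         cycle,
		"metricId":            id,
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", fmt.Errorf("group %q: expand name template: %w", g.Group, err)
	}
	return sb.String(), nil
}

func main() {
	g := MetricGroup{Group: "steady_gauges", Type: "gauge", NamePrefix: "steady_g", MetricCount: 15, SeriesCount: 1}
	for id := 0; id < 3; id++ {
		name, err := ExpandName(g, "pad", 0, id)
		if err != nil {
			panic(err)
		}
		fmt.Println(name)
	}
}
```

Passing the template data as a map rather than a struct is what lets the lower-case keys used in the config (`.namePrefix`, `.metricId`) resolve, because Go templates can only reach exported (capitalized) struct fields.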

Sure, it's verbose, but the expectation is that it'd generally be generated by tooling that characterises and classifies a workload's metrics. And the basic CLI method would remain the primary Avalanche interface for common cases.

After the change, the CLI configuration would be expanded into a list of these configurations internally, then executed. E.g. this:

avalanche \
  --gauge-metric-count=30 \
  --counter-metric-count=20 \
  --histogram-metric-count=0 \
  --histogram-metric-bucket-count=10 \
  --native-histogram-metric-count=0 \
  --summary-metric-count=0 \
  --summary-metric-objective-count=0 \
  --series-count=2 \
  --value-interval=5 \
  --series-interval=10 \
  --metric-interval=0 \
  --port=9001

would at runtime be translated internally into the Go struct equivalent of this YAML config representation:

metrics:
  - group: avalanche_gauges
    type: gauge
    name_template: avalanche_{{.metricType}}_metric_{{.metricLengthPadding}}_{{.metricCycle}}_{{.metricId}}
    metric_count: 30
    series_count: 2
    value_interval: 5
    series_interval: 10
    metric_interval: 0
  - group: avalanche_counters
    type: counter
    # this will be the default template if omitted
    #name_template: avalanche_{{.metricType}}_metric_{{.metricLengthPadding}}_{{.metricCycle}}_{{.metricId}}
    metric_count: 20
    series_count: 2
    value_interval: 5
    series_interval: 10
    metric_interval: 0
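
The flag-to-group expansion described above could look roughly like the following Go sketch. It rests on assumptions: `CLIFlags`, `MetricGroup` and `ExpandFlags` are hypothetical names, the group names for histogram and summary types are invented by analogy with `avalanche_gauges`, and skipping metric types whose count is 0 is inferred from the example, where only gauges and counters appear. Type-specific settings such as bucket and objective counts are omitted for brevity.

```go
package main

import "fmt"

// CLIFlags is a hypothetical holder for the subset of flags in the example.
type CLIFlags struct {
	GaugeMetricCount           int
	CounterMetricCount         int
	HistogramMetricCount       int
	NativeHistogramMetricCount int
	SummaryMetricCount         int
	SeriesCount                int
	ValueInterval              int
	SeriesInterval             int
	MetricInterval             int
}

// MetricGroup is a hypothetical in-memory form of one config stanza.
type MetricGroup struct {
	Group          string
	Type           string
	MetricCount    int
	SeriesCount    int
	ValueInterval  int
	SeriesInterval int
	MetricInterval int
}

// ExpandFlags turns the flat CLI flags into one group per metric type.
// Types with a metric count of 0 produce no group (an assumption).
func ExpandFlags(f CLIFlags) []MetricGroup {
	perType := []struct {
		group, typ string
		count      int
	}{
		{"avalanche_gauges", "gauge", f.GaugeMetricCount},
		{"avalanche_counters", "counter", f.CounterMetricCount},
		// The group names below are illustrative; the proposal does not name them.
		{"avalanche_histograms", "histogram", f.HistogramMetricCount},
		{"avalanche_native_histograms", "native_histogram", f.NativeHistogramMetricCount},
		{"avalanche_summaries", "summary", f.SummaryMetricCount},
	}
	var groups []MetricGroup
	for _, t := range perType {
		if t.count == 0 {
			continue
		}
		groups = append(groups, MetricGroup{
			Group:          t.group,
			Type:           t.typ,
			MetricCount:    t.count,
			SeriesCount:    f.SeriesCount,
			ValueInterval:  f.ValueInterval,
			SeriesInterval: f.SeriesInterval,
			MetricInterval: f.MetricInterval,
		})
	}
	return groups
}

func main() {
	// Mirrors the command line in the example above.
	flags := CLIFlags{
		GaugeMetricCount:   30,
		CounterMetricCount: 20,
		SeriesCount:        2,
		ValueInterval:      5,
		SeriesInterval:     10,
	}
	for _, g := range ExpandFlags(flags) {
		fmt.Printf("%+v\n", g)
	}
}
```

With this shape, the CLI path and the config-file path would converge on the same list of groups before any collectors are created, so the rest of the code would need only one way to build them.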

Alternatives considered

It might be possible to run a proxy in front of a number of Avalanche instances to combine their responses if Avalanche is patched to support a configurable metric name prefix. But it'll be clumsy at best, difficult to configure and hard to maintain.

I've also looked at making a wrapper binary and combining the results using a prometheus.Gatherer, but too much of the logic is currently tied up in cmd/avalanche/avalanche.go where it's not exposed or reusable, e.g. the creation of the prometheus.Registry. It'd need rearranging to expose the machinery separately from the configuration logic anyway.

Related

I suspect this might benefit from #73 too.
