Prometheus: New component to add metric-processing to Alloy #5960

### Component(s)

prometheus.relabel

### Request

This request is about aggregating existing metrics into new high-level metrics according to a set of rules. I have the following concrete use case in mind:

### Use case

I'd like to combine a number of metrics from different sources into a high-level metric to show the "healthiness" of a host. I have the following metrics for example:

* Gauge: `disk_space_used(host=hostX)` Range 0.0 (empty) to 1.0 (fully used disk)
* Gauge: `service_online(host=hostX, service=serviceX)` (0 when offline, 1 when online)
* Counter: `failed_login_attempts(host=hostX)` (counting number of failed attempts)

These should be combined into a new high-level metric:

* Gauge: `host_healthy(host=hostX)` Range 0.0 (unhealthy) to 1.0 (healthy)

There are conditions that drive the resulting value:

* Host is healthy (1.0) when `disk_space_used` is below 0.9
  * Host is unhealthy (0.0) when `disk_space_used` is over 0.99
  * Host is "not-so-healthy" (0.7) when `disk_space_used` is over 0.9, but under 0.99
* Host is healthy when all services of a host are online
* Host is healthy when failed login attempts over the time_range are below 3
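To make the rules above concrete, here is a minimal Python sketch of how the three conditions could be evaluated for a single host. It is only an illustration, not a proposed implementation. Several things are assumptions on my side because the rules do not spell them out: the individual scores are combined by taking the worst one, the boundary values 0.9 and 0.99 count as "not-so-healthy", and a failed service or too many failed logins yields 0.0. The names `HostSample`, `disk_score` and `failed_logins_in_range` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class HostSample:
    """Input metrics for one host (hypothetical container type)."""
    disk_space_used: float             # 0.0 (empty) to 1.0 (fully used disk)
    service_online: Mapping[str, int]  # service name -> 0 (offline) or 1 (online)
    failed_logins_in_range: int        # increase of failed_login_attempts over the time range


def disk_score(used: float) -> float:
    """Map disk usage to a health score following the thresholds in the issue."""
    if used < 0.9:
        return 1.0
    if used > 0.99:
        return 0.0
    # Assumption: exactly 0.9 and exactly 0.99 are treated as "not-so-healthy".
    return 0.7


def host_healthy(sample: HostSample) -> float:
    """Combine the three conditions into one value between 0.0 and 1.0."""
    # Assumption: any offline service makes the host unhealthy (0.0).
    services_score = 1.0 if all(v == 1 for v in sample.service_online.values()) else 0.0
    # Assumption: 3 or more failed logins in the range make the host unhealthy (0.0).
    logins_score = 1.0 if sample.failed_logins_in_range < 3 else 0.0
    # Assumption: the worst individual score determines the overall result.
    return min(disk_score(sample.disk_space_used), services_score, logins_score)
```

For example, `host_healthy(HostSample(0.95, {"sshd": 1}, 0))` evaluates the disk rule to 0.7 and the other two rules to 1.0, so the combined value is 0.7 under the "worst score wins" assumption.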

### Suggestion

I think this functionality could be added to Alloy in a similar fashion as the `loki.process` stages work:

* `prometheus.scrape` collects metrics from a number of scrape-targets and forwards the metrics to a new `prometheus.process` component
* `prometheus.process` goes through several stages:
  * `add_gauge` creates a new metric called `host_healthiness` with an initial value of 1.0 and a label matching the input metric's `hostname` value. (Optional: if the metric already exists, the stage keeps it and later stages change its value.)
  * `set_value` has a condition `disk_space_used >= 0.9`. If this condition applies, the value of `host_healthiness` is multiplied by 0.7, decreasing the "overall health score"
  * `set_value` can also set absolute values. For example, when `failed_login_attempts > 3`, it sets the `host_healthiness` value to 0.0 (host unhealthy)
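The stage idea can be simulated outside of Alloy with a few lines of Python. This is a rough sketch of the intended semantics only; `prometheus.process`, `add_gauge` and `set_value` do not exist in Alloy today, and the function signatures below are hypothetical. I'm assuming the disk condition refers to the `disk_space_used` gauge from the use case, and that the metrics of one host are available as a simple name-to-value mapping.

```python
from typing import Callable, Dict, List

Metrics = Dict[str, float]          # metric name -> current value, for a single host
Stage = Callable[[Metrics], None]   # a stage mutates the metric set in place


def add_gauge(name: str, initial: float) -> Stage:
    """Hypothetical stage: create a gauge unless it already exists."""
    def stage(metrics: Metrics) -> None:
        metrics.setdefault(name, initial)  # an existing value is kept
    return stage


def set_value(target: str,
              condition: Callable[[Metrics], bool],
              update: Callable[[float], float]) -> Stage:
    """Hypothetical stage: update `target` when `condition` holds.

    `update` receives the current value, so it can express both relative
    changes (multiply by 0.7) and absolute ones (always return 0.0).
    The target must have been created by an earlier stage.
    """
    def stage(metrics: Metrics) -> None:
        if condition(metrics):
            metrics[target] = update(metrics[target])
    return stage


def process(metrics: Metrics, stages: List[Stage]) -> Metrics:
    """Run all stages in order on a copy of the scraped metrics."""
    out = dict(metrics)
    for stage in stages:
        stage(out)
    return out


STAGES: List[Stage] = [
    add_gauge("host_healthiness", 1.0),
    # Relative change: a nearly full disk lowers the score.
    set_value("host_healthiness",
              lambda m: m.get("disk_space_used", 0.0) >= 0.9,
              lambda v: v * 0.7),
    # Absolute change: too many failed logins mark the host unhealthy.
    set_value("host_healthiness",
              lambda m: m.get("failed_login_attempts", 0.0) > 3,
              lambda v: 0.0),
]
```

Calling `process({"disk_space_used": 0.95, "failed_login_attempts": 0}, STAGES)` returns the input metrics plus a `host_healthiness` entry. Because the stages run in order, a later absolute `set_value` overrides an earlier relative one, which is one of the design questions a real component would have to settle.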

In the end, the collected metrics are either dropped or sent on to Prometheus along with the new `host_healthiness` metric. In Grafana, all that remains is to add a widget that shows the value of `host_healthiness`, perhaps a timeline widget that makes it easy to see which hosts are available and what state they have been in over time.

I'm sure this is just the tip of the iceberg for a feature that could make Alloy more useful for scraping Prometheus targets. I'm also aware that this could get complex quite quickly and needs proper analysis to get the design and usage right. I consider this suggestion a starting point and would like to update the issue as the discussion goes on.

Please let me know whether there are any alternatives I haven't considered yet. This request has been created as a follow-up to my [Stack Overflow question](https://stackoverflow.com/questions/79916959/generate-highlevel-metric-from-multiple-lowlevel-metrics) about implementing such a thing.

### Tip

<sub>React with 👍 if this issue is important to you.</sub>
