Hello. We have an execution model that is a typical producer-consumer pattern with a queue/topic in the middle. Currently, the queue holds work of the same type from multiple customers/tenants. The consumers/workers are non-HTTP applications that pop a message from the queue and execute the work. These consumers are Kubernetes pods spun up by a Deployment, configured to autoscale based on the work available on the queue. We would like at least two metrics about worker performance/backlog burn-up:
- number of executions processed for each customer/tenant
- number of successes or failures for each customer/tenant
What is the best way to publish metrics from these ephemeral workers? We tried pushing through the Prometheus Pushgateway, but its design philosophy seems to work against us:
a) it can result in metric overwrites if multiple workers/pods use the same job/group name
b) it can result in garbage build-up if the job/group name is based on the instance/pod name, since pods come and go over longer periods of time
We could additionally run a mini HTTP server in each consumer and expose a metrics endpoint for scraping. That is possibly a bit of overkill, but it would work. Please suggest.
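For reference, here is a minimal sketch of that "mini HTTP server" option using only the Python standard library (no Prometheus client library), so the per-pod overhead is just one daemon thread. The metric name `executions_total` and the `tenant`/`status` label names are illustrative, not anything from our existing setup:

```python
# Sketch: each worker pod exposes /metrics in the Prometheus text exposition
# format, with per-tenant counters for processed executions. Counters live
# in-process, so each pod reports only its own work; Prometheus aggregates
# across pods via sum() at query time.
import threading
from collections import Counter
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

_executions = Counter()   # key: (tenant, status) -> count
_lock = threading.Lock()

def record_execution(tenant: str, status: str) -> None:
    """Called by the worker loop after each message is processed."""
    with _lock:
        _executions[(tenant, status)] += 1

def render_metrics() -> str:
    """Render the counters in Prometheus text exposition format."""
    lines = [
        "# HELP executions_total Messages processed per tenant and status.",
        "# TYPE executions_total counter",
    ]
    with _lock:
        for (tenant, status), count in sorted(_executions.items()):
            lines.append(
                f'executions_total{{tenant="{tenant}",status="{status}"}} {count}'
            )
    return "\n".join(lines) + "\n"

class _MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def start_metrics_server(port: int = 9100) -> ThreadingHTTPServer:
    """Start the scrape endpoint on a daemon thread; call once at pod startup."""
    server = ThreadingHTTPServer(("", port), _MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With pod-level scraping (e.g. `prometheus.io/scrape` annotations or a PodMonitor), Prometheus discovers pods as they come and go, and stale series simply stop being collected when a pod dies, which sidesteps both the overwrite and the garbage-accumulation problems we hit with the Pushgateway.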