Architecture suggestion for publishing metrics to Prometheus from many ephemeral workers #60

@raghavan20

Hello. Our execution model is a typical producer-consumer pattern with a queue/topic in the middle. Currently, the queue holds work of the same type from multiple customers/tenants. The consumers/workers are non-HTTP applications that pop a message from the queue and execute the work. They run as Kubernetes pods managed by a Deployment and are configured to autoscale based on the work available on the queue. We would like to track at least two metrics about worker performance and backlog burn-up:

  • number of executions processed for each customer/tenant
  • number of successes and failures for each customer/tenant

What is the best way to publish metrics from these ephemeral workers? We tried sending them through the Prometheus Pushgateway, but its design seems to leave us with one of two problems:
a) metrics from multiple workers/pods overwrite each other if they all push under the same job/group name
b) stale groups build up as garbage if the job/group name is based on the instance/pod name, because pods come and go over longer periods of time
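
To make the two problems concrete, here is a toy in-memory model of push semantics. It is only a sketch under the assumption that a push replaces whatever was previously stored under the same grouping key; `ToyGateway` and the metric name `executions_total` are made up for illustration and are not the Pushgateway's actual code or API.

```python
from typing import Dict, Tuple

GroupingKey = Tuple[Tuple[str, str], ...]


class ToyGateway:
    """Hypothetical stand-in for a push gateway: one metric set per grouping key."""

    def __init__(self) -> None:
        self.groups: Dict[GroupingKey, Dict[str, float]] = {}

    def push(self, grouping: Dict[str, str], metrics: Dict[str, float]) -> None:
        # Assumption: a push replaces the metrics stored under the same grouping key.
        key = tuple(sorted(grouping.items()))
        self.groups[key] = dict(metrics)


# (a) Two pods share one grouping key: the second push overwrites the first.
shared = ToyGateway()
shared.push({"job": "worker"}, {"executions_total": 5})  # pod A processed 5
shared.push({"job": "worker"}, {"executions_total": 3})  # pod B processed 3
shared_total = sum(m["executions_total"] for m in shared.groups.values())

# (b) One grouping key per pod: nothing is overwritten, but every pod that
# ever existed leaves a group behind until something deletes it.
per_pod = ToyGateway()
for i in range(100):
    per_pod.push({"job": "worker", "instance": f"pod-{i}"}, {"executions_total": 1})
stale_groups = len(per_pod.groups)

print(shared_total, stale_groups)
```

In case (a) the gateway ends up reporting 3 executions although 8 were processed; in case (b) it holds 100 groups even if only a handful of pods are still running.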

Alternatively, we could run a small HTTP server in each consumer and expose a metrics endpoint for Prometheus to scrape. That may be a bit overkill, but it would work. Please suggest.
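
A minimal sketch of what that scrape endpoint could look like, using only the Python standard library. The metric name `worker_executions_total`, the `tenant`/`outcome` labels, the port, and the helper names are all assumptions made for illustration; in practice the official Python client (`prometheus_client`, with its `start_http_server` helper) would probably replace most of this.

```python
import threading
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

_lock = threading.Lock()
_executions: Counter = Counter()  # (tenant, outcome) -> number of executions


def record_execution(tenant: str, success: bool) -> None:
    """Called by the worker after it finishes a message."""
    outcome = "success" if success else "failure"
    with _lock:
        _executions[(tenant, outcome)] += 1


def render_metrics() -> str:
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP worker_executions_total Executions processed, by tenant and outcome.",
        "# TYPE worker_executions_total counter",
    ]
    with _lock:
        # Assumption: tenant names contain no quotes, backslashes, or newlines,
        # so no label escaping is done here.
        for (tenant, outcome), count in sorted(_executions.items()):
            lines.append(
                f'worker_executions_total{{tenant="{tenant}",outcome="{outcome}"}} {count}'
            )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep scrape requests out of the worker's logs


def start_metrics_server(port: int = 8000) -> HTTPServer:
    """Serve /metrics from a daemon thread so the worker loop is not blocked."""
    server = HTTPServer(("0.0.0.0", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The worker would call `start_metrics_server()` once at startup and `record_execution(tenant, success)` after each message. Each pod then exposes only its own counters, and the per-tenant totals are summed on the Prometheus side, so nothing is overwritten. One caveat we can see: increments made after the last scrape are lost when a pod is scaled down.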

