Node running Prometheus becomes unavailable #64

This has been happening on Albaik recently: the node that Prometheus runs on becomes unavailable, which disrupts monitoring and application-level autoscaling.
The current fix is to intervene manually and replace the node, as documented [here](https://github.com/thoughtbot/mission-control-platform/blob/main/aws/src/debug/cluster-errors.md#unreachable-nodes).

This issue tracks debugging the failure, working out what causes it, and resolving it automatically.
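As a starting point for detection, here is a minimal sketch that assumes the cluster is Kubernetes and that an unavailable node shows up with a `Ready` condition other than `"True"`. The function name `unready_nodes` is hypothetical and is not part of the existing platform code. It only reports affected nodes; it does not replace them.

```python
def unready_nodes(node_list: dict) -> list[str]:
    """Return the names of nodes whose Ready condition is not "True".

    `node_list` is assumed to be the parsed JSON of a Kubernetes NodeList,
    for example the output of `kubectl get nodes -o json`.
    """
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        # Find the Ready condition, if the node reports one at all.
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # A missing condition, "False", or "Unknown" all count as unavailable.
        if ready is None or ready.get("status") != "True":
            names.append(node["metadata"]["name"])
    return names
```

One possible use is to run this on a schedule against the parsed output of `kubectl get nodes -o json` and alert when the node hosting Prometheus appears in the result, so the manual replacement step can be started sooner.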
