Open
Description
This has been happening on Albaik recently: the node that Prometheus is running on becomes unavailable, which causes issues for monitoring and application-level autoscaling.
The current fix is to manually intervene and replace the node, as documented [here](https://github.com/thoughtbot/mission-control-platform/blob/main/aws/src/debug/cluster-errors.md#unreachable-nodes).
This issue tracks debugging the problem, figuring out what causes it, and resolving it automatically.