Create a persistent dashboard to see all ray clusters and history #549

@jhasm

The Ray dashboard service runs on the Ray head node, so it goes away when the Ray cluster is deleted. This makes the dashboard of little use for ephemeral or short-lived Ray clusters, and long-running Ray clusters have reliability issues. As a result, Ray clusters give an impression of limited transparency and observability.

If we move the dashboard service outside the Ray cluster, we can have a global dashboard that shows the jobs across all Ray clusters, even after a cluster is deleted. It needs to be backed by a dedicated data store that collects data from each Ray cluster's dashboard service while the cluster is still running. This would preserve the history of all the rich data shown on the Ray dashboards, independent of how long a Ray cluster lives.

In a nutshell, we need a way to export the Ray dashboard data to an external data store while the cluster is running, and an external dashboard service that shows the exported data. The external dashboard can have the same GUI as the Ray dashboard service, to keep it consistent.
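To make the proposal concrete, here is a minimal sketch of the export side in Python. It is not an existing Ray or KubeRay component. It assumes the dashboard on the head node serves a JSON list of jobs at `/api/jobs/` and that each job carries a `submission_id` or `job_id` field; the SQLite schema and the names `open_store`, `export_once`, and `history` are hypothetical and stand in for whatever dedicated data store is chosen.

```python
import json
import sqlite3
import time
import urllib.request
from typing import Callable, Dict, List, Optional


def fetch_jobs(dashboard_url: str) -> List[Dict]:
    """Fetch job records from a running cluster's dashboard.

    Assumption: the dashboard returns a JSON list of jobs at /api/jobs/.
    """
    with urllib.request.urlopen(f"{dashboard_url}/api/jobs/", timeout=10) as resp:
        return json.load(resp)


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the external store (hypothetical schema, one row per job per cluster)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               cluster     TEXT NOT NULL,
               job_key     TEXT NOT NULL,
               payload     TEXT NOT NULL,
               exported_at REAL NOT NULL,
               PRIMARY KEY (cluster, job_key)
           )"""
    )
    return conn


def export_once(
    conn: sqlite3.Connection, cluster: str, fetch: Callable[[], List[Dict]]
) -> int:
    """Copy the current job list of one cluster into the store.

    Re-exporting a job overwrites its previous snapshot, so the store keeps
    the last state seen before the cluster was deleted.
    """
    jobs = fetch()
    now = time.time()
    for job in jobs:
        # Assumed identifier fields; adjust to the real payload.
        key = str(job.get("submission_id") or job.get("job_id"))
        conn.execute(
            "INSERT OR REPLACE INTO jobs VALUES (?, ?, ?, ?)",
            (cluster, key, json.dumps(job), now),
        )
    conn.commit()
    return len(jobs)


def history(conn: sqlite3.Connection, cluster: Optional[str] = None) -> List[Dict]:
    """Return stored jobs for one cluster, or for all clusters if none is given."""
    if cluster is None:
        rows = conn.execute(
            "SELECT cluster, payload FROM jobs ORDER BY cluster, job_key"
        )
    else:
        rows = conn.execute(
            "SELECT cluster, payload FROM jobs WHERE cluster = ? ORDER BY job_key",
            (cluster,),
        )
    return [{**json.loads(payload), "cluster": name} for name, payload in rows]
```

An exporter process could call `export_once(conn, "cluster-a", lambda: fetch_jobs("http://<head-node>:8265"))` on a timer for as long as the cluster is alive (the port shown is an assumption), and the external dashboard would read from `history(conn)` afterwards. Passing the fetch function in as a parameter keeps the storage logic testable without a live cluster.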

Labels: P4, experimental, help wanted
