
[BUG] Spark jobs should only select UP nodes in current subcluster at the time the job starts #551

@bryanherger

Description


This is an issue emerging with v12 Eon mode and sandbox support. The current behavior appears to be to select all nodes. This doesn't work with sandboxing, where sandboxed nodes can't write to the main communal storage, and it can also break during dynamic scaling: when a subcluster shrinks, some workers end up pointing at a DOWN node. The correct behavior is to check for Eon mode when the job starts and then select only UP nodes in the current subcluster for workers to connect to.
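A minimal sketch of the proposed node selection, assuming the connector can run a JDBC query at job start. Table and column names (`v_catalog.databases.database_mode`, `v_catalog.nodes.node_state` / `node_address`, `v_catalog.subclusters`, `v_monitor.current_session.node_name`) are the standard Vertica system tables as I understand them, and the helper name `selectWorkerHosts` is hypothetical, not part of the connector:

```scala
import java.sql.Connection
import scala.collection.mutable.ListBuffer

object NodeSelection {
  /** Hosts the Spark workers should connect to.
    * In Eon mode, restrict to UP nodes in the subcluster the JDBC session
    * landed on; otherwise fall back to all UP nodes.
    */
  def selectWorkerHosts(conn: Connection): Seq[String] = {
    // Detect Eon mode from the catalog (database_mode is 'Eon' or 'Enterprise').
    val modeRs = conn.createStatement()
      .executeQuery("SELECT database_mode FROM v_catalog.databases")
    val isEon = modeRs.next() && modeRs.getString(1).equalsIgnoreCase("Eon")

    val query =
      if (isEon)
        // Only UP nodes that belong to the current session's subcluster.
        """SELECT n.node_address
          |FROM v_catalog.nodes n
          |JOIN v_catalog.subclusters s ON s.node_name = n.node_name
          |WHERE n.node_state = 'UP'
          |  AND s.subcluster_name = (
          |        SELECT s2.subcluster_name
          |        FROM v_catalog.subclusters s2
          |        JOIN v_monitor.current_session cs ON cs.node_name = s2.node_name)
          |""".stripMargin
      else
        "SELECT node_address FROM v_catalog.nodes WHERE node_state = 'UP'"

    val rs = conn.createStatement().executeQuery(query)
    val hosts = ListBuffer[String]()
    while (rs.next()) hosts += rs.getString(1)
    hosts.toList
  }
}
```

This would run once when the job starts, so a subcluster that shrinks mid-job is still a window for failure, but workers would at least never be handed a node that was already DOWN or outside the current subcluster/sandbox at launch.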


Labels: bug (Something isn't working)
