SimulationHead actor crashes on Adastra #48

@AdrienVannson

Description

Doreisa works fine on Adastra with up to 8 nodes. With 16 nodes or more, the SimulationHead actor crashes with the following error:

(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff70c6f0121d0f54bc2a1ce18001000000 Worker ID: 97c2622d8c6eba3ae495ee60e950afa369451fc113cb126c206d548e Node ID: 1da6b0509c16d7937bd26f1fd3026bc0ab84319f0292c79d51eca14c Worker IP address: 10.80.5.118 Worker port: 10011 Worker PID: 3806763 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.

The cause is not clear: an OOM kill is very unlikely, the file descriptor limit is high enough, the job's time limit is not reached, ...
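
To double-check the memory and file descriptor hypotheses from inside the affected process, a small diagnostic helper along these lines might be useful. This is only a sketch using the Python standard library on Linux; `resource_snapshot` is a hypothetical name and is not part of Doreisa or Ray, and calling it from within the SimulationHead actor (for example, logging its result periodically) is an assumption about how it would be wired in.

```python
import os
import resource


def resource_snapshot() -> dict:
    """Collect limits and usage of the current process (hypothetical helper)."""
    soft_fd, hard_fd = resource.getrlimit(resource.RLIMIT_NOFILE)
    usage = resource.getrusage(resource.RUSAGE_SELF)

    # /proc/self/fd has one entry per open descriptor (Linux only).
    fd_dir = "/proc/self/fd"
    fd_open = len(os.listdir(fd_dir)) if os.path.isdir(fd_dir) else None

    return {
        "pid": os.getpid(),
        "fd_soft_limit": soft_fd,
        "fd_hard_limit": hard_fd,
        "fd_open": fd_open,
        # ru_maxrss is reported in kilobytes on Linux.
        "peak_rss_mib": usage.ru_maxrss / 1024,
    }


if __name__ == "__main__":
    print(resource_snapshot())
```

If the number of open descriptors stays far below the soft limit and the peak resident memory stays well under the node's capacity right up to the crash, that would support ruling out causes (1) and the file descriptor limit, leaving an unexpected signal such as SIGSEGV as the more likely candidate.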
