subprocesses failing quietly #205

@wcpettus

When a subprocess exits, the operator gets no generic notification. That is not a bad design in itself, but in practice most of our subprocesses are meant to run indefinitely, so a silent exit becomes a trap: it can take a long time to notice that the subprocess we care about has stopped doing its job.

Typical design:

Things that could be done: either (a) make the restart more automatic or (b) make the crash more obvious.
(a) Probably easiest by modifying the subprocess mixin basic_control_target (or adding a continuous-control version) so that, when the is_alive check fails, it restarts the worker.
(b) The strongest method would be to override the ping functionality so that it checks whether the worker is alive (this only works for a single level of worker); the cleanup method could also be made to log far more errors.

  • Either way, we could review the cleanup behavior of the inheriting classes to make sure they emit sufficient exit information, and harden them against those failure modes.
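
To make options (a) and (b) concrete, here is a minimal sketch of both behaviors. It is not the actual mixin: `SupervisedWorker`, `check_worker`, `auto_restart`, and `restart_count` are hypothetical names, and a plain `threading.Thread` stands in for whatever worker the real subprocess mixin manages. The real `ping` and cleanup hooks may look quite different.

```python
import logging
import threading

logger = logging.getLogger(__name__)


class SupervisedWorker:
    """Hypothetical stand-in for a class that uses the subprocess mixin."""

    def __init__(self, target, auto_restart=False):
        self._target = target              # callable expected to run indefinitely
        self.auto_restart = auto_restart   # enables option (a)
        self.restart_count = 0
        self._worker = None

    def start(self):
        # Assumption: the worker is a thread; the real mixin may differ.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        try:
            self._target()
        except Exception:
            # Louder cleanup: log the full traceback when the worker crashes.
            logger.exception("worker crashed")
        else:
            # A clean return is still a problem for a long-running worker.
            logger.error("worker returned but is expected to run indefinitely")

    def check_worker(self):
        """Option (a): call periodically; restart the worker if it has died."""
        if self._worker is not None and self._worker.is_alive():
            return True
        logger.error("worker is not alive")
        if self.auto_restart:
            self.restart_count += 1
            self.start()
            return True
        return False

    def ping(self):
        """Option (b): the ping fails loudly when the worker is dead."""
        if self._worker is None or not self._worker.is_alive():
            raise RuntimeError("worker is not running")
        return "pong"
```

With something like this, a monitoring loop that already pings each service would see an error as soon as the worker dies, rather than an apparently healthy reply. As noted in (b), this only covers one level of worker; nested workers would each need their own check.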
