Enhancement: Make it possible to resume experiments if the server crashes #11

@some-rando-rl

Description

One requirement: checkpoints written after a resume must be annotated to indicate that they were not produced in a single, uninterrupted run.

This annotation could be a "run id" number that is incremented on each resume, plus an optional (or required?) field recording whether the previous run terminated cleanly. If we include that field, it must be treated as best-effort, since there is no tamperproof way to determine whether a clean exit occurred between runs.
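
To make the idea concrete, here is a rough sketch of how the annotation might work. Everything in it is an assumption for discussion, not existing code: the `run_state.json` file name, the `begin_run` / `end_run` / `annotate_checkpoint` helpers, and the field names `run_id` and `previous_exit_clean` are all hypothetical. The clean-exit flag is set to `False` at startup and only flipped to `True` on an orderly shutdown, so a crash leaves it `False`. That is best-effort only: anyone who can edit the file can change it.

```python
import json
import os
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Optional

STATE_FILE = "run_state.json"  # hypothetical file name, kept next to the checkpoints


@dataclass
class RunInfo:
    run_id: int                          # incremented on every (re)start
    previous_exit_clean: Optional[bool]  # None on the very first run


def _write_state(path: Path, state: dict) -> None:
    # Write to a temp file, then rename, so a crash mid-write
    # does not leave a half-written state file behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def begin_run(exp_dir: Path) -> RunInfo:
    """Call once at startup, before any checkpoint is written."""
    path = exp_dir / STATE_FILE
    if path.exists():
        prev = json.loads(path.read_text())
        info = RunInfo(prev["run_id"] + 1, bool(prev["clean_exit"]))
    else:
        info = RunInfo(0, None)
    # Assume the worst until end_run() says otherwise.
    _write_state(path, {"run_id": info.run_id, "clean_exit": False})
    return info


def end_run(exp_dir: Path, info: RunInfo) -> None:
    """Call on orderly shutdown only; a crash never reaches this."""
    _write_state(exp_dir / STATE_FILE, {"run_id": info.run_id, "clean_exit": True})


def annotate_checkpoint(checkpoint_meta: dict, info: RunInfo) -> dict:
    """Return a copy of the checkpoint metadata with the run annotation added."""
    return {**checkpoint_meta, **asdict(info)}
```

With this shape, any checkpoint whose `run_id` is greater than 0 was written after at least one resume, and `previous_exit_clean` gives a hint (nothing more) about whether the preceding run crashed.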
