Implement per-job single retry of the failed tests #738

@asirko-soft

Description

Background: Flaky tests are a known source of instability in our validation workflows, especially in the REPL and Darwin jobs. Transient failures, such as network errors or timeouts on a slow runner, cause entire jobs to fail for reasons unrelated to the code changes, forcing developers to intervene manually. This wastes developer time and consumes compute resources unnecessarily.

Proposal: Implement a selective, automated retry mechanism. Instead of failing the entire job on the first test failure, the workflow will capture the names of the failed tests and re-run only those tests, once. The job will be marked as successful if the retry passes and will fail only if the same tests fail a second time. Flakiness data will be captured as a build artifact for long-term analysis.
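
To make the proposed flow concrete, here is a minimal Python sketch of the retry decision, not the actual workflow implementation. Everything named here is an assumption for illustration: `run_with_single_retry`, `write_flakiness_report`, the `run_tests` callable (a stand-in for whatever command the REPL or Darwin jobs use to run a named subset of tests), and the JSON report layout.

```python
import json
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical runner interface: given test names, return the names that failed.
# In a real job this would wrap the project's own test command.
Runner = Callable[[List[str]], Set[str]]


def run_with_single_retry(tests: Iterable[str], run_tests: Runner) -> Dict:
    """Run all tests once, then re-run only the failures a single time."""
    all_tests = list(tests)
    first_failures = set(run_tests(all_tests))

    # Retry only the tests that failed; skip the retry if nothing failed.
    retry_list = sorted(first_failures)
    second_failures = set(run_tests(retry_list)) if retry_list else set()
    # Guard against a runner reporting tests that were not part of the retry.
    second_failures &= first_failures

    return {
        # The job passes unless some test failed both attempts.
        "passed": not second_failures,
        # Failed once, passed on retry: treated as flaky.
        "flaky": sorted(first_failures - second_failures),
        # Failed twice: treated as a real failure.
        "failed": sorted(second_failures),
    }


def write_flakiness_report(result: Dict, path: str) -> None:
    """Write the result as JSON so the job can upload it as a build artifact."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(result, fh, indent=2, sort_keys=True)
```

In this sketch the job's exit status would follow `result["passed"]`, and the file written by `write_flakiness_report` is what would be uploaded as the flakiness artifact. Injecting the runner as a callable keeps the retry logic independent of any particular test framework.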

Status: Backlog
