Implement per-job single retry of the failed tests

Background: Flaky tests are a known source of instability in our validation workflows, especially in the REPL and Darwin jobs. These transient failures (such as network errors or timeouts due to a slow runner) cause entire jobs to fail for reasons unrelated to the code changes, forcing developers to manually intervene. This wastes time and consumes unnecessary compute resources.

Proposal: Implement a selective, automated retry mechanism. Instead of failing the entire job on the first error, the workflow will capture the names of the failed tests and re-run only those tests, once. The job will be marked as successful if the retry passes and will fail only if any of those tests fails a second time. Flakiness data will be captured as a build artifact for long-term analysis.
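
To make the intended control flow concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the actual workflow implementation: `run_tests` is a hypothetical hook that runs the given tests and returns the names of those that failed, and the report file name and JSON layout are placeholders for whatever the build artifact ends up containing.

```python
import json
from typing import Callable, Iterable, Set


def run_with_single_retry(
    run_tests: Callable[[Iterable[str]], Set[str]],  # hypothetical test-runner hook
    tests: Iterable[str],
    report_path: str = "flaky-tests.json",  # assumed artifact name
) -> bool:
    """Run the tests once, retry only the failures once, and record the outcome.

    Returns True if the job should be marked successful.
    """
    # First pass: run everything and capture the names of the failed tests.
    first_failures = set(run_tests(tests))

    # Second pass: re-run only the tests that failed, and only once.
    second_failures: Set[str] = set()
    if first_failures:
        second_failures = set(run_tests(sorted(first_failures)))

    # Tests that failed once but passed on retry are treated as flaky.
    flaky = first_failures - second_failures

    # Persist the data so the CI job can upload it as a build artifact.
    report = {"flaky": sorted(flaky), "failed": sorted(second_failures)}
    with open(report_path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)

    # The job fails only if some test failed both times.
    return not second_failures
```

In a real job, `run_tests` would wrap the test runner's own mechanism for selecting tests by name, and the return value would decide the job's exit code.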


Implement per-job single retry of the failed tests #738
