Run the Functional Tests in parallel #1868
Force-pushed from 3ff37b9 to 1c0a86d.
Hmm. Something in those numbers does not add up:
The discrepancy is quite stark, and I cannot quite explain it. Sure, each of the now 40 (instead of 4) matrix jobs duplicates the setup (including "Install product"), but by my counting that only takes around 1.5 minutes, and 36 × 1.5 = 54 minutes is much, much smaller than the ~7-hour difference in total time. I cannot explain this discrepancy even when accounting for the difference in build times (4 and 6¼ minutes vs 3 and 5 minutes for the Debug and Release builds, respectively). This likely indicates a busy-neighbor problem: the runner VMs were simply slower during the parallelized run because more people ran workflows during that time of day.
Ooooh, here is something interesting. That page lists every job twice! I suspect a bug that double-counts jobs where there has been a re-run. Maybe GitHub double-bills jobs that were skipped as part of a "re-run failed jobs" because they had not failed in the first place? The only job for which two different times are posted is "Functional Tests (Debug, x86_64, 5)", which makes sense because that one was the failed job in the first attempt: 6m15s vs 6m22s, which is "same enough" for all practical purposes.

So now the numbers make more sense, at least after dividing 10h30m55s by 2: 5h15m28s. The difference in total run time compared to the non-parallelized run is now more in the vicinity of 1h40m = 100 minutes, and dividing that by the 36 additional jobs means that each additional job adds about 3 minutes to the total run time. That is still quite a bit more than the 1.5 minutes I estimated, even accounting for every "Run functional tests" step adding 20 seconds before actually running the tests and 20 seconds afterwards, but the remaining difference can be chalked up to noisy neighbors in busy times slowing everyone else down.
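For what it's worth, the back-of-the-envelope arithmetic above can be sanity-checked in a few lines; the numbers come straight from the comment, and the helper name `to_seconds` is just for illustration:

```python
# Sanity check of the arithmetic above (illustrative only).

def to_seconds(h, m, s):
    """Convert an h/m/s duration to seconds."""
    return h * 3600 + m * 60 + s

# The billing page double-counts every job, so halve the reported total.
reported = to_seconds(10, 30, 55)   # 10h30m55s as reported
actual = reported / 2               # ~5h15m28s of real job time

# The parallelized run used ~100 more minutes of total job time,
# spread across 36 additional matrix jobs (40 instead of 4).
overhead_per_job = 100 / 36         # about 2.8 minutes per extra job
```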
One more data point to add: the current run (which, as of time of writing, just failed) reports a total run time of 5h 7m 49s.

[screenshot: Jobs / total run time]
Force-pushed from 1c0a86d to 2b6940e.
I wonder if it's also time to drop the Debug flag for Actions, since that is adding a lot of noise to the output.
This teaches a new command-line option to `GVFS.FunctionalTests.exe` that allows running only a slice of all tests at a time, via `--slice=<m>,<n>`, where `m` is the zero-based index of the slice to run and `n` is the total number of slices to split the tests into.

The idea is to distribute the test cases evenly across the slices, so that running all slices in parallel will finish in about the same time as running all tests in a single slice. This is slightly tricky because the test cases of classes within the `EnlistmentPerFixture` namespace are known to rely on side effects of previous test cases within the same class. Therefore, all test cases of such a class must be run in the same slice.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
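A minimal sketch of the slicing idea, in Python for illustration — the actual implementation lives in the C# test runner, and the names `build_units` and `slice_tests` are hypothetical:

```python
# Hypothetical sketch of the --slice=<m>,<n> idea, not the actual
# GVFS.FunctionalTests implementation. Tests are grouped into indivisible
# "units": standalone test cases slice individually, but all test cases
# of a class in the EnlistmentPerFixture namespace stay together, because
# they rely on side effects of earlier tests in the same class.

def build_units(test_cases):
    """Group (class_name, test_name) pairs into indivisible units."""
    units, fixtures = [], {}
    for class_name, test_name in test_cases:
        if "EnlistmentPerFixture" in class_name:
            # Keep the whole class together in one unit.
            fixtures.setdefault(class_name, []).append((class_name, test_name))
        else:
            units.append([(class_name, test_name)])
    units.extend(fixtures.values())
    return units

def slice_tests(test_cases, m, n):
    """Return the tests of slice m out of n total slices."""
    units = build_units(test_cases)
    # Round-robin assignment spreads the units evenly across the slices.
    return [t for i, unit in enumerate(units) if i % n == m for t in unit]
```

Round-robin assignment is just one simple way to spread the units evenly; the real runner may well distribute them differently (e.g. balancing by expected run time).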
Following the example of git/git, which runs the Windows tests in parallel to compensate for a very long run time, we split the Functional Tests into 10 buckets and run them in parallel.

This is particularly useful in light of some flaky tests that frequently need to be re-run. That way, the cost of re-running a failed test is reduced by virtue of only having to re-run a _slice_ of the tests.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Force-pushed from 2b6940e to e62ad85.
It is. Yet it helped me debug things ;-)
mjcheetham left a comment:
Just a nit
.github/workflows/build.yaml (Outdated)
    architecture: [ x86_64, arm64 ]

    steps:
      - name: Report success! (for easier identification of successful runs in the Checks Required for Pull Requests)
Do we need such a long task name? :) Why not just `Success`?
.github/workflows/build.yaml (Outdated)
    needs: [functional_test]

    steps:
      - name: Report success! (for easier identification of successful runs in the Checks Required for Pull Requests)
Same here.
In the GitHub UI, it is possible to specify Checks that need to be successful before a PR can be merged. These "Checks" are actually the names of jobs in the workflow. It would be much better to specify a required _workflow_, but only workflow _jobs_ can be specified.

The Required Checks are currently:

- Functional Tests (Debug, arm64)
- Functional Tests (Debug, x86_64)
- Functional Tests (Release, arm64)
- Functional Tests (Release, x86_64)

So let's make sure that there are job names that match the recorded names and that indicate a successful workflow run.

An alternative would be to add a single job with a defined name that depends on all the other jobs; that could compensate for the missing GitHub feature of requiring a whole workflow, instead of individual jobs, to have run successfully. But that would interfere with other PRs that are currently in flight, and will therefore have to wait its turn in a separate PR.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
GitHub has a feature called "Require status checks before merging": https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/managing-protected-branches/about-protected-branches#require-status-checks-before-merging

What they actually mean by "status checks" is "successful workflow job runs". And what they _really_ actually mean is "_display labels_ of successful workflow job runs". That is, workflow job runs are identified by the label they display.

Example: In VFSforGit, there is currently a "required status check" that reads "Functional Test (Debug, arm64)". This is the label that is displayed for the matrix job with vector {configuration: Debug, architecture: arm64} in the "functional_test" workflow job.

This is quite restrictive! In VFSforGit, specifying those four matrix job labels is a _work-around_ for the _actual_ requirement, namely that the workflow defined in `build.yaml` succeeds. It just so happens that those four matrix jobs are the leaf jobs, i.e. when they all succeed, the workflow _run_ has succeeded. And vice versa: if the workflow run failed, at least one of those four matrix jobs must have failed or not even run.

Now that I multiplied the matrix jobs even further by running the Functional Tests in parallel, the _display labels_ (and the number) of the matrix jobs have changed. As a consequence, to appease the "Require status checks before merging" rule, I had to add _another_ set of matrix jobs just to guarantee that the same four matrix job labels exist. This is silly, because those four matrix jobs are not needed at all for actually testing the code. They are just there to make GitHub happy.

This commit prepares to change that. It adds a new workflow job that fits the bill "if the job succeeded, the workflow run must have succeeded as a whole, and vice versa". This new job will be made the only "required status check" once this PR has been merged, and then we can remove the silly "duplicate" matrix jobs again.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
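Such an aggregating job could look roughly like the following sketch; the job and step names are illustrative, not necessarily those used in this PR. Note that if any job listed in `needs` fails, this job is skipped, and a skipped required check still blocks the merge, which is exactly the desired behavior:

```yaml
# Sketch only -- job and step names are illustrative.
workflow-succeeded:
  name: Workflow succeeded
  needs: [functional_test]   # list every leaf job of the workflow here
  runs-on: ubuntu-latest
  steps:
    - name: Success
      run: echo "All required jobs succeeded"
```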
Force-pushed from e62ad85 to 040240b.
From time to time, the Functional Tests fail, and re-running "fixes" the failures, i.e. those tests are flaky. When that happens, we have to invest a lot of time into the re-run: currently 35 minutes for the x86_64 jobs and 1 hour 15 minutes for the ARM64 ones. That's unwelcome friction.
Let's follow the playbook of git/git which runs its test suite (which, due to the heavy over-use of Bash scripting, is very, very slow on Windows) in parallel on Windows, slicing up the tests so that the individual test slices take roughly the same amount of time to run.
This PR is stacked on top of #1866 and will need its (empty) tip commit to be dropped to force GitHub to update the PR correctly (i.e. no longer show the commits of #1866 once they are merged).