Context

Note: this MR is built on top of !2646 (merged) because I assumed that MR would be merged first, but it does not really depend on it. Only the last two commits are actually part of this new MR.

!2646 (merged) introduces a shell script to split Tezt CI jobs automatically into 3 equal parts. Those 3 parts contain the same number of tests, but they are not balanced by duration. This new MR is a proof of concept for a potential replacement of !2646 (merged), where tests are split into jobs of roughly equal duration.

To achieve this, we add the following command-line options to Tezt:

--record, which causes test results, including the time each test took, to be recorded in a file at the end of a run;
--suggest-jobs, which reads such a file and uses the recorded durations to suggest a split into jobs of roughly equal duration;
--job-count, to specify how many jobs we want;
--not-test, which is the opposite of --test, i.e. it means "do not run this particular test".
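To make the interaction between --test and --not-test concrete, here is a minimal sketch of one plausible selection rule. It assumes tests are identified by title, that --test values act as an allow-list when at least one is given, and that --not-test values are then removed; the function name select_tests is hypothetical and this is not Tezt's actual code. It also models the "warn but continue" behavior for unknown titles described under Improvements.

```python
import sys


def select_tests(all_titles, tests=(), not_tests=()):
    """Return the titles to run, in registration order.

    Hypothetical model: --test values form an allow-list (only if any are
    given), --not-test values form a deny-list. Unknown titles only produce
    a warning instead of an error.
    """
    known = set(all_titles)
    for title in list(tests) + list(not_tests):
        if title not in known:
            print(f"Warning: unknown test: {title}", file=sys.stderr)
    allowed = set(tests)
    denied = set(not_tests)
    return [
        t for t in all_titles
        if (not allowed or t in allowed) and t not in denied
    ]
```

For example, select_tests(["a", "b", "c"], not_tests=["b"]) keeps "a" and "c", which is exactly how the last job of a suggested split is meant to behave.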

The output of --suggest-jobs is formatted as follows: there is one line per job, and each line is a sequence of command-line arguments to pass to Tezt. All jobs except the last one are specified using a sequence of --test selectors, and the last job is specified using a sequence of --not-test selectors that exclude every test assigned to the other jobs. This means that if a new test is added and the job split is not updated, the new test still runs, in the last job.
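The output format can be sketched as follows, assuming the split is already available as a list of jobs, each a list of test titles. The function format_job_lines is hypothetical; it uses Python's shlex.quote so that titles containing spaces or parentheses stay a single shell argument.

```python
import shlex


def format_job_lines(jobs):
    """Format a job split as one line of Tezt arguments per job.

    jobs: list of lists of test titles (hypothetical input shape).
    Every job but the last is an allow-list of --test selectors; the last
    job excludes all tests of the other jobs, so new tests end up there.
    """
    if not jobs:
        return []
    lines = [" ".join(f"--test {shlex.quote(t)}" for t in job) for job in jobs[:-1]]
    excluded = [t for job in jobs[:-1] for t in job]
    lines.append(" ".join(f"--not-test {shlex.quote(t)}" for t in excluded))
    return lines
```

With jobs [["a (x)"], ["b"], ["c"]], the last line is --not-test 'a (x)' --not-test b: it never mentions "c", so it also picks up any test added later.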

Note that --record could be used for other purposes such as displaying test results again if we ever wanted to.

Also note that --job-count could serve as an argument to run tests in parallel if we ever wanted to, à la make -j. In fact, --job-count can already be abbreviated as -j.

Improvements

On top of the above, this MR:

adds --loop-count, in case we want to obtain records that average the durations of multiple test runs;
makes it so that if a --test, --file or tag does not exist, Tezt warns but continues instead of exiting with an error; this way we do not have to update the job split when we remove or rename a test;
fixes shell_quote, which did not quote parentheses in some cases; we need this when using the output of --suggest-jobs in the YAML.
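Averaging over several runs, as --loop-count is meant to allow, amounts to a per-test mean. The sketch below assumes a hypothetical in-memory shape for the records (one {title: seconds} dictionary per run); the actual --record file format is not described here.

```python
from collections import defaultdict


def average_durations(runs):
    """Average per-test durations over several runs (e.g. one per loop).

    runs: list of {title: seconds} dicts, a hypothetical shape for recorded
    results. A test missing from some runs is averaged only over the runs
    in which it appears.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for run in runs:
        for title, seconds in run.items():
            totals[title] += seconds
            counts[title] += 1
    return {title: totals[title] / counts[title] for title in totals}
```

Averaging smooths out noisy individual timings, which matters because the split is only as balanced as the durations it is computed from.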

Results

You can find an example split with 10 jobs here: example-tezt-job-split.txt

One issue is that the command lines are rather long, especially the last one; it is unclear whether the shell or the CI supports lines of this length, and they are hard to read. One way to improve this would be to store the split in a readable text file, commit that file to the repository, and have Tezt read it, taking the job index as a parameter.

One interesting result is that Tezt predicts that all jobs will take almost exactly the same time. This is due to the heuristic Tezt uses to dispatch tests into jobs: it sorts tests by duration and places the longest tests first, ending with the shortest ones. It turns out that we have a large number of very fast tests (regression tests?) that Tezt can distribute to make job durations almost equal.
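The longest-first heuristic can be sketched as the classic LPT (longest processing time first) greedy algorithm. The text above only says that tests are sorted by duration and placed longest first; the rule used here, that each test goes to the job whose total is currently smallest, is an assumption, as is the function name split_jobs. The many short tests placed at the end are what fill the remaining gaps.

```python
import heapq


def split_jobs(durations, job_count):
    """Greedy longest-first split (LPT scheduling).

    durations: {title: seconds}. Each test, longest first, goes to the job
    whose total duration is currently the smallest. Returns a list of
    (total_seconds, titles), one entry per job, in job order.
    """
    if job_count < 1:
        raise ValueError("job_count must be at least 1")
    # Heap entries: (total so far, job index, titles); the unique job index
    # breaks ties, so the title lists are never compared.
    heap = [(0.0, i, []) for i in range(job_count)]
    for title, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, i, titles = heapq.heappop(heap)
        titles.append(title)
        heapq.heappush(heap, (total + seconds, i, titles))
    return [(total, titles) for total, _, titles in sorted(heap, key=lambda e: e[1])]
```

For instance, durations a=5, b=4, c=3, d=3 and three 1-second tests split into two jobs of 9 seconds each: the large tests leave a gap that the short ones close.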

One drawback of the very simple dispatch heuristic is that tests are no longer grouped by themes (i.e. by files).

~~Another drawback is that if one removes a test, one needs to update the CI, otherwise Tezt will complain that the --test does not exist. (But adding a new test is not a problem.)~~ => fixed by only emitting a warning

Future Work

In itself, this MR does not actually split the CI jobs: one needs to add a script that generates the .gitlab-ci.yml from the output of --suggest-jobs. That output is formatted so that this should be very straightforward. But we have to decide whether the long lines are a problem. => done
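To illustrate why the output is easy to consume, here is a sketch that turns each line of --suggest-jobs output into a GitLab CI job. The job names (tezt:1, tezt:2, ...), the command, and the function name ci_jobs_yaml are hypothetical; this is not the script that was actually added. Emitting the command as a double-quoted string (via json.dumps, whose output YAML also accepts) keeps titles containing ": " or "#" from breaking the YAML.

```python
import json


def ci_jobs_yaml(suggested_lines, prefix="tezt"):
    """Render one hypothetical GitLab CI job per line of --suggest-jobs output.

    Job names and the exact command are illustrative assumptions, not the
    contents of the real CI configuration.
    """
    out = []
    for n, args in enumerate(suggested_lines, start=1):
        command = f"dune exec tezt/tests/main.exe -- {args}".rstrip()
        out.append(f"{prefix}:{n}:")
        out.append("  script:")
        # A JSON string is also a valid YAML double-quoted scalar.
        out.append(f"    - {json.dumps(command)}")
    return "\n".join(out) + "\n"
```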

~~This MR also depends on the fact that no two tests have the same title. If we adopt this, we should add a runtime check for it.~~ => done
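Since job splits refer to tests by title, a duplicate title would make a --test or --not-test selector ambiguous. A runtime check can be as simple as the following sketch; check_unique_titles is a hypothetical helper, not Tezt's implementation.

```python
from collections import Counter


def check_unique_titles(titles):
    """Raise ValueError if two registered tests share a title."""
    dups = sorted(t for t, n in Counter(titles).items() if n > 1)
    if dups:
        raise ValueError("duplicate test titles: " + ", ".join(dups))
```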

Manually testing the MR

Record a test run, then run --suggest-jobs:

dune exec tezt/tests/main.exe -- --record tezt-results
dune exec tezt/tests/main.exe -- --suggest-jobs tezt-results --job-count 5

(You can select a subset of the tests to make it run faster if you want.)

You can also test the script that updates the CI:

dune exec tezt/tests/main.exe -- --suggest-jobs tezt-results --job-count 3 | scripts/update_tezt_test.ml

Since you probably did not get exactly the same times as I did, you will see a significant diff in .gitlab/ci/tezt.yml, but it should only change the lists of tests.

Checklist

Document the interface of any function added or modified (see the coding guidelines)
Provide automatic testing (see the testing guide).
For new features and bug fixes, add an item in the appropriate changelog (docs/protocols/alpha.rst for the protocol and the environment, the Development Version section of CHANGES.md for everything else).
Select suitable reviewers using the Reviewers field below.

Edited Mar 31, 2021 by Romain

Tezt: add --suggest-jobs to help split CI jobs