CI: add merge train pipeline with auto-cancel on job failure

What

This enables auto_cancel:on_job_failure for merge train pipelines, so that a failed job in a merge train pipeline immediately fails the whole pipeline.
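
For reference, the GitLab CI keyword involved can be sketched as follows. This is a minimal, hand-written fragment, not the configuration that this MR generates through CIAO; how the setting is scoped to merge train pipelines only is left out.

```yaml
# Sketch only: pipeline-level setting that cancels the remaining jobs
# as soon as any job in the pipeline fails.
workflow:
  auto_cancel:
    on_job_failure: all
```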

Why

Certain fast checks, such as commit title checks or formatting checks, run very early in the merge train pipelines. However, they run in parallel with slower parts of the pipeline and do not block their completion, and GitLab does not consider the pipeline failed until the full pipeline has terminated. All merge requests that follow in the train must wait for the full pipeline to terminate, at which point they are restarted anyway. If we have to restart them, we might as well do it as soon as possible, hence this MR.

Consider, for instance, https://gitlab.com/tezos/tezos/-/pipelines/1387619058 from MR !14261 (merged):

  • It terminated after 51 minutes at 12:48 PM.
  • But some of its jobs failed earlier, at 12:27 PM.

The pipeline for !14152 (merged) ran on top of it: https://gitlab.com/tezos/tezos/-/pipelines/1387639866.

Due to the above failures, !14152 (merged)'s pipeline was canceled and restarted, but this happened only at 12:52 PM.

With the setting I propose, the restart would've happened at 12:27 PM instead, about 25 minutes earlier, and much compute would've been saved.

How

By adding the appropriate types for CIAO and allowing the registration of pipelines with this setting. I create a new pipeline for merge trains, which is the same as before but with the new workflow setting.

Manually testing the MR

A bit hard to test without merging. I'll try to come up with something.

A small test, just to demonstrate the functionality in a single pipeline, and also to test how it works with retries:

A pipeline that contains two jobs:

  • one job that fails after 10 seconds and retries once.
  • one job that runs in parallel with the first, for a longer time (10 minutes).

We check that auto_cancel cancels the second job after the first one has failed twice.
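
A pipeline of this shape could be written roughly as below. The job names and scripts are assumptions made for illustration; the pipeline actually used for the test is the one linked in the results.

```yaml
# Sketch only: job names (failing_job, long_job) are hypothetical.
workflow:
  auto_cancel:
    on_job_failure: all

# Fails after 10 seconds; GitLab retries it once, so it fails twice in total.
failing_job:
  script:
    - sleep 10
    - exit 1
  retry: 1

# Same default stage, so it runs in parallel with failing_job,
# and sleeps for 10 minutes unless it is canceled first.
long_job:
  script:
    - sleep 600
```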

Results:

https://gitlab.com/nomadic-labs/arvid-tezos/-/pipelines/1387974434

In fact, it doesn't seem to work at all as I expected :C. The second job does start canceling after the first has failed twice. However, it stays in the "canceling" state until its sleep has terminated.

Checklist

  • Document the interface of any function added or modified (see the coding guidelines)
  • Document any change to the user interface, including configuration parameters (see node configuration)
  • Provide automatic testing (see the testing guide).
  • For new features and bug fixes, add an item in the appropriate changelog (docs/protocols/alpha.rst for the protocol and the environment, CHANGES.rst at the root of the repository for everything else).
  • Select suitable reviewers using the Reviewers field below.
  • Select as Assignee the next person who should take action on that MR
Edited by Arvid Jakobsson
