
CI: Add retry on system_failure to [oc.build:static-x86_64-linux-binaries]

Problem

The oc.build:static-x86_64-linux-binaries job occasionally fails when GKE preempts the node it is running on, which can happen with spot instances. These system failures come from the GCP infrastructure, not from the code, yet they fail the pipeline and require manual retriggers.

Solution

This MR adds a retry policy to the oc.build:static-x86_64-linux-binaries compilation job, restricted to system failures and timeouts:

~retry: {max = 2; when_ = [Stuck_or_timeout_failure; Runner_system_failure]}

This allows the job to automatically retry up to 2 times when GitLab detects a system failure or timeout, which are the typical symptoms of node preemption.
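
To illustrate what this argument amounts to on the GitLab side, here is a minimal, self-contained sketch that renders such a retry record as the corresponding retry section of a job. The types retry_when and retry and the functions retry_when_to_string and retry_to_yaml are hypothetical stand-ins written for this example; the actual definitions in the CI generator may differ. Only the two failure conditions used above are modelled.

```ocaml
(* Hypothetical stand-ins for the CI generator's types; the real
   definitions may differ and likely cover more failure conditions. *)
type retry_when =
  | Stuck_or_timeout_failure
  | Runner_system_failure

type retry = {max : int; when_ : retry_when list}

(* GitLab CI keyword for each condition, as listed under [retry:when]. *)
let retry_when_to_string = function
  | Stuck_or_timeout_failure -> "stuck_or_timeout_failure"
  | Runner_system_failure -> "runner_system_failure"

(* Render the [retry] section of a job as YAML text. *)
let retry_to_yaml {max; when_} =
  let conditions =
    List.map (fun w -> "    - " ^ retry_when_to_string w) when_
  in
  String.concat
    "\n"
    (["retry:"; Printf.sprintf "  max: %d" max; "  when:"] @ conditions)

let () =
  print_endline
    (retry_to_yaml
       {max = 2; when_ = [Stuck_or_timeout_failure; Runner_system_failure]})
```

Under these assumptions, the program prints a retry section with max: 2 and a when list containing stuck_or_timeout_failure and runner_system_failure. Jobs that fail for any other reason, such as a compilation error, are not retried.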

Benefits

  • More resilient CI pipeline that can recover from infrastructure issues
  • Reduced need for manual intervention
  • Consistent with retry pattern already used in other compute-intensive jobs
  • No changes to the compilation logic itself
