From b680b7914614ae26f84200c22c84c76ace38861b Mon Sep 17 00:00:00 2001
From: Henri Philipps
Date: Tue, 23 Mar 2021 18:53:32 +0100
Subject: [PATCH 1/2] add failure cases to rollback runbook

---
 runbooks/rollback-a-deployment.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/runbooks/rollback-a-deployment.md b/runbooks/rollback-a-deployment.md
index 483586c9..fac8e3ec 100644
--- a/runbooks/rollback-a-deployment.md
+++ b/runbooks/rollback-a-deployment.md
@@ -135,3 +135,17 @@ directory. If no new files are added, it's safe to roll back.
 How to start a manual rollback pipeline is described in
 [Creating a new deployment for _rolling back_ GitLab](../general/deploy/gitlab-com-deployer.md#creating-a-new-deployment-for-rolling-back-gitlab)
 
+# Failure Cases
+
+## Recoverable by Job Retry
+
+### Out of Disk Space
+
+If a deployment job fails because a node running out of disk space, clean up the
+disk space on the node (around 3.5GiB are needed on root partition - 1GiB for
+package download and 2.3GiB for extracting the package) and re-run the job.
+
+### Corrupted Package
+
+Remove the failing package on the node (`sudo apt-get clean` or check
+`/var/cache/apt/archives/`) and re-run the job.
-- 
GitLab

From e3d9e99ac707beeb81c5f17197112cb269102d98 Mon Sep 17 00:00:00 2001
From: Henri Philipps
Date: Tue, 23 Mar 2021 19:06:10 +0100
Subject: [PATCH 2/2] add pipeline trigger permission failure case

---
 runbooks/rollback-a-deployment.md | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/runbooks/rollback-a-deployment.md b/runbooks/rollback-a-deployment.md
index fac8e3ec..539b90d2 100644
--- a/runbooks/rollback-a-deployment.md
+++ b/runbooks/rollback-a-deployment.md
@@ -137,15 +137,27 @@ How to start a manual rollback pipeline is described in
 
 # Failure Cases
 
-## Recoverable by Job Retry
-
-### Out of Disk Space
+## Out of Disk Space
 
 If a deployment job fails because a node running out of disk space, clean up the
 disk space on the node (around 3.5GiB are needed on root partition - 1GiB for
 package download and 2.3GiB for extracting the package) and re-run the job.
 
-### Corrupted Package
+## Corrupted Package
 
 Remove the failing package on the node (`sudo apt-get clean` or check
 `/var/cache/apt/archives/`) and re-run the job.
+
+## Failure to trigger Pipeline when manually re-running a Job
+
+When manually re-running a failed Job in the pipeline, the job will be executed
+with your permissions instead of the bot permissions. You need to have
+permissions to also trigger downstream multi-project QA pipelines, else you will
+see failures like this in QA jobs:
+
+```
+Trigger::Qa -- Retrying pipeline trigger #1 --
+{:project=>"gitlab-org/quality/staging", :status=>#}
+```
+
+Only Delivery team members typically have those permissions.
-- 
GitLab
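
Reviewer note on the "Out of Disk Space" case: before re-running the job, it may help to confirm that the node has the roughly 3.5GiB of free space on the root partition that the runbook text mentions. The following is a minimal sketch, not part of the patch or of any deployer tooling. The function names `free_gib` and `has_enough_space` are hypothetical, and the threshold is simply the figure quoted in the runbook.

```python
import shutil

GIB = 1024 ** 3

# Assumption: threshold taken from the runbook text (about 3.5GiB on the root partition).
REQUIRED_GIB = 3.5


def free_gib(path: str = "/") -> float:
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / GIB


def has_enough_space(free: float, required: float = REQUIRED_GIB) -> bool:
    """Return True if `free` GiB meets or exceeds the `required` amount."""
    return free >= required


if __name__ == "__main__":
    free = free_gib("/")
    verdict = "OK to retry the job" if has_enough_space(free) else "clean up disk space first"
    print(f"{free:.1f} GiB free on /: {verdict}")
```

Run on the affected node, this prints the free space on `/` and whether it clears the assumed threshold. If the package cache lives on a different partition, the path would need adjusting.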