Increase interruption retries for GitHub workers
What does this MR do and why?
This MR increases the max_retries_after_interruption for some GitHub "stage" workers.
The setting is used by the sidekiq-reliable-fetcher gem to limit the number of times a job will be terminated and requeued. By default, this is 3.
When the Sidekiq process is asked to stop it will wait for a period of time for a running job to finish, before forcefully terminating and requeuing the job if it does not finish within this time. Normally Sidekiq will do this over and over for a job indefinitely, but sidekiq-reliable-fetcher limits this. This offers protection from jobs that can never be completed in time between Sidekiq restarts (for example, the time between deploys).
Many of our GitHub importer "stage" workers can work for a very long time in large GitHub imports, but can also pick up from where they left off when restarted. We see these workers sometimes being terminated more than 3 times, which prevents large GitHub imports from finishing. Because they resume their work exactly from where they left off, we can raise the max_retries_after_interruption limit to allow them more time to complete when processing very large imports.
In this MR, only those "stage" workers which resume their work exactly from where they left off are given the increased limit. These are (all workers are in Gitlab::GithubImport::Stage namespace):
ImportIssueEventsWorkerImportLfsObjectsWorkerImportPullRequestsReviewRequestsWorkerImportPullRequestsWorkerImportAttachmentsWorkerImportIssuesAndDiffNotesWorkerImportNotesWorkerImportPullRequestsMergedByWorkerImportPullRequestsReviewsWorker
These workers do not have the limit applied, as they do not fully resume their work when restarted (some skip already imported objects, but start their loop at the beginning again).
ImportBaseDataWorkerFinishImportWorkerImportCollaboratorsWorkerImportRepositoryWorkerImportProtectedBranchesWorker
How to set up and validate locally
- Apply the patch below to have
Gitlab::GitHubImport::Stage::ImportPullRequestsWorkertake 60 seconds to do work:diff --git a/app/workers/gitlab/github_import/stage/import_pull_requests_worker.rb b/app/workers/gitlab/github_import/stage/import_pull_requests_worker.rb index 50527dfa2d84..7f1ad8da589d 100644 --- a/app/workers/gitlab/github_import/stage/import_pull_requests_worker.rb +++ b/app/workers/gitlab/github_import/stage/import_pull_requests_worker.rb @@ -13,6 +13,12 @@ class ImportPullRequestsWorker # rubocop:disable Scalability/IdempotentWorker resumes_work_when_interrupted! + def perform + sleep 60 + end + # client - An instance of Gitlab::GithubImport::Client. # project - An instance of Project. def import(client, project) - Restart Sidekiq:
gdk restart rails-background-jobs - In a separate terminal window tail logs for the job:
gdk tail rails-background-jobs | grep ImportPullRequestsWorker - In a rails console, enable the flag:
Feature.enable(:github_importer_raise_max_interruptions) - In a rails console, queue the worker:
Gitlab::GithubImport::Stage::ImportPullRequestsWorker.perform_async - Wait until you see a new log come through for the worker and then stop Sidekiq:
gdk stop rails-background-jobs - Start Sidekiq again:
gdk start rails-background-jobs - Wait about 10 seconds until you see a new log come through, note the
"interrupted_count"will be incremented - Stop Sidekiq again.
- Repeat steps
6-83 more times, until"interrupted_count"is 4
On master, following the same steps with the same patch, you will not see a log where interrupted_count goes above 2.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #416777 (closed)