
Failures restoring backup to Gitaly Cluster

We currently have two customer tickets showing similar failures when restoring backups to Gitaly Cluster environments:

This customer has a single-node GitLab instance in production and has configured a new GitLab environment with Gitaly Cluster. When he attempts to restore a backup from the single-node production instance to the new Gitaly Cluster instance, he hits RPC errors, and sometimes read-only errors, while restoring repositories. The customer reports the following:

The repositories are located on a gitaly cluster in /gitaly/repositories. This directory is empty prior to restore.

Many of the projects appear to restore okay, but several show gRPC status 14 (unavailable) errors, and the restore itself eventually fails with the following error:

[root@ip-172-17-21-111 backups]# rake aborted! 
Gitlab::Git::CommandError: 9:repository is in read-only mode. 
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/git/wraps_gitaly_errors.rb:15:in `rescue in wrapped_gitaly_errors' 
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/git/wraps_gitaly_errors.rb:6:in `wrapped_gitaly_errors' 
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/git/repository.rb:150:in `remove' 
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/repositories.rb:218:in `restore_snippet_repository' 
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/repositories.rb:61:in `block in restore_snippets' 
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/repositories.rb:61:in `each' 
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/repositories.rb:61:in `map' 
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/repositories.rb:61:in `restore_snippets' 
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/repositories.rb:44:in `restore' 
/opt/gitlab/embedded/service/gitlab-rails/ee/lib/ee/backup/repositories.rb:12:in `restore' 
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:120:in `block (4 levels) in <top (required)>' 
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:78:in `block (3 levels) in <top (required)>' 
/opt/gitlab/embedded/bin/bundle:23:in `load' 
/opt/gitlab/embedded/bin/bundle:23:in `<main>'

Caused by: 
GRPC::FailedPrecondition: 9:repository is in read-only mode. debug_error_string:{"created":"@1617041064.468415038","description":"Error received from peer ipv4:172.17.68.155:2305","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"repository is in read-only mode","grpc_status":9} 
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client.rb:177:in `execute' 
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client/call.rb:18:in `block in call' 
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client/call.rb:55:in `recording_request' 
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client/call.rb:17:in `call' 
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client.rb:167:in `call' 
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/gitaly_client/repository_service.rb:359:in `remove' 
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/git/repository.rb:151:in `block in remove' 
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/git/wraps_gitaly_errors.rb:7:in `wrapped_gitaly_errors' 
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/git/repository.rb:150:in `remove' 
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/repositories.rb:218:in `restore_snippet_repository' 
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/repositories.rb:61:in `block in restore_snippets' 
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/repositories.rb:61:in `each' 
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/repositories.rb:61:in `map' 
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/repositories.rb:61:in `restore_snippets' 
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/repositories.rb:44:in `restore' 
/opt/gitlab/embedded/service/gitlab-rails/ee/lib/ee/backup/repositories.rb:12:in `restore' 
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:120:in `block (4 levels) in <top (required)>' 
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:78:in `block (3 levels) in <top (required)>' 
/opt/gitlab/embedded/bin/bundle:23:in `load' 
/opt/gitlab/embedded/bin/bundle:23:in `<main>' 
Tasks: TOP => gitlab:backup:repo:restore 
(See full trace by running task with --trace)
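
For triage, the debug_error_string in the trace above can be read as JSON to pull out the gRPC status, message, and peer address. The following is a minimal Python sketch, not part of GitLab or either ticket; the helper name parse_debug_error_string is hypothetical, and the status-name table covers only the two codes seen in this issue (9 is FAILED_PRECONDITION and 14 is UNAVAILABLE in the standard gRPC status list).

```python
import json
import re

# Standard gRPC names for the two status codes that appear in this issue.
GRPC_STATUS_NAMES = {9: "FAILED_PRECONDITION", 14: "UNAVAILABLE"}


def parse_debug_error_string(line):
    """Return (status, status_name, message, peer) for a line that contains
    debug_error_string:{...}, or None when the line has no such payload."""
    match = re.search(r"debug_error_string:(\{.*\})", line)
    if match is None:
        return None
    details = json.loads(match.group(1))
    status = details["grpc_status"]
    # The description has the form "Error received from peer ipv4:HOST:PORT".
    peer_match = re.search(r"peer (\S+)", details.get("description", ""))
    peer = peer_match.group(1) if peer_match else None
    name = GRPC_STATUS_NAMES.get(status, "unmapped")
    return status, name, details["grpc_message"], peer


# The "Caused by" line from the trace above, copied verbatim.
sample = (
    'GRPC::FailedPrecondition: 9:repository is in read-only mode. '
    'debug_error_string:{"created":"@1617041064.468415038",'
    '"description":"Error received from peer ipv4:172.17.68.155:2305",'
    '"file":"src/core/lib/surface/call.cc","file_line":1055,'
    '"grpc_message":"repository is in read-only mode","grpc_status":9}'
)

print(parse_debug_error_string(sample))
```

The peer address is the endpoint the Rails client was talking to on port 2305, so it shows which host returned the read-only error.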

This customer reports similar behavior when restoring to what he describes as a Gitaly/Praefect environment. He doesn't mention the architecture of the instance the backup was taken from.

The same errors occur with «gitlab-backup restore»:
[Failed] restoring api-common/cni (@hashed/eb/0c/eb0c9cdc0862653468dacc6a876a0c40e9d642c50f798bae1162fe27f18d482c)

Error 9:repository is in read-only mode. debug_error_string:{"created":"@1616683445.784983667","description":"Error received from peer ipv4:10.232.72.17:2305","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"repository is in read-only mode","grpc_status":9}

[Failed] restoring xxiabs/xxi (@hashed/ee/9d/ee9d527a0a6108477fc5c98cf2a00f65d38c8e8508c4d17c1c11b2441c78a2ec)

Error 14:Socket closed. debug_error_string:{"created":"@1616683501.107158798","description":"Error received from peer ipv4:10.232.72.17:2305","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"Socket closed","grpc_status":14}
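
To see how the failures split between read-only errors (status 9) and connection errors (status 14), the restore output can be tallied per project. The sketch below is illustrative only: the function name summarize_failures is hypothetical, and it assumes the output keeps the shape shown above, a "[Failed] restoring ..." line followed later by an "Error N:..." line. The sample text is abridged from the output above, with the debug_error_string payloads dropped.

```python
import re
from collections import Counter

# Matches "[Failed] restoring <project path> (<hashed storage path>)".
FAILED_RE = re.compile(r"^\[Failed\] restoring (\S+) \((\S+)\)")
# Matches the leading gRPC status code in "Error 9:..." or "Error 14:...".
ERROR_RE = re.compile(r"^Error (\d+):")


def summarize_failures(log_text):
    """Pair each failed project with the gRPC status code that follows it.

    Returns (failures, counts), where failures is a list of
    (project_path, status_code) and counts maps status_code to occurrences.
    """
    failures = []
    current_project = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        failed = FAILED_RE.match(line)
        if failed:
            current_project = failed.group(1)
            continue
        error = ERROR_RE.match(line)
        if error and current_project is not None:
            failures.append((current_project, int(error.group(1))))
            current_project = None
    counts = Counter(code for _, code in failures)
    return failures, counts


# Abridged from the customer's output above.
sample_log = """\
[Failed] restoring api-common/cni (@hashed/eb/0c/eb0c9cdc0862653468dacc6a876a0c40e9d642c50f798bae1162fe27f18d482c)

Error 9:repository is in read-only mode.

[Failed] restoring xxiabs/xxi (@hashed/ee/9d/ee9d527a0a6108477fc5c98cf2a00f65d38c8e8508c4d17c1c11b2441c78a2ec)

Error 14:Socket closed.
"""

failures, counts = summarize_failures(sample_log)
for project, code in failures:
    print(project, code)
print(dict(counts))
```

Running this over a full restore log would show whether the read-only failures cluster on particular projects or appear only after the first status 14 error.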

Multiple runs of checks on all Praefect hosts always show the same:

root@m1-devops-test-gitlab-praefect-1:~# sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss -virtual-storage default

Virtual storage: default

All repositories are writable!

root@m1-devops-test-gitlab-praefect-1:~# sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss -virtual-storage default -partially-replicated

Virtual storage: default

All repositories are up to date!