Projects end up broken after archiving an ancient fork parent - caused by inconsistent ObjectPool references
I initially created this issue at gitaly#6874 (closed), but this repository seems to be the correct place to report it.
Summary
We are seeing some GitLab projects on our instance that are broken and can no longer be viewed in the GitLab UI. They show an error in the UI, and file information is not loaded.
Looking at the repository integrity checks and manually running `git fsck` both result in the following error message:
Could not fsck repository: error: unable to normalize alternate object path: /var/opt/gitlab/git-data/repositories/@hashed/9d/95/9d95bba7023609ee9e3da95b119f27d8f0a7c2412c1c773010f6bd5b8cea0d94.git/objects/../../../../../@pools/ae/a9/aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470.git/objects
We currently see the same error (with different paths and pools) for seven projects.
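A scan roughly like the following can be used to find such repositories; this is only a sketch, assuming the default Omnibus storage path and that all project repositories live under `@hashed/`:

```ruby
#!/usr/bin/env ruby
# Sketch: scan a Gitaly storage for repositories whose objects/info/alternates
# file points at an object directory that no longer exists on disk.
# Run on the Gitaly node; adjust storage_path for non-Omnibus setups.

storage_path = "/var/opt/gitlab/git-data/repositories"
pattern = File.join(storage_path, "@hashed", "**", "*.git", "objects", "info", "alternates")

Dir.glob(pattern).each do |alternates|
  objects_dir = File.expand_path("../..", alternates) # the repository's objects/ directory

  File.readlines(alternates, chomp: true).each do |entry|
    next if entry.empty? || entry.start_with?("#")

    # Relative alternates entries are resolved against the objects/ directory.
    target = File.expand_path(entry, objects_dir)
    puts "#{alternates}: #{entry} -> MISSING" unless Dir.exist?(target)
  end
end
```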
We checked the `.git/objects/info/alternates` files in the affected projects; they point to these pools:
$ cat /var/opt/gitlab/git-data/repositories/@hashed/9d/95/9d95bba7023609ee9e3da95b119f27d8f0a7c2412c1c773010f6bd5b8cea0d94.git/objects/info/alternates
../../../../../@pools/ae/a9/aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470.git/objects
The referenced pools do not exist on disk, which explains these errors. But here is where it gets strange.
According to the gitlab-rails database, however, these projects are not connected to any pool at all:
$ gitlab-rails console
project = Project.find_by_id(1234)
project.pool_repository
=> nil
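We also looked up the pool referenced on disk directly from the console. A sketch; we are assuming here that `pool_repositories.disk_path` holds the path relative to the storage root (without the `.git` suffix) and that a `member_projects` association exists:

```ruby
# Sketch (gitlab-rails console): is the pool referenced in the alternates file
# known to the database at all?
disk_path = "@pools/ae/a9/aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470"

pool = PoolRepository.find_by(disk_path: disk_path)

if pool
  puts "DB knows this pool; member project IDs: #{pool.member_projects.pluck(:id).inspect}"
else
  puts "No pool_repositories row for #{disk_path}"
end
```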
What is the current bug behavior?
We have already spent some time looking into this issue; here is what I understand so far.
We updated our GitLab instance from v17.11.4 to v18.1.1 a few weeks ago without any issues. That update included a security fix/change: now, when you archive a project, the `UnlinkForkService` is called and ensures that the project no longer depends on a pool, by calling functions like `DisconnectGitAlternates` and ultimately `DeleteObjectPool` (if the repository is the last one in the pool).
Gitaly does not perform any checks when being asked to delete a pool, as documented here: https://gitlab.com/gitlab-org/gitaly/-/blob/master/proto/objectpool.proto#L44-50
This does make sense, since Gitaly cannot know whether any other project still references the pool; that is the job of the GitLab database.
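For illustration, the database-side view of that decision can be inspected from the console. This is a sketch, not the actual GitLab code; the project ID is made up, and `pool_repository`/`member_projects`/`disk_path` are the associations and attributes as we understand them:

```ruby
# Sketch (gitlab-rails console): the "last member of the pool?" question is
# answered purely from the database, never from the alternates files on disk.
project = Project.find(5678) # e.g. composer-prod; the ID here is made up
pool    = project.pool_repository

if pool
  others = pool.member_projects.where.not(id: project.id)
  puts "Pool #{pool.disk_path} has #{others.count} other member(s) in the DB"
  puts "=> archiving this project would delete the pool" if others.none?
end
```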
But it seems that, in the past, the references to pools were not properly synced between the GitLab database and the `.git/objects/info/alternates` files on disk. We analyzed our instance and found something like the following constellation:
---
title: GitLab v17.11.4
---
flowchart TB
A("composer-preview (@hashed/9d/95)") -- db --> Z("nil")
A -- disk --> B("ObjectPool (@pools/ae/a9)")
C("composer-prod (@hashed/08/a0)") -- disk --> B
C -- db --> B
We then archived the project `composer-prod`, resulting in this setup:
---
title: GitLab v18.1.1 (after archiving composer-prod)
---
flowchart TB
A("composer-preview (@hashed/9d/95)") -- db --> Z("nil")
A -- disk --> B("ObjectPool (@pools/ae/a9) - Deleted!")
C("composer-prod (@hashed/08/a0)")
We have projects where the `.git/objects/info/alternates` file references a pool, but the database does not know about it. If you now archive a project that has a "clean" connection to the pool (`composer-prod` in this case), GitLab goes ahead and deletes the pool, even though another project still references it in its `.git/objects/info/alternates` file. This results in the broken projects we are seeing now.
We see this exact pattern for five projects.
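Purely as an illustration of the missing safety net: a disk-side check before deleting a pool could look roughly like this. This is not existing GitLab or Gitaly code, just a sketch using the paths from our example:

```ruby
# Sketch: does any repository on this storage still reference the pool in its
# objects/info/alternates file? (Hypothetical check, not actual GitLab code.)
storage_path = "/var/opt/gitlab/git-data/repositories"
pool_objects = File.join(storage_path,
  "@pools/ae/a9/aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470.git/objects")

pattern = File.join(storage_path, "@hashed", "**", "*.git", "objects", "info", "alternates")

still_referenced = Dir.glob(pattern).select do |alternates|
  objects_dir = File.expand_path("../..", alternates)

  File.readlines(alternates, chomp: true).any? do |entry|
    File.expand_path(entry, objects_dir) == pool_objects
  end
end

if still_referenced.any?
  puts "Pool is still referenced on disk by:"
  puts still_referenced
else
  puts "No repository references this pool on disk"
end
```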
Another strange pattern we are seeing is ObjectPools referencing other ObjectPools in their `.git/objects/info/alternates` files. The resulting "tree" looks like this:
flowchart TB
A("build-go (@hashed/56/28)") -- db --> B("ObjectPool (@pools/43/97)")
B -- disk --> C("ObjectPool (@pools/d4/ee)")
A -- disk --> C
D("legacy-build-go (@hashed/82/2c)") -- disk --> C
D -- db --> C
The project `build-go` has a reference to the ObjectPool `@pools/43/97` in the database, but its `.git/objects/info/alternates` file points to a different ObjectPool, `@pools/d4/ee`. The pool known to the database (`@pools/43/97`) itself also points to that other pool in its own `alternates` file. If you now archive `legacy-build-go`, the pool `@pools/d4/ee` gets deleted and the project `build-go` ends up broken.
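This second pattern can be spotted by scanning the pools themselves for alternates entries pointing at other pools; again only a sketch, assuming the default Omnibus storage path:

```ruby
# Sketch: list pools whose own objects/info/alternates points at another pool.
storage_path = "/var/opt/gitlab/git-data/repositories"
pattern = File.join(storage_path, "@pools", "**", "*.git", "objects", "info", "alternates")

Dir.glob(pattern).each do |alternates|
  objects_dir = File.expand_path("../..", alternates)

  File.readlines(alternates, chomp: true).each do |entry|
    next if entry.empty? || entry.start_with?("#")

    target = File.expand_path(entry, objects_dir)
    puts "#{objects_dir} -> #{target}" if target.include?("/@pools/")
  end
end
```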
What is the expected correct behavior?
All projects should have a consistent state between the database and disk, without getting broken when archiving a related project.
Questions
- Is there (or was there) a known issue in GitLab that caused the database and the `.git/objects/info/alternates` files to be out of sync?
- Is there a way to fix the affected projects? E.g. can the database entries be added manually?
- Is it safe to restore the deleted ObjectPools from a backup?
- Should there be ObjectPools referencing other ObjectPools?
Output of checks
Results of GitLab environment info
Expand for output related to GitLab environment info
System information
System: Debian 11
Current User: git
Using RVM: no
Ruby Version: 3.2.5
Gem Version: 3.6.9
Bundler Version: 2.6.9
Rake Version: 13.0.6
Redis Version: 7.2.9
Sidekiq Version: 7.3.9
Go Version: unknown
GitLab information
Version: 18.1.2
Revision: 98bf90e2827
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 16.8
URL: https://git.company.local
HTTP Clone URL: https://git.company.local/some-group/some-project.git
SSH Clone URL: git@git.company.local:some-group/some-project.git
Using LDAP: no
Using Omniauth: yes
Omniauth Providers: saml
GitLab Shell
Version: 14.42.0
Repository storages:
- default: unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Gitaly
- default Address: unix:/var/opt/gitlab/gitaly/gitaly.socket
- default Version: 18.1.2
- default Git Version: 2.49.0.gl2
Results of GitLab application Check
Expand for output related to the GitLab application check
Checking GitLab subtasks ...
Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 14.42.0 ? ... OK (14.42.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell-check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
Checking Sidekiq ...
Sidekiq: ... Running? ... yes
Number of Sidekiq processes (cluster/worker) ... 1/2
Checking Sidekiq ... Finished
Checking Incoming Email ...
Incoming Email: ... Reply by email is disabled in config/gitlab.yml
Checking Incoming Email ... Finished
Checking LDAP ...
LDAP: ... LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab App ...
Database config exists? ... yes
Tables are truncated? ... skipped
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Cable config exists? ... yes
Resque config exists? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Systemd unit files or init script exist? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Systemd unit files or init script up-to-date? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Projects have namespace: ...
18/1 ... yes
4/5 ... yes
[...]
23170/21014 ... yes
23149/21015 ... yes
Redis version >= 6.2.14? ... yes
Ruby version >= 3.0.6 ? ... yes (3.2.5)
Git user has default SSH configuration? ... yes
Active users: ... 1046
Is authorized keys file accessible? ... yes
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes
Checking GitLab App ... Finished
Checking GitLab subtasks ... Finished