Projects end up broken after archiving an ancient fork parent - caused by inconsistent ObjectPool references
Hey there, I am not sure if this issue should be opened here or at https://gitlab.com/gitlab-org/gitlab, but since it is closely related to ObjectPools, I decided to place it here. If that is wrong, please let me know and I will move it.
Issue
We are seeing some GitLab projects in our instance that are broken and can no longer be viewed in the GitLab UI. They show the following error, and file information is not loaded:
Looking at the repository integrity checks and manually running git fsck results in the following error message:
Could not fsck repository: error: unable to normalize alternate object path: /var/opt/gitlab/git-data/repositories/@hashed/9d/95/9d95bba7023609ee9e3da95b119f27d8f0a7c2412c1c773010f6bd5b8cea0d94.git/objects/../../../../../@pools/ae/a9/aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470.git/objects
We currently see the same error (with different paths and pools) for 7 projects.
We checked the .git/objects/info/alternates files of the affected projects; each one points to a pool, for example:
$ cat /var/opt/gitlab/git-data/repositories/@hashed/9d/95/9d95bba7023609ee9e3da95b119f27d8f0a7c2412c1c773010f6bd5b8cea0d94.git/objects/info/alternates
../../../../../@pools/ae/a9/aea92132c4cbeb263e6ac2bf6c183b5d81737f179f21efdc5863739672f0f470.git/objects
The referenced pools do not exist on disk, which explains these errors. But here is where it gets strange.
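For anyone who wants to run the same check on their own instance, here is a minimal Python sketch of the lookup we did by hand. It assumes that relative entries in an alternates file are resolved against the repository's objects directory, which matches the path shown in the fsck error above. The function name resolve_alternates is made up for this example; it is not part of GitLab or Gitaly.

```python
import os
from pathlib import Path


def resolve_alternates(objects_dir):
    """Return a list of (resolved_path, exists) for each alternates entry.

    objects_dir is the "objects" directory of a bare repository.
    Assumption: relative entries are relative to objects_dir.
    """
    objects_dir = Path(objects_dir)
    alternates_file = objects_dir / "info" / "alternates"
    if not alternates_file.is_file():
        # No alternates file means the repository borrows no objects.
        return []

    results = []
    for line in alternates_file.read_text().splitlines():
        entry = line.strip()
        # Skip blank lines and comment lines.
        if not entry or entry.startswith("#"):
            continue
        # Collapse the "../" segments without touching the filesystem.
        resolved = Path(os.path.normpath(objects_dir / entry))
        results.append((resolved, resolved.is_dir()))
    return results
```

Called with the objects directory of an affected repository, it returns one tuple per alternate. An entry whose second element is False is a dangling reference like the ones we found.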
When looking at the gitlab-rails database, these projects are not supposed to be connected to a pool:
$ gitlab-rails console
project = Project.find_by_id(1234)
project.pool_repository
=> nil
Analysis
We have already spent some time looking into this issue; here is what I understand so far.
We updated our GitLab instance from v17.11.4 to v18.1.1 a few weeks ago, without any issues. That update included a security fix / change:
Now if you archive a project, the UnlinkForkService will be called and ensure that the project no longer depends on a pool (by calling functions like DisconnectGitAlternates and ultimately DeleteObjectPool, if the repo is the last one in the pool).
Gitaly does not perform any checks when being asked to delete a pool, as documented here: https://gitlab.com/gitlab-org/gitaly/-/blob/master/proto/objectpool.proto#L44-50
This makes sense, since Gitaly cannot know whether any other project still references the pool. Tracking that is the job of the GitLab database.
But it seems that in the past the references to pools were not properly synced between the GitLab database and the .git/objects/info/alternates files on disk. We analyzed our instance and found something like the following constellation:
---
title: GitLab v17.11.4
---
flowchart TB
A("composer-preview (@hashed/9d/95)") -- db --> Z("nil")
A -- disk --> B("ObjectPool (@pools/ae/a9)")
C("composer-prod (@hashed/08/a0)") -- disk --> B
C -- db --> B
We then archived the project composer-prod, resulting in this setup:
---
title: GitLab v18.1.1 (after archiving composer-prod)
---
flowchart TB
A("composer-preview (@hashed/9d/95)") -- db --> Z("nil")
A -- disk --> B("ObjectPool (@pools/ae/a9) - Deleted!")
C("composer-prod (@hashed/08/a0)")
We have projects whose .git/objects/info/alternates file references a pool that the database does not know about. If you now archive a project that has a "clean" connection to the pool (composer-prod in this case), GitLab will go ahead and delete the pool, even though another project still references it in its .git/objects/info/alternates file. This results in the broken projects we are seeing now.
We see this exact pattern for 5 projects.
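To make the mismatch concrete, here is a small Python sketch that compares the two views. It assumes you have already collected, by whatever means, one mapping of project to pool as recorded in the database and one mapping of project to pool as found in the alternates file. Both mappings and the function name find_untracked_pool_members are hypothetical; nothing here queries GitLab or Gitaly.

```python
def find_untracked_pool_members(db_pool_by_project, disk_pool_by_project):
    """Group projects by on-disk pool when the database disagrees.

    Both arguments map a project name to a pool identifier, or to None
    when no pool is referenced. The result maps each pool to the projects
    that reference it on disk without a matching database entry.
    """
    untracked = {}
    for project, disk_pool in disk_pool_by_project.items():
        if disk_pool is None:
            continue  # Nothing on disk, so nothing can dangle.
        if db_pool_by_project.get(project) != disk_pool:
            untracked.setdefault(disk_pool, []).append(project)
    return untracked


# The constellation from the first diagram above.
db_view = {"composer-preview": None, "composer-prod": "@pools/ae/a9"}
disk_view = {"composer-preview": "@pools/ae/a9", "composer-prod": "@pools/ae/a9"}
print(find_untracked_pool_members(db_view, disk_view))
```

For the data above, the only untracked member is composer-preview in @pools/ae/a9, which is exactly the project that broke once composer-prod was archived.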
Another strange pattern we are seeing is ObjectPools that reference other ObjectPools in their .git/objects/info/alternates files. The resulting "tree" looks like this:
flowchart TB
A("build-go (@hashed/56/28)") -- db --> B("ObjectPool (@pools/43/97)")
B -- disk --> C("ObjectPool (@pools/d4/ee)")
A -- disk --> C
D("legacy-build-go (@hashed/82/2c)") -- disk --> C
D -- db --> C
The project build-go has a reference to the ObjectPool @pools/43/97 in the database, but its .git/objects/info/alternates file points to another ObjectPool, @pools/d4/ee. The pool @pools/43/97 itself also points to @pools/d4/ee (in its own alternates file). If you now archive legacy-build-go, the pool @pools/d4/ee will be deleted and the project build-go will end up broken.
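Because pools can point at other pools, checking only direct references is not enough to know what a deletion would break. The following Python sketch walks the on-disk references transitively. It assumes a hand-collected mapping from each repository (project or pool) to the alternates listed in its file; the mapping and the function name dependents_of are hypothetical and only illustrate the idea.

```python
def dependents_of(pool, alternates_by_repo):
    """Return every repository whose alternates chain reaches `pool`.

    alternates_by_repo maps a repository (project or pool) to the list of
    alternates found in its objects/info/alternates file.
    """
    dependents = set()
    frontier = {pool}
    while frontier:
        next_frontier = set()
        for repo, alternates in alternates_by_repo.items():
            # A repo depends on the pool if it borrows from anything
            # already known to depend on it. Each repo is added once,
            # so cycles cannot cause an endless loop.
            if repo not in dependents and frontier.intersection(alternates):
                dependents.add(repo)
                next_frontier.add(repo)
        frontier = next_frontier
    return dependents


# On-disk references from the diagram above.
disk_refs = {
    "build-go": ["@pools/d4/ee"],
    "@pools/43/97": ["@pools/d4/ee"],
    "legacy-build-go": ["@pools/d4/ee"],
}
print(sorted(dependents_of("@pools/d4/ee", disk_refs)))
```

With the references from the diagram, deleting @pools/d4/ee affects build-go, legacy-build-go and the pool @pools/43/97, even though the database only links legacy-build-go to it.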
Questions
- Is there (or was there) a known issue in GitLab that caused the database and the .git/objects/info/alternates files to be out of sync?
- Is there a way to fix the affected projects? E.g. can the database entries be added manually?
- Is it safe to restore the deleted ObjectPools from a backup?
- Should there be ObjectPools referencing other ObjectPools?
We have analyzed a lot of logs, restored a backup of our instance from before the update and spent a lot of time trying to understand ObjectPools, alternates files and the different links. If there is any information I can provide, please ask.
I look forward to any help you can offer, thanks!
Kind regards, Malte