Repository clean up is inconsistent
Summary
We are currently moving to GitLab. Migrating our customers' projects is a good opportunity to remove garbage from the repositories and move binaries to LFS, so we run git-filter-repo after each migration.
We are strictly following this guide: https://docs.gitlab.com/ee/user/project/repository/reducing_the_repo_size_using_git.html
Our process of migration is as follows:
- Import the project from Bitbucket
- Export the project
- run git-filter-repo for analyzing big files that were deleted and binaries that can be moved to LFS
- run git-filter-repo for cleaning all files that are not needed in the history
- back up commit-maps
- run git-filter-repo for rewriting all authors that put passwords or private emails in their name or email
- back up commit-maps
- run git-filter-repo for injecting .gitattributes with proper lfs params as first commit
- back up commit-maps
- run git lfs migrate import --everything --verbose --fixup --object-map="$BACKUP_DIR/4_lfs_maps"
- rewrite 4_lfs_maps to use spaces instead of commas so it can be uploaded later
- now push lfs, refs/heads/, refs/tags/ and refs/replace/*
- wait 31 minutes
- now upload commit maps in the order they were created
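The map rewrite step above can be sketched roughly as follows. This is a minimal illustration, assuming the file written by `--object-map` contains one `<old-sha>,<new-sha>` pair per line; the function name `lfs_map_to_commit_map` and the output file name are hypothetical and not part of our actual scripts.

```shell
#!/bin/sh
# Convert an LFS object map ("<old-sha>,<new-sha>" per line) into the
# space-separated "<old-sha> <new-sha>" layout we upload to GitLab.
# Reads the map on stdin and writes the converted map to stdout.
# Assumption: the only commas in the file are the pair separators.
lfs_map_to_commit_map() {
    tr ',' ' '
}

# Example call (the output file name is hypothetical):
# lfs_map_to_commit_map < "$BACKUP_DIR/4_lfs_maps" > "$BACKUP_DIR/4_lfs_maps.txt"
```

Writing to a separate file keeps the original map untouched in case the conversion has to be repeated.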
Now it's completely random: either the repository size drops to the size of my local bare repo directly after uploading the first commit map, or nothing happens and the size doesn't change for any commit map.
This occurs randomly. If we re-import the repo from Bitbucket and follow the whole process again, it sometimes works.
We usually run this migration twice. The first run is a test migration to make the "official" migration as smooth as possible. The second run is the "official" migration itself, which happens outside of working hours so our devs don't have to wait hours until they are allowed to work again.
The problem this causes is that projects cannot inherit the global repository size limit: the global limit is 100MB, and our repositories, depending on their age, often exceed 200MB because of lots of unnecessary data that was pushed over time.
Backups also become unnecessarily bloated, which in turn costs us more time to restore.
We are a Premium customer with 132 users and counting, if that's of any relevance for getting support.
Steps to reproduce
- Import a project from Bitbucket or any git server
- Export the project
- run git-filter-repo for analyzing big files that were deleted and binaries that can be moved to LFS
- run git-filter-repo for cleaning all files that are not needed in the history
- back up commit-maps
- run git-filter-repo for rewriting all authors that put passwords or private emails in their name or email
- back up commit-maps
- run git-filter-repo for injecting .gitattributes with proper lfs params as first commit
- back up commit-maps
- run git lfs migrate import --everything --verbose --fixup --object-map="$BACKUP_DIR/4_lfs_maps"
- rewrite 4_lfs_maps to use spaces instead of commas so it can be uploaded later
- now push lfs, refs/heads/, refs/tags/ and refs/replace/*
- wait 31 minutes
- now upload commit maps in the order they were created
- repository size either changes or it doesn't
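To make the upload order in the steps above reproducible, the backed-up maps can be listed in the order they were created. This is only a sketch, assuming every map file name starts with its step number, as `4_lfs_maps` does; the function name `list_maps_in_order` is hypothetical.

```shell
#!/bin/sh
# Print the map files in a backup directory in the order they were created.
# Assumption: each file name starts with its step number (1_..., 2_..., 4_lfs_maps),
# so a numeric sort on the name matches the creation order.
# $1: backup directory
list_maps_in_order() {
    ls -1 "$1" | sort -n
}

# Example call:
# list_maps_in_order "$BACKUP_DIR"
```

A numeric sort is used instead of a plain alphabetical one so that a step 10 would still come after step 2.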
Example Project
That's difficult, since I cannot upload our customers' code to a public repo and the problem occurs completely at random: one time it works, the next time it doesn't.
What is the current bug behavior?
Repository cleanup sometimes doesn't clean the repository.
What is the expected correct behavior?
Repository cleanup always cleans the repository.
Relevant logs and/or screenshots
Not sure where I can even check what happens when running a cleanup in GitLab.
Output of checks
Results of GitLab environment info
sudo docker-compose exec web gitlab-rake gitlab:env:info

System information
System:
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.7.7p221
Gem Version: 3.1.6
Bundler Version:2.3.15
Rake Version: 13.0.6
Redis Version: 6.2.8
Sidekiq Version:6.5.7
Go Version: unknown

GitLab information
Version: 15.8.1-ee
Revision: c49deff6e37
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 12.12
URL: https://[redacted]
HTTP Clone URL: https://[redacted]/some-group/some-project.git
SSH Clone URL: git@[redacted]:some-group/some-project.git
Elasticsearch: yes
Geo: no
Using LDAP: yes
Using Omniauth: yes
Omniauth Providers:

GitLab Shell
Version: 14.15.0
Repository storages:
- default: unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Results of GitLab application Check
sudo docker-compose exec web gitlab-rake gitlab:check SANITIZE=true
Checking GitLab subtasks ...
Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 14.15.0 ? ... OK (14.15.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
Checking Sidekiq ...
Sidekiq: ... Running? ... yes
Number of Sidekiq processes (cluster/worker) ... 1/1
Checking Sidekiq ... Finished
Checking Incoming Email ...
Incoming Email: ... Reply by email is disabled in config/gitlab.yml
Checking Incoming Email ... Finished
Checking LDAP ...
LDAP: ... Server: ldapmain
LDAP authentication... Success
LDAP users with access to your GitLab server (only showing the first 100 results)
User output sanitized. Found 100 users of 100 limit.
Checking LDAP ... Finished
Checking GitLab App ...
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Cable config exists? ... yes
Resque config exists? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Systemd unit files or init script exist? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Systemd unit files or init script up-to-date? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Projects have namespace: ... [redacted]
Redis version >= 6.0.0? ... yes
Ruby version >= 2.7.2 ? ... yes (2.7.7)
Git user has default SSH configuration? ... yes
Active users: ... 146
Is authorized keys file accessible? ... yes
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes
Elasticsearch version 7.x-8.x or OpenSearch version 1.x ... yes (elasticsearch 7.17.3)
All migrations must be finished before doing a major upgrade ... yes
Checking GitLab App ... Finished
Checking GitLab subtasks ... Finished