
Repository cleanup is inconsistent

Summary

We are currently migrating to GitLab. Because migrating our customers' projects is a good opportunity to clean garbage out of the repositories and move binaries to LFS, we run git-filter-repo on each project after it has been imported.

We are strictly following this guide: https://docs.gitlab.com/ee/user/project/repository/reducing_the_repo_size_using_git.html

Our migration process is as follows:

  • Import the project from Bitbucket
  • Export the project
  • Run git-filter-repo to analyze large files that were deleted and binaries that can be moved to LFS
  • Run git-filter-repo to remove all files that are not needed from the history
    • Back up the commit-maps
  • Run git-filter-repo to rewrite the name or email of every author who put a password or private email address in those fields
    • Back up the commit-maps
  • Run git-filter-repo to inject a .gitattributes file with the proper LFS settings as the first commit
    • Back up the commit-maps
  • Run git lfs migrate import --everything --verbose --fixup --object-map="$BACKUP_DIR/4_lfs_maps"
    • Rewrite 4_lfs_maps to use spaces instead of commas so that it can be uploaded later
  • Push the LFS objects, refs/heads/*, refs/tags/* and refs/replace/*
  • Wait 31 minutes
  • Upload the commit maps in the order they were created
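The step that rewrites 4_lfs_maps can be sketched as follows. This is a minimal Python sketch, assuming that every line written by `git lfs migrate import --object-map` has the form `<old SHA>,<new SHA>` and that the cleanup upload wants `<old SHA> <new SHA>`, the same two-column layout as the commit-map files written by git-filter-repo. The function names and the output filename are hypothetical, not part of GitLab or git-lfs.

```python
import sys
from pathlib import Path


def lfs_map_to_commit_map(lines):
    """Convert "old,new" lines (git lfs migrate --object-map output, assumed
    format) into "old new" lines. Blank lines are dropped; a line that does
    not split into exactly two fields raises ValueError, so a malformed map
    is never converted silently."""
    converted = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        fields = line.split(",")
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 'old,new', got {line!r}")
        old, new = (field.strip() for field in fields)
        converted.append(f"{old} {new}")
    return converted


def convert_file(src, dst):
    # Hypothetical paths: src is e.g. "$BACKUP_DIR/4_lfs_maps",
    # dst is the space-separated copy that is uploaded later.
    text = Path(src).read_text(encoding="utf-8")
    out = lfs_map_to_commit_map(text.splitlines())
    Path(dst).write_text("\n".join(out) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Usage: python convert_lfs_map.py 4_lfs_maps 4_lfs_maps.commit-map
    convert_file(sys.argv[1], sys.argv[2])
```

Failing loudly on a line that does not contain exactly one comma is a deliberate choice: a plain character substitution would also turn a corrupted line into something that looks uploadable.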

From this point on the outcome is random: either the repository size drops to the size of my local bare repository directly after the first commit map is uploaded, or nothing happens and the size does not change for any of the commit maps.

This happens at random: if we re-import the repository from Bitbucket and repeat the whole process, it sometimes works.

We usually run this migration twice: first as a test migration, so that the "official" migration goes as smoothly as possible, and then as the "official" migration itself, which happens outside working hours so that our developers do not have to wait hours before they can work again.

As a result, repositories cannot inherit the global repository size limit: the global limit is 100 MB, and depending on their age our repositories often exceed 200 MB because a lot of unnecessary data was pushed to them.
Backups also become unnecessarily bloated, which in turn makes restores take longer.

We are a Premium customer with 132 users and counting, in case that is relevant for getting support.

Steps to reproduce

  • Import a project from Bitbucket or any other Git server
  • Export the project
  • Run git-filter-repo to analyze large files that were deleted and binaries that can be moved to LFS
  • Run git-filter-repo to remove all files that are not needed from the history
    • Back up the commit-maps
  • Run git-filter-repo to rewrite the name or email of every author who put a password or private email address in those fields
    • Back up the commit-maps
  • Run git-filter-repo to inject a .gitattributes file with the proper LFS settings as the first commit
    • Back up the commit-maps
  • Run git lfs migrate import --everything --verbose --fixup --object-map="$BACKUP_DIR/4_lfs_maps"
    • Rewrite 4_lfs_maps to use spaces instead of commas so that it can be uploaded later
  • Push the LFS objects, refs/heads/*, refs/tags/* and refs/replace/*
  • Wait 31 minutes
  • Upload the commit maps in the order they were created
  • Observe that the repository size either changes or does not
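Before the upload step, a quick local check can at least rule out malformed or misordered commit maps as a cause. The sketch below is a hypothetical helper, not part of GitLab or git-filter-repo. It assumes the backed-up maps are named with a numeric prefix that records their creation order (as 4_lfs_maps suggests), skips an `old new` header line such as the one git-filter-repo writes at the top of its commit-map, and reports every other line that is not a pair of 40-character lowercase hexadecimal SHA-1 object IDs (SHA-256 repositories are not handled).

```python
import re
from pathlib import Path

# Assumes SHA-1 repositories: object IDs are 40 lowercase hex characters.
SHA1 = re.compile(r"[0-9a-f]{40}")


def creation_order(paths):
    """Sort backed-up map files by their numeric prefix (1_..., 4_lfs_maps).
    The "<number>_<name>" naming scheme is an assumption; a file without a
    numeric prefix raises ValueError."""
    def key(path):
        return int(Path(path).name.split("_", 1)[0])
    return sorted(paths, key=key)


def find_bad_lines(lines):
    """Return (line_number, text) for every line that is neither blank,
    an "old new" header, nor two SHA-1 IDs separated by whitespace."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields or fields == ["old", "new"]:
            continue
        if len(fields) != 2 or not all(SHA1.fullmatch(f) for f in fields):
            bad.append((number, raw.rstrip("\n")))
    return bad


def check_maps(paths):
    """Print the upload order and up to five malformed lines per file.
    Returns True only if every map looks well-formed."""
    ok = True
    for path in creation_order(paths):
        text = Path(path).read_text(encoding="utf-8")
        bad = find_bad_lines(text.splitlines())
        status = "OK" if not bad else f"{len(bad)} malformed line(s)"
        print(f"{path}: {status}")
        for number, line in bad[:5]:
            print(f"  line {number}: {line!r}")
        ok = ok and not bad
    return ok
```

For example, `check_maps(Path("backup").glob("*_*"))` would check every map in a hypothetical `backup` directory in the order they should be uploaded. A clean result does not explain the random behavior, but it narrows the problem down to the server side.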

Example Project

That is difficult, because I cannot upload our customers' code to a public repository, and the problem occurs at random: one run works, the next one does not.

What is the current bug behavior?

Repository cleanup sometimes doesn't clean the repository.

What is the expected correct behavior?

Repository cleanup always cleans the repository.

Relevant logs and/or screenshots

I am not sure where I can check what happens when a repository cleanup runs in GitLab.

Output of checks

Results of GitLab environment info

sudo docker-compose exec web  gitlab-rake gitlab:env:info

System information
System:
Proxy:          no
Current User:   git
Using RVM:      no
Ruby Version:   2.7.7p221
Gem Version:    3.1.6
Bundler Version:2.3.15
Rake Version:   13.0.6
Redis Version:  6.2.8
Sidekiq Version:6.5.7
Go Version:     unknown

GitLab information
Version:        15.8.1-ee
Revision:       c49deff6e37
Directory:      /opt/gitlab/embedded/service/gitlab-rails
DB Adapter:     PostgreSQL
DB Version:     12.12
URL:            https://[redacted]
HTTP Clone URL: https://[redacted]/some-group/some-project.git
SSH Clone URL:  git@[redacted]:some-group/some-project.git
Elasticsearch:  yes
Geo:            no
Using LDAP:     yes
Using Omniauth: yes
Omniauth Providers: 

GitLab Shell
Version:        14.15.0
Repository storages:
- default:      unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path:              /opt/gitlab/embedded/service/gitlab-shell

Results of GitLab application Check

sudo docker-compose exec web gitlab-rake gitlab:check SANITIZE=true
Checking GitLab subtasks ...

Checking GitLab Shell ...

GitLab Shell: ... GitLab Shell version >= 14.15.0 ? ... OK (14.15.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Checking Gitaly ...

Gitaly: ... default ... OK

Checking Gitaly ... Finished

Checking Sidekiq ...

Sidekiq: ... Running? ... yes
Number of Sidekiq processes (cluster/worker) ... 1/1

Checking Sidekiq ... Finished

Checking Incoming Email ...

Incoming Email: ... Reply by email is disabled in config/gitlab.yml

Checking Incoming Email ... Finished

Checking LDAP ...

LDAP: ... Server: ldapmain
LDAP authentication... Success
LDAP users with access to your GitLab server (only showing the first 100 results)
User output sanitized. Found 100 users of 100 limit.

Checking LDAP ... Finished

Checking GitLab App ...

Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Cable config exists? ... yes
Resque config exists? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Systemd unit files or init script exist? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Systemd unit files or init script up-to-date? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Projects have namespace: ... [redacted]
Redis version >= 6.0.0? ... yes
Ruby version >= 2.7.2 ? ... yes (2.7.7)
Git user has default SSH configuration? ... yes
Active users: ... 146
Is authorized keys file accessible? ... yes
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes
Elasticsearch version 7.x-8.x or OpenSearch version 1.x ... yes (elasticsearch 7.17.3)
All migrations must be finished before doing a major upgrade ... yes

Checking GitLab App ... Finished

Checking GitLab subtasks ... Finished

Possible fixes