Pushing a large LFS file (larger than 2 GB) to the remote server fails when LFS files are stored on object storage
Summary
I have built a GitLab server cluster that stores LFS files in a third-party object storage service. When I commit an LFS file larger than 2 GB, the push never completes successfully; the Git client appears to keep restarting the push each time it fails.
Steps to reproduce
- GitLab Rails runs in Kubernetes, using the Docker image registry.gitlab.com/gitlab-org/build/cng/gitlab-webservice-ce:v16.11.1.
- LFS files are stored in a third-party object storage service, configured as follows:
- secret:
    items:
      - key: connection
        path: objectstorage/object_store
    name: object-storage
- After base64 decoding, the secret looks like this:
bucket: dev-gitlab-ha-registry-storage accesskey: gitlab secretkey: xxx region: xxx regionendpoint: "http://10.5.1.45:10000" v4auth: true pathstyle: true
provider: AWS
region: xxx
aws_access_key_id: xxx
aws_secret_access_key: xxx
aws_signature_version: 4
host: 10.5.1.45
endpoint: "http://10.5.1.45:10000"
path_style: true
enable_signature_v4_streaming: false
- Create a repository and clone it to a local machine.
- Create a large LFS file, then commit and push it:
dd if=/dev/zero of=largefile_4g.iso bs=1g count=4
git lfs track largefile_4g.iso
git add .gitattributes
git add largefile_4g.iso
git commit -m "Add disk image"
git push
- The push never completes successfully.
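To confirm that the client really is restarting the upload, the push can be repeated with client-side tracing enabled. The following is a minimal sketch: `GIT_TRACE`, `GIT_CURL_VERBOSE` and `GIT_TRANSFER_TRACE` are standard Git and Git LFS environment variables, while the function name `push_with_trace` and the default log file name are hypothetical.

```shell
#!/bin/sh
# Sketch: rerun the push with tracing so that retries show up in a log file.
# push_with_trace and the default log name are illustrative, not part of Git.
push_with_trace() {
    log_file="${1:-lfs-push-trace.log}"
    # List LFS-tracked files to confirm the large file is handled by LFS.
    git lfs ls-files
    # Git and Git LFS write trace output to stderr; redirect it to the log.
    GIT_TRACE=1 GIT_CURL_VERBOSE=1 GIT_TRANSFER_TRACE=1 git push 2>"$log_file"
}
```

Running `push_with_trace` inside the cloned repository and then searching the log for repeated upload requests to the same object should show whether the transfer is being retried.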
What is the expected correct behavior?
I expect the push to complete successfully, just as it does for a smaller LFS file, e.g. 500 MB.
Relevant logs and/or screenshots
Using tcpdump to capture the HTTP requests sent to the object storage service, I found that the GitLab Puma process tries to upload the large file with the UploadPartCopy API: https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPartCopy.html
However, it sends a DELETE request for the source object before the UploadPartCopy sequence has finished, after which the object storage returns a 404 response.
In the screenshot above, we can see a DELETE request; the other PUT requests are UploadPartCopy requests. This is why the upload fails: a large file needs 800+ UploadPartCopy requests, but gitlab-rails issues the DELETE after part 205 is copied.
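The "800+" figure is consistent with the 4 GiB test file being copied in 5 MiB parts, which is the minimum size S3 allows for non-final parts. The part size here is an assumption for illustration; it was not read from the GitLab configuration. A rough sketch of the arithmetic, with `parts_needed` as a hypothetical helper:

```shell
#!/bin/sh
# Number of parts in a multipart copy: ceil(object_size / part_size).
parts_needed() {
    object_size=$1
    part_size=$2
    echo $(( (object_size + part_size - 1) / part_size ))
}

# 4 GiB file from the reproduction steps; 5 MiB part size is an assumption.
object_size=$((4 * 1024 * 1024 * 1024))
part_size=$((5 * 1024 * 1024))
total=$(parts_needed "$object_size" "$part_size")
echo "parts needed: $total"
# Parts still outstanding when the DELETE was observed after part 205.
echo "parts not yet copied: $((total - 205))"
```

Under that assumption, roughly three quarters of the parts had not been copied yet when the source object was deleted.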