Inconsistent & limiting API for LFS files (uploading very large files via API)
Summary
The GitLab API currently works like this:
- Files committed with the API get transformed with LFS (appropriate files turn into LFS pointers)
- When trying to get those files, one ends up with the raw LFS pointer (#27892 (closed))
Our actual problem is that we want to commit the raw LFS pointer itself. We want to do this because it seems to be the only viable way to commit large files programmatically.
Steps to reproduce
- Get an LFS file via the API: https://gitlab.com/api/v4/projects/31930250/repository/files/test-files%2Fsmall-text.lfs.txt/raw
- Try to commit a file using the API with the exact same content (raw git lfs pointer)
curl --request POST 'https://gitlab.com/api/v4/projects/637/repository/commits' \
--header 'PRIVATE-TOKEN: token' \
--header 'Content-Type: application/json' \
--data-raw '{
"branch": "master",
"commit_message": "test lfs commit",
"actions": [
{
"action": "create",
"file_path": "test-files/from-api.lfs.png",
"content": "version https://git-lfs.github.com/spec/v1\noid sha256:bbae3ffcb0c86be9766c831a5ba2cc4f7704d5c1304e6783e57418e77795c01f\nsize 122\n"
}
]
}'
- Fetch it and notice that the hash has changed (the content was not committed as a raw pointer, but was instead converted by LFS): https://gitlab.com/api/v4/projects/31930250/repository/files/test-files%2Ffrom-api.lfs.txt/raw
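For reference, the pointer content used in the request above follows the Git LFS pointer format (a version line, a SHA-256 oid, and the size in bytes). A minimal sketch of how such a pointer could be generated for arbitrary content; the function name `build_lfs_pointer` is ours and only illustrative:

```python
import hashlib

LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def build_lfs_pointer(data: bytes) -> str:
    """Return Git LFS pointer text for the given file content."""
    # The oid is the SHA-256 hex digest of the real file content.
    oid = hashlib.sha256(data).hexdigest()
    # The size is the byte length of the real file content.
    return f"version {LFS_SPEC_URL}\noid sha256:{oid}\nsize {len(data)}\n"


if __name__ == "__main__":
    print(build_lfs_pointer(b"hello lfs\n"), end="")
```

If the API committed this text unchanged, the oid in the fetched pointer would match the oid that was sent. In the reproduction above it does not.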
Example Project
https://gitlab.com/fiws/lfs-test/-/tree/main/test-files
What is the expected correct behavior?
Ideally, both endpoints would behave the same way: either both apply the LFS transform (which seems like the saner option) or both ignore LFS entirely.
In the meantime, we would like an option to bypass the LFS transform on upload, so that committing a file behaves the same way as fetching it.
We mainly want this to upload large files programmatically without creating one huge JSON request.
Output of checks
This bug happens on GitLab.com
Possible fixes
GitLab should ideally provide a way to commit huge files into LFS without buffering everything in one JSON object. This would be a lot of work though.
We think a viable quick fix for this would be an extra parameter like skipLFS that can be supplied when committing via the API. Example request content:
{
"branch": "main",
"commit_message": "test lfs (raw pointer) commit via API",
"actions": [
{
"action": "create",
"file_path": "test-files/from-api.lfs.txt",
"content": "version https://git-lfs.github.com/spec/v1\noid sha256:bbae3ffcb0c86be9766c831a5ba2cc4f7704d5c1304e6783e57418e77795c01f\nsize 122\n",
"skipLFS": true
}
]
}
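A client could assemble this request body roughly as follows. This is only a sketch: `skipLFS` is the parameter proposed in this issue and does not exist in the GitLab API today, and `build_commit_payload` is a hypothetical helper name.

```python
import json


def build_commit_payload(branch, commit_message, file_path, pointer, skip_lfs=True):
    """Build the JSON body for POST /projects/:id/repository/commits."""
    action = {
        "action": "create",
        "file_path": file_path,
        "content": pointer,
    }
    if skip_lfs:
        # Proposed parameter from this issue, NOT an existing API field.
        action["skipLFS"] = True
    return {
        "branch": branch,
        "commit_message": commit_message,
        "actions": [action],
    }


if __name__ == "__main__":
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:bbae3ffcb0c86be9766c831a5ba2cc4f7704d5c1304e6783e57418e77795c01f\n"
        "size 122\n"
    )
    payload = build_commit_payload(
        "main",
        "test lfs (raw pointer) commit via API",
        "test-files/from-api.lfs.txt",
        pointer,
    )
    print(json.dumps(payload, indent=2))
```

The request stays small no matter how large the real file is, because only the pointer text is sent. The object itself would have to be uploaded separately through the regular LFS endpoints.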
The main logic for this seems to be here: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/lfs/file_transformer.rb#L33-43
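To illustrate the change we have in mind, here is a toy Python model of the behavior. It is not GitLab's Ruby code: `transform_new_file`, `is_lfs_path`, `lfs_store` and `skip_lfs` are all hypothetical names, and the real service does more (for example linking the LFS object to the project).

```python
import hashlib


def transform_new_file(path, content, is_lfs_path, lfs_store, skip_lfs=False):
    """Toy model: return the bytes that would end up in the Git repository."""
    # Proposed bypass: commit the content exactly as given.
    if skip_lfs or not is_lfs_path(path):
        return content
    # Current behavior as we understand it: store the content as an LFS
    # object and commit a pointer to it instead.
    oid = hashlib.sha256(content).hexdigest()
    lfs_store[oid] = content
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )
    return pointer.encode("utf-8")
```

With `skip_lfs=False`, passing a pointer as content produces a pointer to the pointer, which is the hash change described in the steps to reproduce. With `skip_lfs=True`, the pointer is committed unchanged.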
We would offer to create an MR for this, if you approve this approach.
FYI: I am writing this issue on behalf of my employer hydra newmedia GmbH. We are a GitLab EE Starter customer.
/cc @guischdi