Successfully scheduled project exports over 5 GB are missing from S3 bucket - multipart upload support required
Insight
Scheduled project exports to an S3 bucket via this endpoint fail for projects larger than 5 GB. These scheduled exports use presigned URLs and work for projects smaller than 5 GB. They fail in the sense that they are successfully scheduled (the linked endpoint returns 202) but the export never appears in the S3 bucket.
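For concreteness, here is a minimal sketch of that flow in Python, using boto3 and requests. The instance URL, project ID, bucket, object key, and token are all hypothetical placeholders, not values from this report.

```python
import boto3
import requests

s3 = boto3.client("s3")

# Presign a single-object PUT; this URL is what gets handed to GitLab.
presigned_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "example-exports", "Key": "project-42.tar.gz"},
    ExpiresIn=3600,
)

# Schedule the export. GitLab answers 202 Accepted regardless of whether
# the eventual upload will fit within S3's single-PUT size limit.
resp = requests.post(
    "https://gitlab.example.com/api/v4/projects/42/export",
    headers={"PRIVATE-TOKEN": "<redacted>"},
    data={"upload[url]": presigned_url, "upload[http_method]": "PUT"},
)
resp.raise_for_status()  # 202: scheduled, but the upload itself happens later
```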
Supporting evidence
We observed that, of all the projects we were exporting with the method described above, a subset was consistently missing from our S3 bucket. After we talked to GitLab, it was revealed that the S3 upload operation GitLab performed against the presigned URL was failing with the following error from AWS:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>EntityTooLarge</Code>
  <Message>Your proposed upload exceeds the maximum allowed size</Message>
  <ProposedSize>6353522230</ProposedSize>
  <MaxSizeAllowed>5368709120</MaxSizeAllowed>
  <RequestId>Redacted</RequestId>
  <HostId>Redacted</HostId>
</Error>
```
We looked into where this `MaxSizeAllowed` value of 5368709120 bytes (exactly 5 GiB) was coming from and found the following in this AWS documentation:
> Depending on the size of the data that you're uploading, Amazon S3 offers the following options:
>
> - Upload an object in a single operation by using the AWS SDKs, REST API, or AWS CLI – With a single PUT operation, you can upload a single object up to 5 GB in size.
> - Upload a single object by using the Amazon S3 console – With the Amazon S3 console, you can upload a single object up to 160 GB in size.
> - Upload an object in parts by using the AWS SDKs, REST API, or AWS CLI – Using the multipart upload API operation, you can upload a single large object, up to 5 TB in size.
>
> The multipart upload API operation is designed to improve the upload experience for larger objects. You can upload an object in parts. These object parts can be uploaded independently, in any order, and in parallel. You can use a multipart upload for objects from 5 MB to 5 TB in size. For more information, see Uploading and copying objects using multipart upload.
Our hypothesis was that GitLab was using a single PUT operation and therefore hitting the first limit. A look at the GitLab source code here appears to confirm this.
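The numbers in the error line up with this hypothesis exactly, as a quick check shows:

```python
# Quick sanity check of the hypothesis using the figures from the error above.
proposed = 6_353_522_230      # ProposedSize reported by AWS (~6.35 GB)
single_put_max = 5 * 1024**3  # 5 GiB == 5368709120, the MaxSizeAllowed value
multipart_max = 5 * 1024**4   # 5 TiB ceiling for multipart uploads

assert proposed > single_put_max  # why the single-PUT upload is rejected
assert proposed < multipart_max   # why a multipart upload would succeed
```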
Action
Per the AWS documentation excerpt above, the 5 GB limit applies only to uploads performed with a single S3 PUT operation. If GitLab switched to multipart uploads, the effective ceiling would rise from 5 GB to 5 TB.
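As a rough illustration of what the multipart flow looks like, here is a sketch using boto3 directly; it is not GitLab's actual (Ruby) implementation, and the bucket, key, file name, and part size are assumptions:

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "example-exports", "project-42.tar.gz"
PART_SIZE = 100 * 1024 * 1024  # 100 MiB; S3 requires parts >= 5 MiB except the last

# Open the multipart upload, send the archive in independent parts, then
# combine the parts into a single object of up to 5 TB.
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []
with open("project-42.tar.gz", "rb") as f:
    part_number = 1
    while chunk := f.read(PART_SIZE):
        part = s3.upload_part(
            Bucket=bucket,
            Key=key,
            PartNumber=part_number,
            UploadId=mpu["UploadId"],
            Body=chunk,
        )
        parts.append({"ETag": part["ETag"], "PartNumber": part_number})
        part_number += 1

s3.complete_multipart_upload(
    Bucket=bucket,
    Key=key,
    UploadId=mpu["UploadId"],
    MultipartUpload={"Parts": parts},
)
```

One caveat worth flagging: the current export endpoint accepts a single presigned URL, whereas a presigned multipart upload needs a separate presigned request per part (boto3 can presign `upload_part` the same way it presigns `put_object`), so supporting this may also require a change to the API contract.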
Tasks
- Assign this issue to the appropriate Product Manager, Product Designer, or UX Researcher.
- Add the appropriate Group (such as ~"group::source code") label to the issue. This helps identify and track actionable insights at the group level.
- Link this issue back to the original research issue in the GitLab UX Research project and the Dovetail project.
- Adjust the confidentiality of this issue if applicable.