Investigate using Redis to buffer Gitaly responses while streaming
When we stream Rapid Diffs from Gitaly, the algorithm looks like this:
- Receive request for stream on the Rails side
- Send an RPC call to Gitaly to fetch diff blobs in batches (30 files in a single batch)
- Process files in that batch one by one: highlight, render to string, send back to client stream
- Once the batch is processed, repeat from the second step (the Gitaly RPC call) until all the diffs are processed
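The loop above can be sketched roughly as follows. This is a minimal illustration rather than the actual Rails code: `fetch_diff_batch` and `render_diff` are hypothetical stand-ins for the Gitaly batch RPC and for the highlight/render step, and the diffs are represented by plain file paths.

```ruby
# Hypothetical stand-in for the Gitaly batch RPC (assumption, not a real API):
# returns up to `limit` file paths starting at `offset`.
def fetch_diff_batch(all_paths, offset, limit)
  all_paths[offset, limit] || []
end

# Hypothetical stand-in for highlighting and rendering one diff file to a string.
def render_diff(path)
  "<diff-file path=\"#{path}\">"
end

# Sequential streaming: fetch a batch, process it file by file, then fetch the next.
# No fetching happens while a batch is being rendered.
def stream_sequentially(all_paths, out, batch_size: 30)
  offset = 0
  loop do
    batch = fetch_diff_batch(all_paths, offset, batch_size)
    break if batch.empty?

    batch.each { |path| out << render_diff(path) } # send each file to the client stream
    offset += batch.size
  end
  out
end
```

Here `out` plays the role of the client stream; any object that responds to `<<` would do.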
This approach has two critical limitations:
- Each of the steps is sequential, meaning we cannot fetch the next batch of diff blobs while processing the current one
- Each Gitaly batch call triggers full processing of the diff, even when we fetch only a small portion of the diffs, which introduces additional overhead
This makes the overall process take much longer than it should. To solve this problem we could:
- Use a continuous Gitaly RPC stream to fetch all diff blobs (diff blobs are still processed in batches but it's all done in a single request)
- Send this Gitaly RPC call in a separate Ruby thread, then store the diff blob batches in Redis
- Every time a batch is stored, signal the main Ruby thread to start processing that batch
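A minimal sketch of this producer/consumer idea, under several assumptions: `BatchBuffer` is an in-memory stand-in for Redis so the example runs without a server, `gitaly_diff_stream` is a hypothetical stand-in for the continuous Gitaly RPC stream, and a Ruby `Queue` is just one possible way to signal the main thread. The key format is invented for illustration.

```ruby
# In-memory stand-in for Redis (assumption, used so the sketch is self-contained).
class BatchBuffer
  def initialize
    @store = {}
    @lock = Mutex.new
  end

  def write(key, batch)
    @lock.synchronize { @store[key] = batch }
  end

  def read(key)
    @lock.synchronize { @store[key] }
  end
end

# Hypothetical stand-in for a continuous Gitaly RPC stream:
# a single call that yields diff blobs batch by batch.
def gitaly_diff_stream(all_paths, batch_size: 30)
  all_paths.each_slice(batch_size)
end

def stream_with_buffer(all_paths, out)
  buffer = BatchBuffer.new
  ready = Queue.new # carries keys of stored batches to the main thread

  # Producer thread: reads the stream and stores each batch as soon as it arrives.
  producer = Thread.new do
    gitaly_diff_stream(all_paths).each_with_index do |batch, index|
      key = "diff_batch:#{index}" # hypothetical key format
      buffer.write(key, batch)
      ready << key
    end
  ensure
    ready << :done # always unblock the consumer, even if the stream fails
  end

  # Main thread: renders a batch while the producer keeps fetching the next ones.
  while (key = ready.pop) != :done
    buffer.read(key).each { |path| out << "<diff-file path=\"#{path}\">" }
  end
  producer.join # re-raises any error from the producer thread
  out
end
```

With a single producer and a FIFO queue, batches reach the main thread in the order they were fetched, so the rendered output keeps the original file order.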
We could decide later what to do with the diff blobs stored in Redis: keep them for a while to use as a cache, or remove them right away.
We could also start the Gitaly RPC call on the page controller but that could complicate things a lot.