Improve performance of `BranchesController#index` under load to meet target, and investigate performance in a large monorepo project
Summary
Performance testing shows that the `BranchesController#index` action is just over our target of 500ms under load:
* Environment: 50k
* Environment Version: 12.9.0-pre `8962f4f86f3`
* Option: 60s_1000rps
* Date: 2020-03-10
* Run Time: 57m 56.43s (Start: 20:54:19 UTC, End: 21:52:16 UTC)
* GPT Version: v1.2.3
NAME | RPS | RPS RESULT | TTFB AVG | TTFB P90 | REQ STATUS | RESULT
---------------------|--------|----------------------|-----------|----------------------|----------------|---------
web_project_branches | 100/s | 94.53/s (>48.00/s) | 594.17ms | 692.84ms (<1500ms) | 100.00% (>95%) | Passed
Detailed Stats:
█ Web - Project Branches Page
data_received.................: 500 MB 8.3 MB/s
data_sent.....................: 628 kB 10 kB/s
group_duration................: avg=1913.56ms min=521.25ms med=1953.73ms max=2961.47ms p(90)=2140.08ms p(95)=2316.86ms
http_req_blocked..............: avg=0.05ms min=0.00ms med=0.01ms max=27.97ms p(90)=0.01ms p(95)=0.01ms
http_req_connecting...........: avg=0.03ms min=0.00ms med=0.00ms max=16.72ms p(90)=0.00ms p(95)=0.00ms
http_req_duration.............: avg=595.57ms min=444.35ms med=572.67ms max=1565.78ms p(90)=694.44ms p(95)=839.18ms
http_req_receiving............: avg=1.36ms min=0.23ms med=1.33ms max=15.53ms p(90)=1.76ms p(95)=1.89ms
http_req_sending..............: avg=0.03ms min=0.02ms med=0.03ms max=0.24ms p(90)=0.05ms p(95)=0.07ms
http_req_tls_handshaking......: avg=0.00ms min=0.00ms med=0.00ms max=0.00ms p(90)=0.00ms p(95)=0.00ms
http_req_waiting..............: avg=594.17ms min=443.29ms med=571.36ms max=1564.09ms p(90)=692.84ms p(95)=838.09ms
✓ { endpoint:branches/all }...: avg=620.01ms min=485.44ms med=601.80ms max=1564.09ms p(90)=704.11ms p(95)=836.67ms
✓ { endpoint:branches }.......: avg=568.33ms min=443.29ms med=547.84ms max=1530.29ms p(90)=648.70ms p(95)=840.58ms
✓ http_reqs.....................: 5672 94.533149/s
✓ { endpoint:branches/all }...: 2836 47.266574/s
✓ { endpoint:branches }.......: 2836 47.266574/s
iteration_duration............: avg=1912.90ms min=0.17ms med=1953.70ms max=2961.49ms p(90)=2140.06ms p(95)=2316.88ms
iterations....................: 2832 47.199908/s
✓ successful_requests...........: 100.00% ✓ 5664 ✗ 0
vus...........................: 1 min=1 max=100
vus_max.......................: 100 min=100 max=100
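Runs like the one above can be checked against the 500ms target without reading the summary by eye. The following is a minimal Python sketch that parses one line of the detailed stats. It assumes the k6 summary keeps the `metric....: avg=594.17ms ... p(90)=692.84ms` layout shown here. The helper names `parse_durations` and `meets_target` are hypothetical. In this report the TTFB columns match `http_req_waiting` (avg 594.17ms, P90 692.84ms), so that is the line the example checks.

```python
import re

# Assumption: durations in the k6 summary are printed as "<stat>=<number>ms",
# exactly as in the report above (avg, min, med, max, p(90), p(95)).
STAT_RE = re.compile(r"(avg|min|med|max|p\(\d+\))=([\d.]+)ms")

TTFB_TARGET_MS = 500.0  # target stated in this issue


def parse_durations(line: str) -> dict[str, float]:
    """Return {stat_name: milliseconds} for one k6 summary line."""
    return {name: float(value) for name, value in STAT_RE.findall(line)}


def meets_target(line: str, stat: str = "p(90)", target_ms: float = TTFB_TARGET_MS) -> bool:
    """True if the chosen statistic on this line is strictly below the target."""
    return parse_durations(line)[stat] < target_ms


# TTFB in the results table corresponds to http_req_waiting in the detailed stats.
line = (
    "http_req_waiting..............: avg=594.17ms min=443.29ms med=571.36ms "
    "max=1564.09ms p(90)=692.84ms p(95)=838.09ms"
)
print(parse_durations(line)["p(90)"])  # 692.84
print(meets_target(line))  # False
```

Parsing the text summary keeps the sketch dependency-free; k6 can also export results in machine-readable form, which would be more robust than a regex if the summary layout changes.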
Performance test against large monorepo project
It is worth calling out that we (Quality) recently ran the performance test against a large monorepo project with these specs: 3.5 GB files, 12.6 GB trees, 75 GB blobs, 916k commits, 20k issues, 7k MRs and branches, and 1.5k labels. Test results degrade significantly with the larger number of branches (6701 in the monorepo vs 2815 in `gitlabhq`):
* Environment: 50k_monorepo
* Environment Version: 12.9.0-pre `8962f4f86f3`
* Option: 60s_1000rps
* Date: 2020-03-10
* Run Time: 57m 56.43s (Start: 20:54:19 UTC, End: 21:52:16 UTC)
* GPT Version: v1.2.3
NAME | RPS | RPS RESULT | TTFB AVG | TTFB P90 | REQ STATUS | RESULT
---------------------|--------|----------------------|-----------|----------------------|----------------|---------
web_project_branches | 100/s | 68.47/s (>48.00/s) | 2511.84ms | 3823.73ms (<1500ms) | 100.00% (>95%) | FAILED
Detailed Stats:
█ Web - Project Branches Page
data_received.................: 358 MB 6.0 MB/s
data_sent.....................: 493 kB 8.2 kB/s
group_duration................: avg=2626.10ms min=1017.08ms med=2602.70ms max=5567.91ms p(90)=3940.20ms p(95)=4409.26ms
http_req_blocked..............: avg=0.06ms min=0.00ms med=0.01ms max=20.02ms p(90)=0.01ms p(95)=0.03ms
http_req_connecting...........: avg=0.04ms min=0.00ms med=0.00ms max=13.84ms p(90)=0.00ms p(95)=0.00ms
http_req_duration.............: avg=2513.32ms min=897.98ms med=2492.52ms max=5540.81ms p(90)=3825.43ms p(95)=4253.51ms
http_req_receiving............: avg=1.45ms min=0.76ms med=1.46ms max=9.52ms p(90)=2.00ms p(95)=2.16ms
http_req_sending..............: avg=0.03ms min=0.01ms med=0.03ms max=0.28ms p(90)=0.05ms p(95)=0.07ms
http_req_tls_handshaking......: avg=0.00ms min=0.00ms med=0.00ms max=0.00ms p(90)=0.00ms p(95)=0.00ms
http_req_waiting..............: avg=2511.84ms min=896.25ms med=2491.05ms max=5539.69ms p(90)=3823.73ms p(95)=4252.26ms
✗ { endpoint:branches/all }...: avg=2464.21ms min=896.25ms med=2444.90ms max=5202.09ms p(90)=3782.14ms p(95)=4201.49ms
✗ { endpoint:branches }.......: avg=2559.51ms min=1004.63ms med=2541.78ms max=5539.69ms p(90)=3881.31ms p(95)=4352.26ms
✓ http_reqs.....................: 4108 68.466563/s
✓ { endpoint:branches/all }...: 2055 34.249948/s
✓ { endpoint:branches }.......: 2053 34.216615/s
iteration_duration............: avg=2624.84ms min=0.16ms med=2601.19ms max=5567.93ms p(90)=3939.71ms p(95)=4409.15ms
iterations....................: 2049 34.149948/s
✓ successful_requests...........: 100.00% ✓ 4098 ✗ 0
vus...........................: 1 min=1 max=100
vus_max.......................: 100 min=100 max=100
The `web_project_branches` test hits Gitaly significantly harder on the monorepo than on `gitlabhq`, with all Gitaly nodes jumping to around 75% compared with around 20% for `gitlabhq`.
Note: the monorepo project was inflated with data via the API. It's not clear whether this number of branches is reasonable or typical, but it's probably worth investigating.
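How much faster latency grew than the branch count can be checked with simple arithmetic on the two result tables above. This Python sketch only divides the reported figures. Keep in mind that the monorepo also differs in repository size, commits, and issue counts, so these ratios alone do not prove that branches are the sole cause.

```python
# Figures copied from the two result tables above (gitlabhq vs. monorepo).
GITLABHQ = {"branches": 2815, "ttfb_avg_ms": 594.17, "ttfb_p90_ms": 692.84}
MONOREPO = {"branches": 6701, "ttfb_avg_ms": 2511.84, "ttfb_p90_ms": 3823.73}


def growth(metric: str) -> float:
    """How many times larger the monorepo value is than the gitlabhq value."""
    return MONOREPO[metric] / GITLABHQ[metric]


branch_growth = growth("branches")   # about 2.38x more branches
avg_growth = growth("ttfb_avg_ms")   # about 4.23x slower on average
p90_growth = growth("ttfb_p90_ms")   # about 5.52x slower at P90

print(f"branches x{branch_growth:.2f}, TTFB avg x{avg_growth:.2f}, TTFB P90 x{p90_growth:.2f}")
```

Latency grew noticeably faster than the branch count (about 4.2x on average and 5.5x at P90 for about 2.4x the branches), which supports investigating how the page scales with the number of branches.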
Steps to reproduce
- Check out the Performance Tool.
- Run the specific test with the `run-k6` command. For example, against the 50k environment you would run the following from the project root: `./run-k6 -e environments/50k.json -o 60s_1000rps.json -t web_project_branches.js`. You will also need an `ACCESS_TOKEN` for this endpoint.
- If you're seeking to run the test against your own environment, the Tool's documentation has details on how to achieve this.
- If you want to run the test against the large monorepo project, please use the import rake task to import the project until #208452 is closed.
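For repeated runs, the command from the steps above can be wrapped in a small script. This is a minimal Python sketch. The helper names are hypothetical. It assumes, without confirmation here, that the Performance Tool reads the token from an `ACCESS_TOKEN` environment variable and that the script is run from the tool's project root.

```python
import os
import subprocess


def build_run_k6_command(environment: str, options: str, test: str) -> list[str]:
    """Argument list for the run-k6 invocation from the steps above."""
    return ["./run-k6", "-e", environment, "-o", options, "-t", test]


def run_branches_test(access_token: str) -> int:
    """Run the branches test against the 50k environment; returns the exit code."""
    # Assumption: the tool picks up the token from the ACCESS_TOKEN environment variable.
    env = {**os.environ, "ACCESS_TOKEN": access_token}
    cmd = build_run_k6_command(
        "environments/50k.json", "60s_1000rps.json", "web_project_branches.js"
    )
    # check=False so a failed threshold is reported via the return code, not an exception.
    return subprocess.run(cmd, env=env, check=False).returncode
```

Passing the argument list directly, instead of a single shell string, avoids shell quoting issues with environment file paths.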
What is the current bug behavior?
The `gitlabhq` results above show that the Branches page has a TTFB P90 of 692.84ms. This was tested on our 50k Reference Architecture with an RPS target of 1000/s. It targeted our own `gitlabhq` project, which has 2815 branches. More detailed specs: 800 MB files, 1.90 GB trees, 6.2 GB blobs, 150k commits, 7k issues, 3.6k MRs, and 33 labels.
What is the expected correct behavior?
Under the new performance targets, this endpoint is currently over our TTFB target and falls just into the ~S4 tier. The task is to improve the endpoint's performance to meet our performance target (<500ms).
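To put the target in concrete terms, the fractional P90 reduction needed to reach 500ms can be computed from the reported numbers. This is a minimal Python sketch using the P90 TTFB values from the two runs above. `required_reduction` is a hypothetical helper name.

```python
TARGET_MS = 500.0  # TTFB target from this issue


def required_reduction(p90_ms: float, target_ms: float = TARGET_MS) -> float:
    """Fraction by which P90 TTFB must drop to reach the target (0 if already met)."""
    return max(0.0, (p90_ms - target_ms) / p90_ms)


gitlabhq = required_reduction(692.84)   # P90 from the gitlabhq run
monorepo = required_reduction(3823.73)  # P90 from the monorepo run

print(f"gitlabhq: {gitlabhq:.1%}, monorepo: {monorepo:.1%}")
```

By this measure `gitlabhq` needs roughly a 28% cut in P90 TTFB, while the monorepo would need about 87%.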