Closed
Milestone
Apr 24, 2023–Jun 21, 2023
(2023Q2) Reducing overall CI workload in `{nomadic-labs,tezos}/tezos`
Excerpted from https://hackmd.io/vdYP7DO5RDatOwG686ez3Q
Title : Reducing overall CI workload in tezos/tezos
Description : The idea is to reduce the overall workload. This will save money, which can be spent on faster hardware to reduce the wall time of marge-bot pipelines. The main contributor to the workload at the moment is the opam tests, so most ideas center on those tests.
Estimated effort (in weeks) : 1 week
Associated KRs : CI runs consistently under 20 minutes and cost is not higher than the current one
Dependencies : Project 1
Part of : https://gitlab.com/groups/tezos/-/milestones/5
Metrics & Objectives
- Goal metrics (measured by):
  - Projected Sequential Time per Pipeline (M1.3)
  - Recorded Sequential Time per Pipeline (M2.3)
- Guardrail metrics (measured by):
  - Number of opam failures in scheduled pipelines (M3.1)
  - Recorded / projected AWS cost (implied by the goal metric, if sequential time is a sufficient proxy for cost, and by M4)
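Judging from the figures under "How the objective was computed", sequential time per pipeline is the summed duration of all jobs divided by the number of pipelines, and the projected variant replays historical jobs, dropping those that would no longer run while keeping the pipeline count unchanged. A minimal Python sketch under those assumptions; the `Job` record and its fields are hypothetical (not a GitLab API schema), and M1.3/M2.3 may be defined more precisely elsewhere.

```python
from collections import namedtuple

# Hypothetical job record; field names are illustrative, not a GitLab API schema.
Job = namedtuple("Job", ["pipeline_id", "name", "duration_s"])


def sequential_minutes_per_pipeline(jobs, keep=lambda job: True):
    """Sum of job durations (in minutes) divided by the number of pipelines.

    With the default `keep`, this is the recorded metric. Passing a filter
    gives a projection: jobs that would no longer run are dropped, but every
    pipeline still counts in the denominator.
    """
    jobs = list(jobs)
    pipelines = {job.pipeline_id for job in jobs}
    if not pipelines:
        return 0.0
    kept_seconds = sum(job.duration_s for job in jobs if keep(job))
    return kept_seconds / 60 / len(pipelines)


if __name__ == "__main__":
    history = [
        Job(1, "unit:test", 600),
        Job(1, "opam:some-inner-lib", 1200),  # hypothetical non-leaf opam job
        Job(2, "opam:octez-node", 1800),
    ]
    print(sequential_minutes_per_pipeline(history))  # 30.0
    print(sequential_minutes_per_pipeline(
        history, keep=lambda job: job.name != "opam:some-inner-lib"))  # 20.0
```

Keeping the pipeline count fixed is what lets the recorded and projected numbers be compared directly (422 vs. 230 minutes over the same 4552 pipelines).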
Objectives:
- Goal metrics:
  - Reduce projected sequential time per pipeline to 230 minutes (from 422 minutes).
  - Reduce recorded sequential time per pipeline to 250 minutes (from 422 minutes).
- Guardrail metrics:
  - The number of opam failures in scheduled pipelines should stay within ±5% of a comparable period before our changes.
  - Recorded / projected AWS cost does not increase (this is implied by the goal metric if sequential time is a sufficient proxy for cost).
Task breakdown
| Task | Wall-time impact (best / worst case) | Sequential impact | Cost impact potential | Complexity | Who could do it | Estimated effort (days) |
|---|---|---|---|---|---|---|
| Only run opam jobs for leaves/top-level packages | | -45% | Savings | Medium | Pietro, Arvid | 3d |
| (Only run opam jobs for rev-deps) | 1-27 minutes / 0 | | Savings | Medium | Pietro, Arvid | 10d |
| (Reduce number of opam packages) | 0 / 0 | | Savings | Hard / tedious | Pietro, Arvid | ? |
| (Use runtest image instead of prebuild) | ? / 0 | | Savings | Easy | Arvid, Pietro | 2-3d |
| (Cache _opam directory) | ? / ? | | Savings | Easy | Arvid, Pietro | 1d |
As a list:
- Only run opam jobs for leaves/top-level packages (@abate)
- nomadic-labs/marge-bot#9 (closed): Guarantee final pipeline in marge-bot (@abate)
- (Use runtest image instead of prebuild) @(Arvid, Pietro) {2-3d}
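The first two tasks both hinge on the opam package dependency graph: leaves/top-level packages are those that no other package depends on, and rev-deps are the packages that transitively depend on a changed one. A minimal Python sketch, assuming a hypothetical in-memory map from each package to its direct dependencies; the package names are placeholders, not real Octez packages, and extracting the map from `.opam` files or mapping changed files to packages is not shown.

```python
# Hypothetical dependency map: package -> packages it directly depends on.
# The names are placeholders, not the real Octez opam packages.
DEPS = {
    "lib-base": set(),
    "lib-proto": {"lib-base"},
    "app-node": {"lib-base", "lib-proto"},
    "app-client": {"lib-proto"},
}


def leaves(deps):
    """Top-level packages: those that no other package depends on."""
    depended_on = set().union(*deps.values())
    return sorted(set(deps) - depended_on)


def reverse_deps(deps, changed):
    """The changed packages plus every package that transitively depends on them."""
    # Invert the edges: package -> packages that depend on it directly.
    dependents = {pkg: set() for pkg in deps}
    for pkg, direct in deps.items():
        for dep in direct:
            dependents.setdefault(dep, set()).add(pkg)
    # Depth-first walk over the inverted graph.
    seen = set(changed)
    stack = list(changed)
    while stack:
        for parent in dependents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)


if __name__ == "__main__":
    print(leaves(DEPS))                       # ['app-client', 'app-node']
    print(reverse_deps(DEPS, ["lib-proto"]))  # ['app-client', 'app-node', 'lib-proto']
```

The presumable appeal of testing only leaves is that installing a top-level package with opam also builds its dependencies, so inner packages are still compiled, although not installed and checked on their own.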
Deliverables
- A pipeline configuration that runs fewer opam jobs
How the objective was computed
Nerdy details
- In the period 2023-03-19T06:00:43.623Z to 2023-04-19T06:35:09.999Z:
  - We had 473252 jobs in 4552 pipelines, with a total sequential time of 32033 hours. This gives a sequential time per pipeline of 422 minutes.
  - 19157 hours in 63091 jobs were spent on opam jobs (~60%).
- We say that the necessary opam jobs are those that:
  - were launched by Marge-bot,
  - ran on master (scheduled pipelines),
  - or were one of the leaf jobs (definition fuzzy for the moment):
    - opam:octez-accuser-PtLimaPt, opam:octez-accuser-PtMumbai, opam:octez-accuser-PtNairob, opam:octez-baker-PtLimaPt, opam:octez-baker-PtMumbai, opam:octez-baker-PtNairob, opam:octez-client, opam:octez-codec, opam:octez-node, opam:octez-protocol-compiler, opam:octez-proxy-server, opam:octez-signer, opam:octez-smart-rollup-client-PtLimaPt, opam:octez-smart-rollup-client-PtMumbai, opam:octez-smart-rollup-client-PtNairob, opam:octez-smart-rollup-node-PtLimaPt, opam:octez-smart-rollup-node-PtMumbai, opam:octez-smart-rollup-node-PtNairob
- By only running the necessary opam jobs, we go down from 63091 jobs in 19157 hours to 17385 jobs in 5121 hours, a 73% reduction of opam job time.
- In total, we go down from 473252 jobs to 348149 jobs (a 26% reduction) and from 32033 hours to 17463 hours (a 45% reduction). This gives a sequential time per pipeline of 230 minutes.
- To account for changes in the code base over time, we add a margin of 20 minutes (about 9%), so we aim for a reduction to 250 minutes from 422.
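The selection rule and the reported totals can be rechecked mechanically. A minimal Python sketch: the leaf set is copied from the list above, while the `Job` record, its fields, and the `marge-bot` account name are hypothetical assumptions (`"schedule"` mirrors a value of GitLab's `CI_PIPELINE_SOURCE`). The arithmetic uses only the totals reported above.

```python
from dataclasses import dataclass


# Hypothetical job record: field names and values are assumptions for illustration.
@dataclass(frozen=True)
class Job:
    name: str    # e.g. "opam:octez-node"
    user: str    # account that launched the pipeline
    ref: str     # branch the pipeline ran on
    source: str  # how the pipeline was started, e.g. "schedule"


MARGE_BOT_USER = "marge-bot"  # hypothetical account name

# Leaf opam jobs, copied from the list above.
LEAF_OPAM_JOBS = {
    "opam:octez-accuser-PtLimaPt", "opam:octez-accuser-PtMumbai",
    "opam:octez-accuser-PtNairob", "opam:octez-baker-PtLimaPt",
    "opam:octez-baker-PtMumbai", "opam:octez-baker-PtNairob",
    "opam:octez-client", "opam:octez-codec", "opam:octez-node",
    "opam:octez-protocol-compiler", "opam:octez-proxy-server",
    "opam:octez-signer", "opam:octez-smart-rollup-client-PtLimaPt",
    "opam:octez-smart-rollup-client-PtMumbai",
    "opam:octez-smart-rollup-client-PtNairob",
    "opam:octez-smart-rollup-node-PtLimaPt",
    "opam:octez-smart-rollup-node-PtMumbai",
    "opam:octez-smart-rollup-node-PtNairob",
}


def is_necessary_opam_job(job):
    """True if the opam job would still run under the selection rule above."""
    launched_by_marge_bot = job.user == MARGE_BOT_USER
    scheduled_on_master = job.ref == "master" and job.source == "schedule"
    return launched_by_marge_bot or scheduled_on_master or job.name in LEAF_OPAM_JOBS


# Totals reported above for 2023-03-19 to 2023-04-19.
PIPELINES = 4552
JOBS_BEFORE, JOBS_AFTER = 473252, 348149
HOURS_BEFORE, HOURS_AFTER = 32033, 17463
OPAM_HOURS_BEFORE, OPAM_HOURS_AFTER = 19157, 5121


def minutes_per_pipeline(hours):
    return hours * 60 / PIPELINES


def reduction_pct(before, after):
    return 100 * (before - after) / before


if __name__ == "__main__":
    print(f"{minutes_per_pipeline(HOURS_BEFORE):.0f} min")  # 422 min
    print(f"{minutes_per_pipeline(HOURS_AFTER):.0f} min")   # 230 min
    print(f"{100 * OPAM_HOURS_BEFORE / HOURS_BEFORE:.0f}%")  # 60%
    print(f"{reduction_pct(OPAM_HOURS_BEFORE, OPAM_HOURS_AFTER):.0f}%")  # 73%
    print(f"{reduction_pct(JOBS_BEFORE, JOBS_AFTER):.0f}%")  # 26%
    print(f"{reduction_pct(HOURS_BEFORE, HOURS_AFTER):.0f}%")  # 45%
```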