Etherlink: enable LTO on release

Context

Enabling link-time optimization (LTO) could improve execution performance at the cost of a slightly longer compilation time. I think it's a small sacrifice for what we gain: it should help especially on our deployed kernels (mainnet/testnets), but also when we execute things locally on our machines.

I also found this source online: Rust-Wasm: Compiling with Link Time Optimizations. It points to two benefits of enabling LTO for our wasm kernels:

Not only will it make the .wasm smaller, but it will also make it faster at runtime! The downside is that compilation will take longer.
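
For reference, LTO is a Cargo profile setting. Below is a minimal sketch of what enabling it for release builds could look like. The exact value and the `Cargo.toml` it belongs in are assumptions and are not taken from this MR's diff.

```toml
# Hypothetical fragment of the kernel workspace's root Cargo.toml (assumed location).
[profile.release]
# `true` is equivalent to "fat" LTO; "thin" is a cheaper variant with faster links.
lto = true
```

Cargo only honours profile settings from the workspace root manifest, so a per-crate override would be ignored.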

Benchmarking

Ethereum test-suite

/!\ Don't forget to recompile the evm-evaluation-assessor when switching branches /!\

Benchmarking was done based on master:5f68e84bda2749558d8ba0abb75483659bfe99c0. Tested locally on my machine.

Around 20K scenarios focused on VM execution.

Cmd:

evm-evaluation-assessor --eth-tests <eth-tests-paths> -o <output-file> --resources ./etherlink/kernel_evm/evm_evaluation/resources -h -r

Without LTO:

30,56s user 0,38s system 97% cpu 31,812 total

With LTO:

24,26s user 0,32s system 99% cpu 24,587 total
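
To put the two `time` summaries side by side, here is a small sketch that converts the comma-decimal fields to seconds and computes the relative reduction. The helper names are illustrative only, and the figures come from a single local run, so treat the result as indicative.

```python
def parse_seconds(field: str) -> float:
    """Convert a `time` field such as '30,56s' or '31,812' to seconds."""
    return float(field.rstrip("s").replace(",", "."))


def reduction_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Fields copied from the two `time` summaries above.
without_lto = {"user": "30,56s", "total": "31,812"}
with_lto = {"user": "24,26s", "total": "24,587"}

user_gain = reduction_percent(
    parse_seconds(without_lto["user"]), parse_seconds(with_lto["user"])
)
total_gain = reduction_percent(
    parse_seconds(without_lto["total"]), parse_seconds(with_lto["total"])
)

print(f"user time reduced by {user_gain:.1f}%")
print(f"total (elapsed) time reduced by {total_gain:.1f}%")
```

On these numbers, the test suite runs a little over 20% faster with LTO.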

CI

oc.build_kernels

The impact on the CI build job seems low (+21 seconds, from 8 minutes 47 seconds to 9 minutes 8 seconds).

Without LTO:

@jobs/7831551680

8 minutes 47 seconds

With LTO:

@jobs/7820412561

9 minutes 8 seconds

Ticks

This section is taken from my comment @ !14933 (comment 2115160824).

Nested (imbricated) calls

Cmd:

> node etherlink/kernel_evm/benchmarks/scripts/benchmarks/bench_imbricated_calls.js > imbricated-calls.json 
> ./octez-smart-rollup-wasm-debugger --kernel evm_kernel.wasm --inputs imbricated-calls.json --installer-config etherlink/config/benchmarking.yaml

Without LTO:

Starting the profiling until new messages are expected. Please note that it will take some time and does not reflect a real computation time.
------------------ Kernel Invocation ------------------
[Info] Storing block 0 at 1970-01-01T00:00:00Z containing 3 transaction(s) for 22303401 gas used.
Profiling result can be found in /tmp/wasm-debugger-profiling-2024-09-17T14:44:32.363-00:00.out
----------------------
Detailed results for a `kernel_run`:
%interpreter(decode): 3864530 ticks (6.51s)
%interpreter(link): 17 ticks (6.848us)
%interpreter(init): 64803871 ticks (18.628s)
kernel_run: 17523528 ticks (5.67s)

Full execution: 86191946 ticks (29.746s)
----------------------
Detailed results for a `kernel_run`:
%interpreter(decode): 3864530 ticks (5.958s)
%interpreter(link): 17 ticks (6.924us)
%interpreter(init): 64803871 ticks (18.857s)
kernel_run: 276048873 ticks (1min22s)

Full execution: 344717291 ticks (1min47s)
----------------------
Full execution with padding: 100000000000000 ticks

With LTO:

------------------ Kernel Invocation ------------------
[Info] Storing block 0 at 1970-01-01T00:00:00Z containing 3 transaction(s) for 22303401 gas used.
Profiling result can be found in /tmp/wasm-debugger-profiling-2024-09-17T14:35:23.125-00:00.out
----------------------
Detailed results for a `kernel_run`:
%interpreter(decode): 3675543 ticks (5.758s)
%interpreter(link): 17 ticks (6.705us)
%interpreter(init): 60782303 ticks (17.416s)
kernel_run: 12147149 ticks (3.434s)

Full execution: 76605012 ticks (26.609s)
----------------------
Detailed results for a `kernel_run`:
%interpreter(decode): 3675543 ticks (5.798s)
%interpreter(link): 17 ticks (11.702us)
%interpreter(init): 60782303 ticks (17.582s)
kernel_run: 252359957 ticks (1min14s)

Full execution: 316817820 ticks (1min37s)
----------------------
Full execution with padding: 100000000000000 ticks

With LTO, the full execution of the first and second `kernel_run` saves 9_586_934 and 27_899_471 ticks respectively.

ERC20 token

Cmd:

> node etherlink/kernel_evm/benchmarks/scripts/benchmarks/bench_erc20tok.js > erc20tok.json 
> ./octez-smart-rollup-wasm-debugger --kernel evm_kernel.wasm --inputs erc20tok.json --installer-config etherlink/config/benchmarking.yaml

Without LTO:

------------------ Kernel Invocation ------------------
[Info] Storing block 0 at 1970-01-01T00:00:00Z containing 7 transaction(s) for 24024922 gas used.
Profiling result can be found in /tmp/wasm-debugger-profiling-2024-09-17T14:56:46.738-00:00.out
----------------------
Detailed results for a `kernel_run`:
%interpreter(decode): 3864530 ticks (6.717s)
%interpreter(link): 17 ticks (7.828us)
%interpreter(init): 64803871 ticks (20.749s)
kernel_run: 19662987 ticks (6.294s)

Full execution: 88331405 ticks (33.760s)
----------------------
Detailed results for a `kernel_run`:
%interpreter(decode): 3864530 ticks (6.577s)
%interpreter(link): 17 ticks (7.292us)
%interpreter(init): 64803871 ticks (20.998s)
kernel_run: 266444123 ticks (1min26s)

Full execution: 335112541 ticks (1min53s)
----------------------
Full execution with padding: 100000000000000 ticks

With LTO:

------------------ Kernel Invocation ------------------
[Info] Storing block 0 at 1970-01-01T00:00:00Z containing 7 transaction(s) for 24024922 gas used.
Profiling result can be found in /tmp/wasm-debugger-profiling-2024-09-17T14:56:22.778-00:00.out
----------------------
Detailed results for a `kernel_run`:
%interpreter(decode): 3675543 ticks (6.24s)
%interpreter(link): 17 ticks (7.281us)
%interpreter(init): 60782303 ticks (18.664s)
kernel_run: 13728584 ticks (4.252s)

Full execution: 78186447 ticks (28.940s)
----------------------
Detailed results for a `kernel_run`:
%interpreter(decode): 3675543 ticks (6.330s)
%interpreter(link): 17 ticks (7.404us)
%interpreter(init): 60782303 ticks (19.536s)
kernel_run: 248164797 ticks (1min21s)

Full execution: 312622660 ticks (1min47s)
----------------------
Full execution with padding: 100000000000000 ticks

With LTO, the full execution of the first and second `kernel_run` saves 10_144_958 and 22_489_881 ticks respectively.
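
The tick savings for both benchmarks can be re-derived from the "Full execution" lines of the profiler output. This is a minimal sketch with the tick counts copied by hand from the logs above. The dictionary layout and the `savings` helper are illustrative and not part of the benchmark scripts.

```python
# "Full execution" tick counts copied from the profiler output above:
# one (without LTO, with LTO) pair per `kernel_run`.
full_execution_ticks = {
    "imbricated calls": [(86_191_946, 76_605_012), (344_717_291, 316_817_820)],
    "ERC20 token": [(88_331_405, 78_186_447), (335_112_541, 312_622_660)],
}


def savings(before: int, after: int) -> tuple[int, float]:
    """Return the absolute tick saving and the saving relative to `before` in percent."""
    saved = before - after
    return saved, saved / before * 100.0


for benchmark, pairs in full_execution_ticks.items():
    for run, (before, after) in enumerate(pairs, start=1):
        saved, percent = savings(before, after)
        print(f"{benchmark}, kernel_run {run}: {saved:_} ticks saved ({percent:.1f}%)")
```

The relative figure is useful for comparing the two `kernel_run`s, which differ widely in total tick count.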

Seems pretty cool for just enabling a flag. :)

Edited by Rodi-Can Bozman
