DAL: Add PPX profiler
What
Parent MR: !18326 (merged)
Why
Adding profiling outputs to the DAL node can help us identify bottlenecks, especially in the frequent tezt-cloud scenarios that we run.
How
Manually testing the MR
- Manual (running a
ghostnetDAL node)
TEZOS_PPX_PROFILER=profiling make
PROFILING=Debug PROFILING_BACKEND=txt ./octez-node run --data-dir ~/.tezos-node-ghostnet --rpc-addr 127.0.0.1:18733
./octez-dal-node config init --endpoint http://127.0.0.1:18733 --attester-profiles=<baker-key>
PROFILING=Debug PROFILING_BACKEND=txt ./octez-dal-node run
And I got the following results in ~/.tezos-dal-node/dal_profiler.txt:
...
2025-06-13T16:05:38.154-00:00
BLpJtMFQUwMVHYqGVxgeFbzJUZfL4exuhwDEJjNUCfitsovLKxY ............................ 1 3989.302ms 3%
new_finalized_head ........................................................... 1 11.428ms 54%
try_process_block .......................................................... 1 9.628ms 42%
process_block ............................................................ 1 9.626ms 42%
process_block_data ..................................................... 1 8.087ms 45%
publish_slot_data .................................................... 2 0.001ms 84%
block_info ........................................................... 1 1.649ms 47%
slot_headers ......................................................... 1 1.701ms 20%
store_slot_headers ................................................... 1 0.237ms 105%
fetch_committee ...................................................... 1
get_gs_worker ........................................................ 1
dal_attestation ...................................................... 1 0.001ms 105%
update_selected_slot_headers_statuses ................................ 1 0.294ms 112%
remove_unattested_slots_and_shards ................................... 1 1.042ms 120%
get_attestations ..................................................... 1 0.965ms 29%
check_attesters_attested ............................................. 1 2.177ms 19%
inject_entrapment_evidences .......................................... 1 0.003ms 70%
2025-06-13T16:05:42.143-00:00
BLQikHQ5RHMyvMNySJoG17WRQXFTncNYQsUeCgewE7FfBh3vtzE ............................ 1 4161.257ms 3%
...
- TEZT-CLOUD (DAL scenario)
Command used:
dune exec tezt/tests/cloud/main.exe -- DAL --stake 1,1 --producers 32 --log-file ghost_32p_prof --proxy --proxy-localhost --network ghostnet --website --monitoring --prometheus --prometheus-export --grafana --tezt-cloud ghost-32p-prof --dockerfile-alias dal --keep-temp --process-monitoring -v --fundraiser edsk3nd3m5Zb31fbchqAvWVUbeF4jH7kdcs6mRCA7rFzYt4kopSyZP --ppx-profiling
PRODUCER result:
...
2025-06-13T16:18:28.343-00:00
BLLop3WUGHHGrQSB9LEqKdBfLGksNPNMFFUi2gFk2ESueiDMhsF ............................ 1 4552.647ms 47%
new_finalized_head ........................................................... 1 105.757ms 85%
try_process_block .......................................................... 1 100.857ms 85%
process_block ............................................................ 1 100.849ms 85%
process_block_data ..................................................... 1 94.270ms 87%
publish_slot_data .................................................... 17 41.625ms 100%
block_info ........................................................... 1 14.786ms 50%
slot_headers ......................................................... 1 8.682ms 31%
store_slot_headers ................................................... 1 0.774ms 95%
fetch_committee ...................................................... 1 0.002ms 105%
get_gs_worker ........................................................ 1 0.002ms 158%
dal_attestation ...................................................... 1 0.004ms 75%
update_selected_slot_headers_statuses ................................ 1 0.545ms 100%
remove_unattested_slots_and_shards ................................... 1 23.703ms 106%
get_attestations ..................................................... 1 4.036ms 104%
check_attesters_attested ............................................. 1 0.004ms 105%
inject_entrapment_evidences .......................................... 1 0.007ms 116%
2025-06-13T16:18:32.895-00:00
BKxwMuCz9vB4RKbFM5zfhNxGU3PWkrYZojPvCTJUVypTzps5YWA ............................ 1 3520.444ms 61%
new_finalized_head ........................................................... 1 36.659ms 54%
try_process_block .......................................................... 1 31.791ms 47%
process_block ............................................................ 1 31.783ms 47%
process_block_data ..................................................... 1 26.996ms 50%
publish_slot_data .................................................... 17 0.206ms 100%
block_info ........................................................... 1 10.799ms 45%
slot_headers ......................................................... 1 8.133ms 34%
store_slot_headers ................................................... 1 1.240ms 102%
fetch_committee ...................................................... 1 0.002ms 105%
get_gs_worker ........................................................ 1 0.002ms 105%
dal_attestation ...................................................... 1 0.004ms 79%
update_selected_slot_headers_statuses ................................ 1 0.198ms 101%
remove_unattested_slots_and_shards ................................... 1 2.811ms 117%
get_attestations ..................................................... 1 3.507ms 28%
check_attesters_attested ............................................. 1 0.005ms 120%
inject_entrapment_evidences .......................................... 1 0.010ms 100%
2025-06-13T16:18:36.416-00:00
BMaMrgsNdMWXvPZwcXeiEiYFGkUR766PK9wBDqnq1V19gRqKCS1 ............................ 1 4153.561ms 51%
...
(one block it has ~100ms time for new_finalized_head and the next block ~40ms and it repeats this behaviour)
In the /baker-dal-node-0 folder I got the following dal_profiling.txt:
...
2025-06-13T16:21:58.371-00:00
BL2HskwXQ7Ar8ZVJXPYbaa8AYoSBAsJ1ZKskyVEKzehHHsDV8JA ............................ 1 3956.804ms 1%
new_finalized_head ........................................................... 1 72.993ms 37%
try_process_block .......................................................... 1 67.196ms 36%
process_block ............................................................ 1 67.189ms 36%
process_block_data ..................................................... 1 51.052ms 44%
publish_slot_data .................................................... 25 0.067ms 101%
block_info ........................................................... 1 30.735ms 40%
slot_headers ......................................................... 1 11.972ms 39%
store_slot_headers ................................................... 1 1.019ms 100%
fetch_committee ...................................................... 1 0.002ms 140%
get_gs_worker ........................................................ 1 0.003ms 105%
dal_attestation ...................................................... 1 0.005ms 120%
update_selected_slot_headers_statuses ................................ 1 0.618ms 96%
remove_unattested_slots_and_shards ................................... 1 1.616ms 100%
get_attestations ..................................................... 1 4.788ms 37%
check_attesters_attested ............................................. 1 0.088ms 101%
inject_entrapment_evidences .......................................... 1 0.007ms 98%
2025-06-13T16:22:02.328-00:00
BLi61d7NsVz5eoxMx76m5swEAtRnNVHzNYnwHYBJUu4jXDn6BdX ............................ 1 4043.669ms 1%
...
(all blocks are around the same values)
Checklist
-
Document the interface of any function added or modified (see the coding guidelines) -
Document any change to the user interface, including configuration parameters (see node configuration) -
Provide automatic testing (see the testing guide). -
For new features and bug fixes, add an item in the appropriate changelog ( docs/protocols/alpha.rstfor the protocol and the environment,CHANGES.rstat the root of the repository for everything else). -
Select suitable reviewers using the Reviewersfield below. -
Select as Assigneethe next person who should take action on that MR
Edited by Gabriel Moise