Dal_node: Add cache for shards storage for Attester profile
What
Closes #8006 (closed)
Change the shard storage mechanism of DAL Attester nodes from on-disk storage to an in-memory cache.
Why
Currently, a DAL node running in the Attester profile stores the shards it receives on disk (as can be seen in the Shards module of the DAL Store). This implies I/O operations for reading and writing, which are unnecessary because Attester DAL nodes are not supposed to hold on to the shards for more than a few levels (as dictated by the attestation_lag). An in-memory cache therefore makes more sense.
Experiments backing up the helpfulness of this change: logbook
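To illustrate why a short retention window suits an in-memory structure, here is a minimal sketch of a cache that keeps shards only for the most recent levels. It is not the Shards_cache module from this MR: the module name `Level_cache`, the `retention` parameter (standing in for a window derived from attestation_lag), and the string keys are all assumptions made for the example.

```ocaml
(* Hypothetical sketch: none of these names come from the MR itself. *)
module Level_cache : sig
  type 'a t

  val create : retention:int -> 'a t
  val add : 'a t -> level:int -> key:string -> 'a -> unit
  val find_opt : 'a t -> level:int -> key:string -> 'a option
  val count : 'a t -> level:int -> int
end = struct
  type 'a t = {
    retention : int; (* number of most recent levels to keep *)
    mutable highest : int; (* highest level seen so far *)
    by_level : (int, (string, 'a) Hashtbl.t) Hashtbl.t;
  }

  let create ~retention =
    if retention <= 0 then invalid_arg "Level_cache.create";
    { retention; highest = min_int; by_level = Hashtbl.create 16 }

  (* Drop every level at or below [highest - retention]. *)
  let evict t =
    let stale =
      Hashtbl.fold
        (fun level _ acc ->
          if level <= t.highest - t.retention then level :: acc else acc)
        t.by_level []
    in
    List.iter (Hashtbl.remove t.by_level) stale

  let add t ~level ~key value =
    let table =
      match Hashtbl.find_opt t.by_level level with
      | Some table -> table
      | None ->
          let table = Hashtbl.create 64 in
          Hashtbl.replace t.by_level level table;
          table
    in
    Hashtbl.replace table key value;
    t.highest <- max t.highest level;
    evict t

  let find_opt t ~level ~key =
    Option.bind
      (Hashtbl.find_opt t.by_level level)
      (fun table -> Hashtbl.find_opt table key)

  let count t ~level =
    match Hashtbl.find_opt t.by_level level with
    | Some table -> Hashtbl.length table
    | None -> 0
end
```

With `retention:2`, adding a shard at level 12 drops everything stored for level 10 and below, so memory use stays bounded by the window without any disk I/O.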
How
A new module, Shards_cache, provides the same functionality as the old Shards module (now renamed Shards_disk). The DAL node chooses which of the two to use depending on its Profile_manager.
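One way to picture this kind of selection is with two modules sharing a signature and a function that picks one per profile. The sketch below is only an illustration under assumptions: the signature `SHARD_STORE`, the modules `In_memory` and `On_disk`, the two-case `profile` type, and `store_for` are hypothetical and do not reflect the actual Shards_cache, Shards_disk, or Profile_manager interfaces.

```ocaml
(* Hypothetical interface; the real module signatures are not shown here. *)
module type SHARD_STORE = sig
  val name : string
  val write : slot:string -> index:int -> string -> unit
  val read : slot:string -> index:int -> string option
end

(* In-memory variant: a hash table keyed by (slot, shard index). *)
module In_memory : SHARD_STORE = struct
  let name = "cache"
  let table : (string * int, string) Hashtbl.t = Hashtbl.create 64
  let write ~slot ~index shard = Hashtbl.replace table (slot, index) shard
  let read ~slot ~index = Hashtbl.find_opt table (slot, index)
end

(* On-disk variant: one file per shard in the system temp directory. *)
module On_disk : SHARD_STORE = struct
  let name = "disk"
  let dir = Filename.get_temp_dir_name ()

  let path ~slot ~index =
    Filename.concat dir (Printf.sprintf "shard-%s-%d" slot index)

  let write ~slot ~index shard =
    let oc = open_out_bin (path ~slot ~index) in
    Fun.protect
      ~finally:(fun () -> close_out oc)
      (fun () -> output_string oc shard)

  let read ~slot ~index =
    let file = path ~slot ~index in
    if not (Sys.file_exists file) then None
    else
      let ic = open_in_bin file in
      Fun.protect
        ~finally:(fun () -> close_in ic)
        (fun () -> Some (really_input_string ic (in_channel_length ic)))
end

(* Simplified stand-in for the node's profile. *)
type profile = Attester | Other

(* Pick the storage backend once, based on the profile. *)
let store_for : profile -> (module SHARD_STORE) = function
  | Attester -> (module In_memory)
  | Other -> (module On_disk)
```

A caller would unpack the result with `let (module S : SHARD_STORE) = store_for Attester in ...` and then use `S.write` and `S.read` without caring which backend is behind them.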
Manually testing the MR
- Manual testing
On ghostnet, running an Attester DAL node with profiling enabled:
Before:
...
2025-06-24T12:17:47.222-00:00
BLD7cyuU7GM6hrKeUiVLUjezmjeupuyuitZL5ezfY3CjXXbqAnz ............................ 1 3933.555ms 3%
shards_handler ............................................................... 15 16.812ms 87%
count_values ............................................................... 15 0.064ms 97%
save_and_notify ............................................................ 15 16.640ms 87%
value_exists ............................................................. 15 0.875ms 114%
write_value .............................................................. 15 15.638ms 85%
update_timing_shard_received ............................................... 15 0.038ms 98%
...
After:
...
2025-06-24T12:14:07.233-00:00
BLVbtp18jkRNMfvrtFsko9GaFJQNxwVtJog7hfHkgHTVwqTe4Ra ............................ 1 3991.242ms 4%
shards_handler ............................................................... 38 0.528ms 102%
count_values ............................................................... 38 0.093ms 101%
save_and_notify ............................................................ 38 0.256ms 100%
add shard ................................................................ 38 0.017ms 104%
find_opt ................................................................. 38 0.038ms 103%
initialise ............................................................... 2 0.010ms 100%
update_timing_shard_received ............................................... 38 0.060ms 101%
...
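(With so few shards the difference is hard to judge, but it looks like an improvement: shards_handler takes 16.812ms for 15 calls before and 0.528ms for 38 calls after.)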
- tezt-cloud testing
Command:
dune exec tezt/tests/cloud/main.exe -- DAL --stake 1,1 --producers 48 --log-file s48p_2b_cache --proxy --proxy-localhost --network sandbox --website --monitoring --prometheus --prometheus-export --grafana --tezt-cloud s48p-2b-cache --dockerfile-alias dal --keep-temp --process-monitoring -i --disable-shard-validation --ppx-profiling --ppx-profiling-backends txt
Before:
...
2025-06-24T09:04:23.536-00:00
BLZ4prx14v4uDaCUxTtqa17G1pCYpUStFvT4EWD7qvR8vRK8x9U ............................ 1 8211.062ms 102%
shards_handler ............................................................... 7943 1624.861ms 101%
count_values ............................................................... 7943 60.045ms 100%
save_and_notify ............................................................ 7943 1423.778ms 101%
value_exists ............................................................. 7943 221.362ms 102%
write_value .............................................................. 7943 1075.439ms 101%
update_timing_shard_received ............................................... 7943 39.081ms 100%
...
After:
...
2025-06-24T10:09:06.847-00:00
BLnLzcyMF22QJWUmRkBcpJTStXRT9L4jNZVdqJwtJ6D1Cyn1DGd ............................ 1 7952.329ms 89%
shards_handler ............................................................... 11140 464.671ms 100%
count_values ............................................................... 11140 57.775ms 100%
save_and_notify ............................................................ 11140 219.134ms 99%
add shard ................................................................ 11140 25.105ms 101%
find_opt ................................................................. 11140 33.883ms 97%
initialise ............................................................... 1656 6.693ms 99%
update_timing_shard_received ............................................... 11140 48.090ms 100%
...
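The time improves considerably: shards_handler drops from 1624.861ms for 7943 calls to 464.671ms for 11140 calls, and save_and_notify drops from 1423.778ms to 219.134ms.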
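Because the two runs handled different numbers of shards, the totals are easier to compare per call. The snippet below is a small worked calculation using the shards_handler figures from the tezt-cloud profiles above; the helper `per_call_ms` is introduced only for this example.

```ocaml
(* Average time per call, in milliseconds. *)
let per_call_ms ~calls ~total_ms = total_ms /. float_of_int calls

(* shards_handler figures copied from the tezt-cloud profiles. *)
let before = per_call_ms ~calls:7943 ~total_ms:1624.861
let after = per_call_ms ~calls:11140 ~total_ms:464.671

let () =
  Printf.printf
    "shards_handler: %.3f ms/call before, %.3f ms/call after (%.1fx)\n"
    before after (before /. after)
(* prints: shards_handler: 0.205 ms/call before, 0.042 ms/call after (4.9x) *)
```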
Checklist
- Document the interface of any function added or modified (see the coding guidelines)
- Document any change to the user interface, including configuration parameters (see node configuration)
- Provide automatic testing (see the testing guide).
- For new features and bug fixes, add an item in the appropriate changelog (docs/protocols/alpha.rst for the protocol and the environment, CHANGES.rst at the root of the repository for everything else).
- Select suitable reviewers using the Reviewers field below.
- Select as Assignee the next person who should take action on that MR