Grafazos: Add file to generate profiling dashboard
What
This is the base MR for adding a new Grafana dashboard that will host the panels generated from the Prometheus data added by the PPX profiling effort.
Debate topics (which can/should be addressed in this MR):
- Do we want a new Grafana dashboard with this new information? I advocate for yes: to produce this data, a node must be run with TEZOS_PROFILING=profiling PROFILING_BACKEND="prometheus", and I believe the existing nodes used for observation should not be impacted by the profiling overhead.
- Is the dashboard design up to standard? It is the first time I am building one, so I am not sure how well it will be received or how expressive it is. Please note that the dashboard added by this MR contains a panel only for the Store module, because we still need to add more prometheus labels to other components.
- Should there be more panels? I deliberately kept the number small, although the profiler produces much more detailed output. Let's look at the ordinary store_profiler outputs for a ghostnet node:
2024-11-26T11:21:44.189-00:00
BMNMbXBSzpPdiVvN5MXvfyHPejWVLhkignTaYBzMDCdkxCJ3CQL ............................ 1 5371.821ms 11% +1m10s706.923ms
  compute_live_blocks .......................................................... 1 0.004ms 75%
  store_block .................................................................. 1 0.446ms 102% +0.001ms
  set_head ..................................................................... 1 1.366ms 107% +0.500ms
    get_pred_block ............................................................. 1 0.002ms 467% +0.003ms
    may_split_context .......................................................... 1 0.001ms 210% +0.005ms
    finalize_set_head .......................................................... 1 1.348ms 107% +0.015ms
      write_new_head ........................................................... 1 0.490ms 105% +0.617ms
      write_new_target ......................................................... 1 0.001ms 105% +1.107ms
      updating live blocks ..................................................... 1 0.168ms 101% +1.135ms
        locked compute live blocks with cache .................................. 1 0.166ms 101% +0.002ms
          compute live blocks with new head .................................... 1 0.164ms 100% +0.000ms
I chose to show graphs only for the 3 big (Notice-level) categories (compute_live_blocks, store_block and set_head), which I think is enough as a starting point. If we see an unexplained increase in the average time they take, we can investigate more deeply in the profiling text logs.
- Combining graphs: as you can see, I did not combine any graphs yet, because I believe two measurements x and y should only share a graph when they are two variants of the same process (for instance, a function operating on consensus operations that handles either x = attestation or y = preattestation). Is this a good approach?
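For reviewers who want to compare the three Notice-level entries against the full text report, here is a minimal sketch that parses report lines like the ones quoted above. The regular expression and the field names (count, time_ms, percent) are my own assumptions based only on the sample shown in this description; the profiler's output format is not specified here, and the meaning of the percentage column is not stated, so it is stored as-is.

```python
import re

# Sample lines copied from the store_profiler report in this description
# (dot leaders shortened; the parser does not rely on indentation).
REPORT = """\
2024-11-26T11:21:44.189-00:00
compute_live_blocks .......... 1 0.004ms 75%
store_block .......... 1 0.446ms 102% +0.001ms
set_head .......... 1 1.366ms 107% +0.500ms
get_pred_block .......... 1 0.002ms 467% +0.003ms
updating live blocks .......... 1 0.168ms 101% +1.135ms
"""

# Assumed layout: name, dot leader, a count, a duration in ms,
# a percentage, and an optional "+offset" that is ignored here.
LINE_RE = re.compile(
    r"^\s*(?P<name>.+?)\s+\.{2,}\s+(?P<count>\d+)"
    r"\s+(?P<time_ms>\d+(?:\.\d+)?)ms"
    r"\s+(?P<percent>\d+)%(?:\s+\+\S+)?\s*$"
)

# The three categories graphed by the dashboard in this MR.
TOP_LEVEL = ("compute_live_blocks", "store_block", "set_head")


def parse_report(text):
    """Map each entry name to its parsed columns; non-entry lines are skipped."""
    entries = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # e.g. the timestamp line
        # If a name appears more than once, the last occurrence wins.
        entries[match["name"]] = {
            "count": int(match["count"]),
            "time_ms": float(match["time_ms"]),
            "percent": int(match["percent"]),
        }
    return entries


entries = parse_report(REPORT)
summary = {name: entries[name]["time_ms"] for name in TOP_LEVEL}
print(summary)
```

This only covers millisecond durations as they appear in the sample; longer durations printed in another unit would need an extended pattern.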
Why
How
Manually testing the MR
To test the MR, please refer to the Monitoring an Octez node tutorial; it was my first step, too.
Then you need a node. I used a node running on ghostnet (it had been running for a few days):
$ TEZOS_PPX_PROFILER=profiling PROFILING_BACKEND="prometheus" PROFILING="debug" ./octez-node run --data-dir ~/.tezos-node-ghostnet --metrics-addr localhost:9091
You should be able to see the metrics at this address: http://localhost:9091/metrics
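If you prefer to check the endpoint from a script instead of the browser, the following sketch downloads the page and lists the metric names it exposes. It relies only on the Prometheus text exposition format (comment lines start with #, sample lines are `name{labels} value`). The metric names in SAMPLE are hypothetical placeholders; the real names depend on the node build and are not listed in this description.

```python
from urllib.request import urlopen

METRICS_URL = "http://localhost:9091/metrics"  # address from the command above


def fetch_metrics(url=METRICS_URL, timeout=5.0):
    """Download the raw Prometheus text exposition from a running node."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_names(text, needle=""):
    """Return the sorted, de-duplicated metric names that contain `needle`."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        # A sample line is either `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(None, 1)[0]
        if needle in name:
            names.add(name)
    return sorted(names)


# Hypothetical sample, used so the sketch runs without a node.
SAMPLE = """\
# HELP example_store_set_head_seconds hypothetical metric
# TYPE example_store_set_head_seconds summary
example_store_set_head_seconds_sum{instance="node"} 0.001366
example_store_set_head_seconds_count{instance="node"} 1
example_other_total 42
"""

print(metric_names(SAMPLE, "store"))
```

Against a live node, call `metric_names(fetch_metrics(), needle)` with whatever substring the profiling metrics share.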
To generate the Grafana dashboard, run:
$ cd ~/tezos/grafazos
$ NODE_INSTANCE_LABEL=instance make profiling
This will create the octez-profiling.json file (a Grafana dashboard template), which, when imported in Grafana, should give you a nice-looking dashboard.
You can also check the other, already implemented panels: running make without any argument builds all the other .json files, each of which you can import into a new Grafana dashboard.
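Before importing, a quick sanity check of the generated file can be sketched as below: it lists the panel titles found in a dashboard JSON, descending into row panels that nest their own panels. The sample dashboard is hypothetical, and the default path in load_dashboard is an assumption; adjust it to wherever make writes octez-profiling.json.

```python
import json


def panel_titles(node):
    """Collect panel titles depth-first, including panels nested under rows."""
    titles = []
    for panel in node.get("panels", []):
        titles.append(panel.get("title", "<untitled>"))
        titles.extend(panel_titles(panel))
    return titles


def load_dashboard(path="octez-profiling.json"):
    """Load a generated dashboard; the default path is an assumption."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical dashboard, used so the sketch runs without the generated file.
SAMPLE = {
    "title": "Example profiling dashboard",
    "panels": [
        {"type": "row", "title": "Store", "panels": [
            {"type": "timeseries", "title": "set_head"},
        ]},
        {"type": "timeseries", "title": "store_block"},
    ],
}

print(panel_titles(SAMPLE))
```

On the real file, `panel_titles(load_dashboard())` should show the Store panels described in this MR.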
Checklist
- Document the interface of any function added or modified (see the coding guidelines)
- Document any change to the user interface, including configuration parameters (see node configuration)
- Provide automatic testing (see the testing guide).
- For new features and bug fixes, add an item in the appropriate changelog (docs/protocols/alpha.rst for the protocol and the environment, CHANGES.rst at the root of the repository for everything else).
- Select suitable reviewers using the Reviewers field below.
- Select as Assignee the next person who should take action on that MR
