Expired
Milestone
Nov 27, 2024–Sep 30, 2025
Performance Monitoring Visualisation
Contributors: @gabriel.moise, @Killian-Delarue
Context
Building on the successful integration of the PPX Profiler (see milestone here), the goal now is to visualise profiling outputs and formulate a clear statement on the performance improvement of transitioning to OCaml 5. This will be achieved by:
- Leveraging Prometheus outputs for dedicated Grafana dashboards as an initial step.
- Progressing towards OpenTelemetry traces for more granular insights.
Deliverables and tasks:
- Study the current landscape (ETA: 29 Nov):
  - (1-2 days) Gather information on grafazos and how it can be used to generate an initial Grafana dashboard (reference: ongoing Slack thread)
  - (hours) Present a Grafana dashboard MR that uses the Prometheus outputs already added by the new profiling outputs: !15785 (merged)
  - (1 day) Merge the base Grafana dashboard
- Add more Prometheus traces (ETA: 13 Dec):
  - (hours) Synchronise with the Performance monitoring team to decide on what we want to visualise in the dashboards
  - Found bug with Prometheus monitoring only if the parent is also modified (revert this decision - !15818 (merged))
  - (1 day) Add more Prometheus outputs derived from the profiling data (example MR): !15814 (merged)
  - (days) SYNC with infrastructure team to deploy the new dashboard(s) on an existing monitoring node - discussed with @Killian-Delarue
    - initial job to check that building with profiling enabled is not failing in the CI - !15862 (merged)
    - artifacts MR for profiling node - !15938 (merged)
    - child job to create an octez-node binary with profiling enabled for Prometheus - TBD (ongoing by @Killian-Delarue 🙏 thanks a lot) - !15901 (merged) (based on the parent MR: !15896 (merged))
- OpenTelemetry integration (ETA: ??):
  - ! This part is blocked because of the OCaml 5 workers, so there is no OpenTelemetry work.
  - (hours) SYNC with @picdc to determine the status of the OpenTelemetry work
  - (days) Create a new dashboard when there are OpenTelemetry traces implemented for the profiling outputs
  - (hours) SYNC with infrastructure team again to deploy the new dashboard as well