DAL Monitoring for resilience
The goal of this milestone is to be able to track shard life cycles, from production to attestation inclusion.
Implementation plan
Teztale
See https://docs.google.com/document/d/1AjCsD_iYhGeOS3cSDod67-Q2lzNV3JVIRrAMD3IHKY0/edit?tab=t.0
-
Migration code toward new scheme
-
Integrate DAL shard assignments into Teztale (@gabriel.moise) (days)
- Shard assignments are understood by Teztale - !20049 (merged)
- Teztale archiver collects shard assignments from the L1 node and pushes them to the Teztale server - !20056 (merged)
- (Optional) Add converter and Json archiver helpers for DAL shards - !20059 (WIP - not a priority)
-
teztale-archiver fetches DAL data from L1 monitoring (days) @gabriel.moise
- slot attested by a given delegate (for each round)
- slot's commitment-publish operation seen in L1 block
- slot's commitment-publish operation seen in L1 node mempool (optional)
-
teztale-archiver fetches DAL data from the DAL node (days)
- Seen shards, with timestamps
- Trapped shards
-
Teztale server can store DAL-related events (week)
-
Teztale-dataviz front-end can display DAL data (weeks)
-
POC of OTEL to Teztale (2 weeks)
- Some relevant OTEL events are emitted by the DAL node
-
Teztale-archiver can receive OTEL connections
- Alternatively, OTEL could push events in a streamed RPC called by Teztale
- Teztale-archiver understands relevant OTEL events
Telemetry
-
All relevant events are emitted by OTEL
- slot injected in DAL node
- slot's commitment-publish operation seen in L1 block
- slot's injected shards seen in gossipsub
-
Telemetry is configurable: the DAL node can be instructed to activate some telemetry
- An RPC can set the push address for OTEL
- Telemetry is activated per section at runtime
-
Relevant telemetry is understood by Teztale-archiver
The monitoring stack should track:
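As an illustration of the "configurable telemetry" items above, here is a minimal sketch of the state a node could keep: a push address that can be set at runtime, and a set of telemetry sections that can be switched on and off. Everything in it is an assumption made for illustration: the `TelemetryConfig` class, its method names, and the section names are hypothetical and do not correspond to an existing DAL node RPC or to the OTEL API.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class TelemetryConfig:
    """Hypothetical runtime telemetry settings (not an existing DAL node API)."""

    # Where events are pushed; None means no push target is configured yet.
    push_address: Optional[str] = None
    # Names of the telemetry sections currently activated.
    enabled_sections: Set[str] = field(default_factory=set)

    def set_push_address(self, address: str) -> None:
        # Stands in for "an RPC can set the push address for OTEL".
        self.push_address = address

    def enable(self, section: str) -> None:
        self.enabled_sections.add(section)

    def disable(self, section: str) -> None:
        # discard() does nothing if the section was not enabled.
        self.enabled_sections.discard(section)

    def should_emit(self, section: str) -> bool:
        # Emit only when a push target exists and the section is activated.
        return self.push_address is not None and section in self.enabled_sections


# Example section names and address are placeholders.
config = TelemetryConfig()
config.enable("slot_injection")
config.set_push_address("http://localhost:4318")
config.enable("gossipsub_shards")
config.disable("slot_injection")
```

In this sketch a section stays silent until both conditions hold, so activating sections before a push address is known is harmless.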
- slot injected in DAL node
- slot expanded in shards
- slot's commitment-publish operation seen in L1 node mempool
- slot's commitment-publish operation seen in L1 block
- slot's injected shards seen in gossipsub
- slot's shard attested (for each round)
- slot's attestations are finalized
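The stages listed above can be sketched as a small data model that records when each stage was first observed for a slot and reports which stages are still missing. This is only a sketch under stated assumptions: the `Stage` and `SlotLifecycle` names, the `(level, slot_index)` key, and the use of float timestamps are hypothetical and are not taken from Teztale. Per-shard and per-round detail is deliberately left out to keep the example short.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Stage(Enum):
    # One member per tracked stage, in the order of the list above.
    SLOT_INJECTED = 1
    SLOT_EXPANDED = 2
    COMMITMENT_IN_MEMPOOL = 3
    COMMITMENT_IN_BLOCK = 4
    SHARD_SEEN_IN_GOSSIPSUB = 5
    SHARD_ATTESTED = 6
    ATTESTATION_FINALIZED = 7


@dataclass
class SlotLifecycle:
    """Hypothetical record of the stages observed for one slot."""

    level: int
    slot_index: int
    # Earliest timestamp (in seconds) at which each stage was observed.
    first_seen: Dict[Stage, float] = field(default_factory=dict)

    def record(self, stage: Stage, timestamp: float) -> None:
        # Keep the earliest observation when a stage is reported several times.
        previous = self.first_seen.get(stage)
        if previous is None or timestamp < previous:
            self.first_seen[stage] = timestamp

    def missing_stages(self) -> List[Stage]:
        # Stages not observed yet, in definition order.
        return [stage for stage in Stage if stage not in self.first_seen]

    def latency(self, start: Stage, end: Stage) -> Optional[float]:
        # Elapsed time between two stages, or None if either is unobserved.
        if start in self.first_seen and end in self.first_seen:
            return self.first_seen[end] - self.first_seen[start]
        return None


lifecycle = SlotLifecycle(level=100, slot_index=3)
lifecycle.record(Stage.SLOT_INJECTED, 10.0)
lifecycle.record(Stage.COMMITMENT_IN_BLOCK, 25.0)
print(lifecycle.latency(Stage.SLOT_INJECTED, Stage.COMMITMENT_IN_BLOCK))
print([stage.name for stage in lifecycle.missing_stages()])
```

Keeping only the first timestamp per stage makes duplicate reports from several sources idempotent; a real tracker would likely also need per-shard and per-delegate keys.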