
[DAC] Preemptively download missed pages when the Observer node starts from cold

Currently, if an Observer node joins a DAC late, it has to load all pages of the payload (starting from the root hash) on demand from the Rollup through the /get_missing_page RPC. When this RPC is called, the Observer node fetches the page by sending a request to every committee member (flooding) and returning the first successful result. This poses a performance issue when the Observer node has to do this for every page of a payload (see benchmark results below).
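For reference, the flooding behaviour described above can be sketched as follows. This is a minimal illustration, not the actual Observer implementation: `flood_fetch_page` and the `fetch_from_member` callable are hypothetical names, and the sketch assumes a member fetch either returns the page bytes or raises.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional, Sequence

# Hypothetical signature: (member_address, page_hash) -> page bytes, raises on failure.
FetchFn = Callable[[str, str], bytes]


def flood_fetch_page(
    page_hash: str,
    members: Sequence[str],
    fetch_from_member: FetchFn,
) -> Optional[bytes]:
    """Ask every committee member for one page; return the first success."""
    if not members:
        return None
    # One worker per member, so every request is in flight at the same time.
    pool = ThreadPoolExecutor(max_workers=len(members))
    try:
        futures = [pool.submit(fetch_from_member, m, page_hash) for m in members]
        for future in as_completed(futures):
            try:
                return future.result()  # first successful answer wins
            except Exception:
                continue  # this member failed, wait for the next one
        return None  # every member failed
    finally:
        # Do not block on the slower members once we have an answer.
        pool.shutdown(wait=False, cancel_futures=True)
```

Note that each call opens one request per committee member, so fetching a whole payload this way costs (number of pages) × (number of members) requests.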

We need a better mechanism for starting up cold. In theory, the rollup node could copy the reveal_data_dir of one of its peers to establish the right state in the Observer node. Whether this is a DAC problem or a Rollup problem is up for debate.
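One possible shape for the preemptive download in the title is a single walk from the root hash at startup that stores every page before the Rollup asks for it. The sketch below is only an illustration under stated assumptions: it assumes a page is either a list of child hashes or a contents page (the `("hashes", ...)` / `("contents", ...)` tuples are a hypothetical encoding), and `prefetch_payload`, `fetch_page` and `store` are made-up names standing in for the real fetch mechanism and the reveal data directory.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical page encoding: ("hashes", [child hashes]) or ("contents", payload bytes).
Page = Tuple[str, Union[List[str], bytes]]


def prefetch_payload(
    root_hash: str,
    fetch_page: Callable[[str], Page],
    store: Dict[str, Page],
) -> int:
    """Walk the page tree from root_hash and store every missing page.

    Returns the number of pages that had to be fetched.
    """
    pending = deque([root_hash])
    fetched = 0
    while pending:
        page_hash = pending.popleft()
        if page_hash in store:
            page = store[page_hash]  # already on disk, but children may be missing
        else:
            page = fetch_page(page_hash)
            store[page_hash] = page
            fetched += 1
        kind, body = page
        if kind == "hashes":
            pending.extend(body)  # descend into the child pages
    return fetched
```

How `fetch_page` is backed (flooding, a single trusted peer, or a bulk copy of a peer's directory) is exactly the open design question of this issue; the walk itself does not depend on that choice.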

Benchmark results

The benchmarks below show that fetch time grows with both the number of committee members and the size of the message, as expected. When fetching a single page (4 KB), flooding can handle at least 50 committee members. Flooding performs poorly when fetching large payloads, however, and should not be relied on to start an Observer node from cold for large payloads.

**Single page fetch (Target use case)**

Benchmark the time it takes to fetch a single page.

| Committee members | Time (s) |
|---|---|
| 2 | 0.001536 |
| 5 | 0.002289 |
| 10 | 0.002946 |
| 20 | 0.003461 |
| 50 | 0.008432 |
| 100 | fails: [error] Too many open files |

**Fetch all missing pages**

Benchmark the time it takes to fetch all pages from a root hash. This happens when the Observer node misses the published root hash (e.g. when it starts cold).

| Size (MB) | Committee members | ~Time (s) |
|---|---|---|
| 2 | 2 | 1.604237 |
| 2 | 3 | 1.846500 |
| 2 | 5 | 2.309726 |
| 2 | 10 | 3.203248 |
| 5 | 2 | 4.232832 |
| 5 | 3 | 4.547821 |
| 5 | 5 | 5.679550 |
| 5 | 10 | fail: [error] Can't assign requested address |
| 7 | 2 | 5.668716 |
| 7 | 3 | 6.690229 |
| 7 | 5 | fail: [error] Can't assign requested address |
| 10 | 2 | ~8.047370 |
| 10 | 3 | fail: [error] Can't assign requested address |
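To put the rows above in perspective, here is a rough lower bound on how many requests flooding issues per configuration. It is a back-of-the-envelope sketch, assuming 4 KB pages (the page size quoted for the single-page benchmark), 1 MB = 1024 × 1024 bytes, and ignoring hash pages and per-page overhead; `flooding_requests` is an illustrative helper, not part of the benchmark.

```python
PAGE_SIZE_BYTES = 4 * 1024  # assumed page size (4 KB, as in the single-page benchmark)


def flooding_requests(payload_mb: int, committee_members: int) -> int:
    """Lower bound on requests sent when every page is flooded to every member."""
    payload_bytes = payload_mb * 1024 * 1024
    # Ceiling division: a partially filled last page still needs a fetch.
    content_pages = -(-payload_bytes // PAGE_SIZE_BYTES)
    return content_pages * committee_members


if __name__ == "__main__":
    for size_mb, members in [(5, 5), (5, 10), (7, 5), (10, 3)]:
        print(size_mb, members, flooding_requests(size_mb, members))
```

Under these assumptions the three failing configurations (5 MB × 10, 7 MB × 5, 10 MB × 3) issue at least 12800, 8960 and 7680 requests, while every passing row stays at 6400 or fewer. That is consistent with the errors being a resource-exhaustion symptom of flooding, although the benchmark itself does not establish the cause.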
Edited by Ryan Tan