From f07205ccadc706cd21dfeb13f588826b46dd9bc7 Mon Sep 17 00:00:00 2001 From: Nic Volanschi Date: Mon, 23 Jan 2023 15:27:35 +0100 Subject: [PATCH 1/4] doc: fix and improve shell/storage.rst --- docs/shell/storage.rst | 87 +++++++++++++++++++++++------------------- 1 file changed, 48 insertions(+), 39 deletions(-) diff --git a/docs/shell/storage.rst b/docs/shell/storage.rst index 8569c52fa305..ab72c018232f 100644 --- a/docs/shell/storage.rst +++ b/docs/shell/storage.rst @@ -3,7 +3,7 @@ The storage layer ***************** This document explains the inner workings of the storage layer of the -Tezos shell. The storage layer is responsible for aggregating blocks +Octez shell. The storage layer is responsible for aggregating blocks (along with their respective ledger state) and operations within blocks (along with their associated metadata). It is composed of two main components: the :ref:`store` and the @@ -18,11 +18,11 @@ This component handles the on-disk storage of static objects such as blocks, operations, block's metadata, protocols and chain data. The store also handles the chain's current state: current head, invalid blocks, active test chains, etc. The store component is designed to -handle concurrent accesses to the data. Both a mutex and a lockfile +handle concurrent accesses to the data. Both a mutex and a lock file are present to prevent concurrent access to critical sections. The store also provides an accessor to the :ref:`context` and handles its initialization, but it is not responsible to commit contexts -on-disk. This is done by the :doc:`validator` component. +on disk. This is done by the :doc:`validator` component. The store is initialized using a :doc:`history mode<../user/history_modes>` that can be either *Archive*, *Full* or @@ -30,24 +30,29 @@ mode<../user/history_modes>` that can be either *Archive*, *Full* or pruned while the chain is growing. 
In *Full* mode, all blocks that are part of the chain are kept but their associated metadata below a certain threshold are discarded. In *Rolling* mode, blocks under a -certain threshold are discarded entirely. *Full* and *Rolling* may -take a number of additional cycles to increase or decrease that -threshold. +certain threshold are discarded entirely. The thresholds of *Full* and *Rolling* modes may +be varied by specifying a number of additional cycles to keep. + +The moments when data may be pruned are when a cycle is completed. +When this happens, the store performs two operations. +First, the block history is linearized by trimming branches in the completed cycle. +Secondly, the remaining blocks in the completed cycle, or just their metadata, can be pruned, according to the history mode. +Both operations are explained next. .. _lafl: -To decide whether a block should be pruned or not, the store uses the +To notice when a cycle has completed, the store uses the latest head's metadata that contains the **last allowed fork -level**. This threshold specifies that the local chain cannot be -reorganized below it. When a protocol validation returns a changed -value for it, it means that a cycle has completed. Then, the store +level**. This specifies the point under which the local chain cannot be +reorganized. When a protocol validation operation returns a changed +value for this point, it means that a cycle has completed. Then, the store retrieves all the blocks from ``(head-1).last_allowed_fork_level + 1`` -to ``head.last_allowed_fork_level``, which contain all the blocks of a -completed cycle that cannot be reorganized anymore, and trims the -potential branches in the process to yield a linear history. +to ``head.last_allowed_fork_level``, which contain all the blocks of the +completed cycle, that cannot be reorganized anymore, and trims the +potential branches to yield a linear history. 
-When an un-reorganizable former cycle is retrieved, it is then -archived in what is called the *cemented cycles*. This process is +When the complete (hence, un-reorganizable) cycle is retrieved, it is +archived with the *cemented cycles*. This process is called a **merge** and is performed asynchronously. Depending on which history mode is ran and on the amount of additional cycles, blocks and/or their associated metadata present in these cemented cycles may @@ -62,7 +67,18 @@ these cycles). Finally, if it is set to *Rolling* with 0 additional cycles, only 5 cycles (the :ref:`PRESERVED_CYCLES ` ones) with metadata will be kept. -The store maintains two specific variables, whose values depend on the +Note that after pruning metadata of some blocks, the store has the capability to reconstruct it +by replaying every block and operation present and repopulating the +context. Hence, it is possible to transform a `Full` store into an `Archive` one. + +It is also possible to retrieve a canonical representation of the +store and context for a given block (provided that its metadata are +present) as a :doc:`snapshot<../user/snapshots>`. + +Store variables +*************** + +The store maintains two specific variables related to the pruned data, whose values depend on the history mode: - The *caboose*, which represents the oldest block known by the @@ -73,7 +89,7 @@ history mode: - The *savepoint* which indicates the lowest block known by the store that possesses metadata. -The *checkpoint* is also a special value that indicates one block that +The *checkpoint* is another variable maintained by the store, that indicates one block that must be part of the chain. This special block may be in the future. Setting a future checkpoint on a fresh node before bootstrapping adds protection in case of eclipse attacks where a set of malicious peers @@ -83,24 +99,17 @@ expected block or will stop the bootstrap. 
When the checkpoint is unset or reached, the store will maintain the following invariant: ``checkpoint ≥ head.last_allowed_fork_level``. -To access those values, it is possible, while the node is running, to -call the RPC ``/chains/main/checkpoint`` to retrieve the checkpoint, -savepoint, caboose and the history mode. - -The store also has the capability to reconstruct its blocks' metadata -by replaying every block and operation present and repopulating the -context. Hence, transforming a `Full` store into a `Archive` one. - -It is also possible to retrieve a canonical representation of the -store and context for a given block (provided that its metadata are -present) as a :doc:`snapshot<../user/snapshots>`. +While the node is running, it is possible to +call the RPC ``/chains/main/checkpoint`` to access the values of all these variables: the checkpoint, +the savepoint, the caboose, and the history mode. -Protocols no longer active are also written on-disk. +Protocols no longer active are also written on disk. Files hierarchy *************** -The store directory in the node's ```` is organized as follows: +The Store maintains data on disk in the +``store`` subdirectory of the node's ````, organized as follows: - ``/store/protocols/`` the directory containing stored protocols. @@ -111,7 +120,7 @@ The store directory in the node's ```` is organized as follows: - ``/store//`` the *chain_store_dir* directory containing the main chain store. -- ``/store//lock`` the lockfile. +- ``/store//lock`` the lock file. - ``/store//config.json`` the chain store's configuration as a JSON file. @@ -141,7 +150,7 @@ The store directory in the node's ```` is organized as follows: Context ####### -The context is a versioned key/value store that associates for each +The context is a versioned key/value store that associates to each block a view of its ledger state. The versioning uses concepts similar to `Git `_. 
The current implementation is using `Irmin `_ as a backend, and its API @@ -162,20 +171,20 @@ block is stored in its header. When validated, a block's announced the two context hashes are different, the block is considered invalid. A context is supposed to be accessed and modified using the protocols' -API. It may be through RPCs or via blocks application. Only the -resulting context of valid blocks application is committed on disk. +API. It may be through RPCs or via :doc:`blocks application <../active/validation>`. Only the +contexts resulting from application of valid blocks is committed on disk. -It is possible to export a concrete context associated to a specific +It is possible to export to a file a concrete context associated to a specific block's ledger state. This feature dumps a canonical representation of -this ledger state that may be incorporated in a snapshot to expose a +this ledger state that may be incorporated in a :doc:`snapshot <../user/snapshots>`, exposing a minimal storage state. -Note that it is possible to enable logging for the context backend +Note that it is possible to enable :doc:`logging <../user/logging>` for the context backend using the ``TEZOS_CONTEXT`` environment variable. There are two possible values for this variable: ``v`` for ``Info`` logging and -``vv`` for ``Debug`` logging (warning, the ``Debug`` mode is very +``vv`` for ``Debug`` logging (warning: the ``Debug`` mode is very talkative). 
Additionally, this environment variable allows to tweak, -with care, some context parameters (using the standard +with care, the following context parameters (using the standard `TEZOS_CONTEXT="variable=value"` pattern, separating the items with commas such as `TEZOS_CONTEXT="v, variable=value"`): -- GitLab From 06ef263a79c52a59d9efac2b76a7964debdc7f27 Mon Sep 17 00:00:00 2001 From: Nic Volanschi Date: Tue, 24 Jan 2023 09:31:10 +0100 Subject: [PATCH 2/4] doc: de-duplicate example with 5 additional cycles --- docs/shell/storage.rst | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/docs/shell/storage.rst b/docs/shell/storage.rst index ab72c018232f..af887afa4971 100644 --- a/docs/shell/storage.rst +++ b/docs/shell/storage.rst @@ -60,12 +60,7 @@ or may not be preserved. For instance, if the history mode is *Archive*, every block is preserved, with all its metadata. If it is *Full* with 5 additional cycles, all the cemented cycles will be present but only the 10 most recent cemented cycles will have some -metadata kept (that is, *5 + 5 = 10* as the -:ref:`PRESERVED_CYCLES` protocol parameter, which on -Mainnet is currently set to 5 cycles, forces the node to keep at least -these cycles). Finally, if it is set to *Rolling* with 0 additional -cycles, only 5 cycles (the :ref:`PRESERVED_CYCLES ` -ones) with metadata will be kept. +metadata kept (see details at :ref:`History_mode_additional_cycles`). 
Note that after pruning metadata of some blocks, the store has the capability to reconstruct it by replaying every block and operation present and repopulating the -- GitLab From 5923302ae774264fadc823b840d9e0c3a4aa14c3 Mon Sep 17 00:00:00 2001 From: Nic Volanschi Date: Tue, 24 Jan 2023 13:30:39 +0100 Subject: [PATCH 3/4] doc: explain the relationship between Store and Context; more structure --- docs/shell/storage.rst | 62 +++++++++++++++++++++++++++--------------- 1 file changed, 40 insertions(+), 22 deletions(-) diff --git a/docs/shell/storage.rst b/docs/shell/storage.rst index af887afa4971..7afe6c968479 100644 --- a/docs/shell/storage.rst +++ b/docs/shell/storage.rst @@ -6,15 +6,15 @@ This document explains the inner workings of the storage layer of the Octez shell. The storage layer is responsible for aggregating blocks (along with their respective ledger state) and operations within blocks (along with their associated metadata). It is composed of two -main components: the :ref:`store` and the -:ref:`context`. +main components: a :ref:`store component ` +providing storage abstractions for blockchain data such as blocks and operations; and the :ref:`context component ` providing storage abstractions for ledger states (also called contexts). .. _store_component: Store ##### -This component handles the on-disk storage of static objects such as +The store component is the :package:`tezos-store` package implemented in the :src:`src/lib_store` library. It handles the on-disk storage of static objects such as blocks, operations, block's metadata, protocols and chain data. The store also handles the chain's current state: current head, invalid blocks, active test chains, etc. The store component is designed to @@ -22,7 +22,7 @@ handle concurrent accesses to the data. Both a mutex and a lock file are present to prevent concurrent access to critical sections. 
The store also provides an accessor to the :ref:`context` and handles its initialization, but it is not responsible to commit contexts -on disk. This is done by the :doc:`validator` component. +on disk. This is done by the :doc:`validation toolchain `. The store is initialized using a :doc:`history mode<../user/history_modes>` that can be either *Archive*, *Full* or @@ -31,14 +31,17 @@ pruned while the chain is growing. In *Full* mode, all blocks that are part of the chain are kept but their associated metadata below a certain threshold are discarded. In *Rolling* mode, blocks under a certain threshold are discarded entirely. The thresholds of *Full* and *Rolling* modes may -be varied by specifying a number of additional cycles to keep. +be varied by specifying a number of :ref:`additional cycles to keep `. The moments when data may be pruned are when a cycle is completed. When this happens, the store performs two operations. First, the block history is linearized by trimming branches in the completed cycle. -Secondly, the remaining blocks in the completed cycle, or just their metadata, can be pruned, according to the history mode. +Secondly, the remaining blocks in the completed cycle (or just their metadata), and possibly their context (ledger state), can be pruned, according to the history mode. Both operations are explained next. +Trimming +******** + .. _lafl: To notice when a cycle has completed, the store uses the @@ -51,6 +54,9 @@ to ``head.last_allowed_fork_level``, which contain all the blocks of the completed cycle, that cannot be reorganized anymore, and trims the potential branches to yield a linear history. +Pruning +******* + When the complete (hence, un-reorganizable) cycle is retrieved, it is archived with the *cemented cycles*. This process is called a **merge** and is performed asynchronously. Depending on which @@ -61,15 +67,25 @@ or may not be preserved. 
For instance, if the history mode is *Full* with 5 additional cycles, all the cemented cycles will be present but only the 10 most recent cemented cycles will have some metadata kept (see details at :ref:`History_mode_additional_cycles`). +Older metadata is pruned. + +Starting with Octez v15, the store also triggers *context pruning* when a cycle is completed, after finishing the store trimming and cementing. +Thus, when pruning a block, its metadata and its context (ledger state associated to that block) are pruned as well. + +Other features +************** Note that after pruning metadata of some blocks, the store has the capability to reconstruct it by replaying every block and operation present and repopulating the -context. Hence, it is possible to transform a `Full` store into an `Archive` one. +context. Hence, it is possible to transform a `Full` store into an `Archive` one (see also :ref:`Switch_mode_restrictions`). It is also possible to retrieve a canonical representation of the store and context for a given block (provided that its metadata are present) as a :doc:`snapshot<../user/snapshots>`. +The store also writes on disk the sources of protocols no longer active. +This allows recompiling them or even sharing them on the network if needed. + Store variables *************** @@ -91,14 +107,16 @@ protection in case of eclipse attacks where a set of malicious peers will advertise a wrong chain. When the store reaches the level of a manually defined checkpoint, it will make sure that this is indeed the expected block or will stop the bootstrap. When the checkpoint is -unset or reached, the store will maintain the following invariant: +unspecified by the user, the store sets it to the :ref:`last allowed fork level `, each time the latter is updated. In any case, the store will maintain the following invariant: ``checkpoint ≥ head.last_allowed_fork_level``. 
While the node is running, it is possible to -call the RPC ``/chains/main/checkpoint`` to access the values of all these variables: the checkpoint, -the savepoint, the caboose, and the history mode. +call the following RPCs to access the values of all these variables: -Protocols no longer active are also written on disk. +- the checkpoint: `GET /chains//levels/checkpoint `__ +- the savepoint: `GET /chains//levels/savepoint `__ +- the caboose: `GET /chains//levels/caboose `__ +- the history mode: `GET /config/history_mode `__ Files hierarchy *************** @@ -138,22 +156,20 @@ The Store maintains data on disk in the - ``/store//testchain/*/`` contains the stores for every encountered test chains throughout the network. The underlying hierarchy follows the same format as - described. + the *chain_store_dir* directory containing the main chain store, described above. .. _context_component: Context ####### -The context is a versioned key/value store that associates to each -block a view of its ledger state. The versioning uses concepts similar +The context component is the :package:`tezos-context` package, implemented in the :src:`src/lib_context` +library. It is a versioned key/value store that associates to each +block a view of its ledger state. The :package-api:`on-disk context API ` exports versioning concepts similar to `Git `_. The current implementation is using -`Irmin `_ as a backend, and its API -is accessible via the abstractions provided by the ``lib_context`` -library. - +`Irmin `_ as a backend. -The abstraction provides generic accessors/modifiers: ``set``, +The API provides generic accessors/modifiers: ``set``, ``get``, ``del``, etc. manipulating a concrete context object and git-like commands: ``commit``, ``checkout`` to manipulate different context branches. @@ -165,9 +181,11 @@ block is stored in its header. When validated, a block's announced ``context hash`` is checked against our local validation result. 
If the two context hashes are different, the block is considered invalid. -A context is supposed to be accessed and modified using the protocols' -API. It may be through RPCs or via :doc:`blocks application <../active/validation>`. Only the -contexts resulting from application of valid blocks is committed on disk. +The context of a block can be accessed using the protocols' RPCs such as +`GET ../\ `__, and more specifically by RPCs under the path ``..//context``. + +The context of the blockchain is only modified by :doc:`block applications <../active/validation>`. Only the +contexts resulting from the application of valid blocks are committed on disk, by the validation toolchain. It is possible to export to a file a concrete context associated to a specific block's ledger state. This feature dumps a canonical representation of -- GitLab From c704a63f73f9498ed0f128a41779ee3a1f10f1b8 Mon Sep 17 00:00:00 2001 From: Nic Volanschi Date: Mon, 20 Feb 2023 15:35:27 +0100 Subject: [PATCH 4/4] doc: remove wrong examples of context accessors/modifiers --- docs/shell/storage.rst | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/shell/storage.rst b/docs/shell/storage.rst index 7afe6c968479..02de39f9d643 100644 --- a/docs/shell/storage.rst +++ b/docs/shell/storage.rst @@ -169,8 +169,7 @@ block a view of its ledger state. The :package-api:`on-disk context API `_. The current implementation is using `Irmin `_ as a backend. -The API provides generic accessors/modifiers: ``set``, -``get``, ``del``, etc. manipulating a concrete context object and +The API provides generic accessors/modifiers for manipulating a concrete context object and git-like commands: ``commit``, ``checkout`` to manipulate different context branches. -- GitLab
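Illustrative note, not part of the patch series: the trimming rule documented in the first patch (the store retrieves the blocks from ``(head-1).last_allowed_fork_level + 1`` to ``head.last_allowed_fork_level`` once that value changes) and the checkpoint invariant can be sketched as below. This is a minimal sketch under stated assumptions: the function names and the plain-integer representation of levels are hypothetical and do not come from the Octez code base.

```python
from typing import Optional


def completed_cycle_levels(prev_lafl: int, head_lafl: int) -> Optional[range]:
    """Levels of the blocks to retrieve when a cycle completes.

    Hypothetical helper (not an Octez API): prev_lafl stands for
    (head-1).last_allowed_fork_level and head_lafl for
    head.last_allowed_fork_level.
    """
    if head_lafl <= prev_lafl:
        # Value unchanged (or, defensively, lower): no cycle has completed.
        return None
    # Inclusive interval [prev_lafl + 1, head_lafl].
    return range(prev_lafl + 1, head_lafl + 1)


def checkpoint_invariant_holds(checkpoint_level: int, head_lafl: int) -> bool:
    """Invariant quoted in the document: checkpoint >= head.last_allowed_fork_level."""
    return checkpoint_level >= head_lafl
```

Under these assumptions, a last allowed fork level moving from 100 to 104 would select levels 101 through 104 for cementing, while an unchanged value selects nothing.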