diff --git a/docs/shell/storage.rst b/docs/shell/storage.rst index 8569c52fa3058084b298d2de58366c24345dc4a0..02de39f9d643d337033341c5faec2b02dd10d042 100644 --- a/docs/shell/storage.rst +++ b/docs/shell/storage.rst @@ -3,26 +3,26 @@ The storage layer ***************** This document explains the inner workings of the storage layer of the -Tezos shell. The storage layer is responsible for aggregating blocks +Octez shell. The storage layer is responsible for aggregating blocks (along with their respective ledger state) and operations within blocks (along with their associated metadata). It is composed of two -main components: the :ref:`store` and the -:ref:`context`. +main components: a :ref:`store component ` +providing storage abstractions for blockchain data such as blocks and operations; and the :ref:`context component ` providing storage abstractions for ledger states (also called contexts). .. _store_component: Store ##### -This component handles the on-disk storage of static objects such as +The store component is the :package:`tezos-store` package implemented in the :src:`src/lib_store` library. It handles the on-disk storage of static objects such as blocks, operations, block's metadata, protocols and chain data. The store also handles the chain's current state: current head, invalid blocks, active test chains, etc. The store component is designed to -handle concurrent accesses to the data. Both a mutex and a lockfile +handle concurrent accesses to the data. Both a mutex and a lock file are present to prevent concurrent access to critical sections. The store also provides an accessor to the :ref:`context` and handles its initialization, but it is not responsible to commit contexts -on-disk. This is done by the :doc:`validator` component. +on disk. This is done by the :doc:`validation toolchain `. 
The store is initialized using a :doc:`history mode<../user/history_modes>` that can be either *Archive*, *Full* or @@ -30,24 +30,35 @@ mode<../user/history_modes>` that can be either *Archive*, *Full* or pruned while the chain is growing. In *Full* mode, all blocks that are part of the chain are kept but their associated metadata below a certain threshold are discarded. In *Rolling* mode, blocks under a -certain threshold are discarded entirely. *Full* and *Rolling* may -take a number of additional cycles to increase or decrease that -threshold. +certain threshold are discarded entirely. The thresholds of *Full* and *Rolling* modes may +be varied by specifying a number of :ref:`additional cycles to keep `. + +Data may be pruned each time a cycle is completed. +When this happens, the store performs two operations. +First, the block history is linearized by trimming branches in the completed cycle. +Second, the remaining blocks in the completed cycle (or just their metadata), and possibly their context (ledger state), can be pruned, according to the history mode. +Both operations are explained next. + +Trimming +******** .. _lafl: -To decide whether a block should be pruned or not, the store uses the +To detect when a cycle has completed, the store uses the latest head's metadata that contains the **last allowed fork -level**. This threshold specifies that the local chain cannot be -reorganized below it. When a protocol validation returns a changed -value for it, it means that a cycle has completed. Then, the store +level**. This specifies the point under which the local chain cannot be +reorganized. When a protocol validation operation returns a changed +value for this point, it means that a cycle has completed. 
Then, the store retrieves all the blocks from ``(head-1).last_allowed_fork_level + 1`` -to ``head.last_allowed_fork_level``, which contain all the blocks of a -completed cycle that cannot be reorganized anymore, and trims the -potential branches in the process to yield a linear history. +to ``head.last_allowed_fork_level``, which comprise all the blocks of the +completed cycle and can no longer be reorganized, and trims the +potential branches to yield a linear history. + +Pruning +******* -When an un-reorganizable former cycle is retrieved, it is then -archived in what is called the *cemented cycles*. This process is +When the completed (hence, un-reorganizable) cycle is retrieved, it is +archived with the *cemented cycles*. This process is called a **merge** and is performed asynchronously. Depending on which history mode is ran and on the amount of additional cycles, blocks and/or their associated metadata present in these cemented cycles may @@ -55,14 +66,30 @@ or may not be preserved. For instance, if the history mode is *Archive*, every block is preserved, with all its metadata. If it is *Full* with 5 additional cycles, all the cemented cycles will be present but only the 10 most recent cemented cycles will have some -metadata kept (that is, *5 + 5 = 10* as the -:ref:`PRESERVED_CYCLES` protocol parameter, which on -Mainnet is currently set to 5 cycles, forces the node to keep at least -these cycles). Finally, if it is set to *Rolling* with 0 additional -cycles, only 5 cycles (the :ref:`PRESERVED_CYCLES ` -ones) with metadata will be kept. - -The store maintains two specific variables, whose values depend on the +metadata kept (see details at :ref:`History_mode_additional_cycles`). +Older metadata is pruned. + +Starting with Octez v15, the store also triggers *context pruning* when a cycle is completed, after finishing the store trimming and cementing. 
+Thus, when pruning a block, its metadata and its context (the ledger state associated with that block) are pruned as well. + +Other features +************** + +Note that after pruning the metadata of some blocks, the store can reconstruct it +by replaying every block and operation present and repopulating the +context. Hence, it is possible to transform a `Full` store into an `Archive` one (see also :ref:`Switch_mode_restrictions`). + +It is also possible to retrieve a canonical representation of the +store and context for a given block (provided that its metadata are +present) as a :doc:`snapshot<../user/snapshots>`. + +The store also writes on disk the sources of protocols no longer active. +This allows recompiling them or even sharing them on the network if needed. + +Store variables +*************** + +The store maintains two specific variables related to the pruned data, whose values depend on the history mode: - The *caboose*, which represents the oldest block known by the @@ -73,34 +100,29 @@ history mode: - The *savepoint* which indicates the lowest block known by the store that possesses metadata. -The *checkpoint* is also a special value that indicates one block that +The *checkpoint* is another variable maintained by the store, which indicates one block that must be part of the chain. This special block may be in the future. Setting a future checkpoint on a fresh node before bootstrapping adds protection in case of eclipse attacks where a set of malicious peers will advertise a wrong chain. When the store reaches the level of a manually defined checkpoint, it will make sure that this is indeed the expected block or will stop the bootstrap. When the checkpoint is -unset or reached, the store will maintain the following invariant: +unspecified by the user, the store sets it to the :ref:`last allowed fork level `, each time the latter is updated. 
In any case, the store will maintain the following invariant: ``checkpoint ≥ head.last_allowed_fork_level``. -To access those values, it is possible, while the node is running, to -call the RPC ``/chains/main/checkpoint`` to retrieve the checkpoint, -savepoint, caboose and the history mode. +While the node is running, it is possible to +call the following RPCs to access the values of all these variables: -The store also has the capability to reconstruct its blocks' metadata -by replaying every block and operation present and repopulating the -context. Hence, transforming a `Full` store into a `Archive` one. - -It is also possible to retrieve a canonical representation of the -store and context for a given block (provided that its metadata are -present) as a :doc:`snapshot<../user/snapshots>`. - -Protocols no longer active are also written on-disk. +- the checkpoint: `GET /chains//levels/checkpoint `__ +- the savepoint: `GET /chains//levels/savepoint `__ +- the caboose: `GET /chains//levels/caboose `__ +- the history mode: `GET /config/history_mode `__ Files hierarchy *************** -The store directory in the node's ```` is organized as follows: +The store maintains data on disk in the +``store`` subdirectory of the node's ````, organized as follows: - ``/store/protocols/`` the directory containing stored protocols. @@ -111,7 +133,7 @@ The store directory in the node's ```` is organized as follows: - ``/store//`` the *chain_store_dir* directory containing the main chain store. -- ``/store//lock`` the lockfile. +- ``/store//lock`` the lock file. - ``/store//config.json`` the chain store's configuration as a JSON file. @@ -134,23 +156,20 @@ The store directory in the node's ```` is organized as follows: - ``/store//testchain/*/`` contains the stores for every encountered test chains throughout the network. The underlying hierarchy follows the same format as - described. + the *chain_store_dir* directory containing the main chain store, described above. .. 
_context_component: Context ####### -The context is a versioned key/value store that associates for each -block a view of its ledger state. The versioning uses concepts similar +The context component is the :package:`tezos-context` package, implemented in the :src:`src/lib_context` +library. It is a versioned key/value store that associates with each +block a view of its ledger state. The :package-api:`on-disk context API ` exports versioning concepts similar to `Git `_. The current implementation is using -`Irmin `_ as a backend, and its API -is accessible via the abstractions provided by the ``lib_context`` -library. +`Irmin `_ as a backend. - -The abstraction provides generic accessors/modifiers: ``set``, -``get``, ``del``, etc. manipulating a concrete context object and +The API provides generic accessors/modifiers for manipulating a concrete context object and git-like commands: ``commit``, ``checkout`` to manipulate different context branches. @@ -161,21 +180,23 @@ block is stored in its header. When validated, a block's announced ``context hash`` is checked against our local validation result. If the two context hashes are different, the block is considered invalid. -A context is supposed to be accessed and modified using the protocols' -API. It may be through RPCs or via blocks application. Only the -resulting context of valid blocks application is committed on disk. +The context of a block can be accessed using the protocols' RPCs such as +`GET ../\ `__, and more specifically by RPCs under the path ``..//context``. + +The context of the blockchain is only modified by :doc:`block applications <../active/validation>`. Only the +contexts resulting from the application of valid blocks are committed on disk, by the validation toolchain. -It is possible to export a concrete context associated to a specific +It is possible to export to a file a concrete context associated with a specific block's ledger state. 
This feature dumps a canonical representation of -this ledger state that may be incorporated in a snapshot to expose a +this ledger state that may be incorporated in a :doc:`snapshot <../user/snapshots>`, exposing a minimal storage state. -Note that it is possible to enable logging for the context backend +Note that it is possible to enable :doc:`logging <../user/logging>` for the context backend using the ``TEZOS_CONTEXT`` environment variable. There are two possible values for this variable: ``v`` for ``Info`` logging and -``vv`` for ``Debug`` logging (warning, the ``Debug`` mode is very +``vv`` for ``Debug`` logging (warning: the ``Debug`` mode is very talkative). Additionally, this environment variable allows to tweak, -with care, some context parameters (using the standard +with care, the following context parameters (using the standard `TEZOS_CONTEXT="variable=value"` pattern, separating the items with commas such as `TEZOS_CONTEXT="v, variable=value"`):