diff --git a/docs/developer/openmetrics.rst b/docs/developer/openmetrics.rst
index 31ac7f3f178904d3c811d6d0a47d0d84d233ee03..5b7278eb6fffcca4306b2ac0451ee2ff8fa5a397 100644
--- a/docs/developer/openmetrics.rst
+++ b/docs/developer/openmetrics.rst
@@ -62,21 +62,3 @@ source - using adequate values:
scheme: http
static_configs:
- targets: ['localhost:9091']
-
-
-Monitoring the node with metrics
---------------------------------
-
-Once the node is correctly set up to export metrics
-and those are collected by a `Prometheus server `_,
-you can graphically monitor your node with a `Grafana dashboard `_.
-
-Dashboards suited for Octez can be easily built with the `Grafazos `_ tool.
-Grafazos provides several ready-to-use dashboards for Octez on the `Grafazos packages page `__, as plain JSON files.
-Their sources are also available as `jsonnet `__ files, that can be adjusted to build customized dashboards, if needed:
-
-
-- ``octez-basic``: A basic dashboard with all the node metrics
-- ``octez-full``: A full dashboard with the logs and hardware data.
- This dashboard should be used with `Netdata `_ (for supporting hardware data) and `Promtail `_ (for exporting the logs).
-- ``octez-compact``: A compact dashboard that gives a brief overwiev of the various node metrics on a single page.
diff --git a/docs/index.rst b/docs/index.rst
index bf1225dafdcdaea0d78a17bbdff10a15b9f11cb2..81e3e01d5fdb5092db43a82a882622a6e339d3f1 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -116,6 +116,7 @@ in the :ref:`introduction `.
user/key-management
user/node-configuration
+ user/node-monitoring
user/versioning
user/snapshots
user/history_modes
diff --git a/docs/shell/validation.rst b/docs/shell/validation.rst
index afcbcfe654365c18a63161a4d43582fbd941e097..0c058488bc3d6714e79a8743f545748e94604ccf 100644
--- a/docs/shell/validation.rst
+++ b/docs/shell/validation.rst
@@ -78,6 +78,8 @@ communicating with each other via message passing. Workers are spawned
and killed dynamically, according to connected peers, incoming blocks
to validate, and active (test)chains.
+.. _chain_validator:
+
A *chain validator* worker is launched by the validator for each
*chain* that it considers alive. Each chain validator is responsible for
handling blocks that belong to this chain, and select the best head for
@@ -87,6 +89,8 @@ chain. Forking a chain is decided from within the economic protocol. In
protocol Alpha, this is only used to try new protocols before self
amending the main chain.
+.. _peer_validator:
+
The chain validator spawns one *peer validator* worker per connected
peer. The set of peer validators is updated, grown, or shrunk on the fly, according to the
connections and disconnections signals from the peer-to-peer component.
@@ -123,6 +127,8 @@ pipeline, or multipass) will interact with the distributed DB to get
the data they need (block headers and operations). When they have
everything needed for a block, they will call the *block validator*.
+.. _block_validator:
+
The *block validator* validates blocks (currently in sequence),
assuming that all the necessary data have already been retrieved from
the peer-to-peer network. When a block is valid, it will notify the
diff --git a/docs/user/node-monitoring.rst b/docs/user/node-monitoring.rst
new file mode 100644
index 0000000000000000000000000000000000000000..2dee27493f474746d0c68ad11005da21bae3081a
--- /dev/null
+++ b/docs/user/node-monitoring.rst
@@ -0,0 +1,260 @@
+Monitoring a Tezos Node
+=======================
+
+Monitoring the behavior of a Tezos node can be partially achieved by exploring the logs or,
+more efficiently, through the RPC server. The use of RPCs is detailed in :doc:`the RPC documentation <../developer/rpc>`
+and :doc:`the RPC references <../shell/rpc>`.
+
+The most practical approach, however, is to use Octez Metrics to gather information and statistics. Metrics have been integrated directly into the node
+since Octez version 14, so users can get metrics without using an external tool
+such as `tezos-metrics `_ (which is now deprecated).
+The node includes a server that registers the implemented metrics and outputs them in response to each ``/metrics`` HTTP request.
+You can therefore configure and launch your node with a metrics exporter.
+
+
+Starting a node with monitoring
+-------------------------------
+
+Start
+~~~~~
+
+The node can be started with its metrics exporter using the option ``--metrics-addr``, which takes as a parameter ``<ADDR>:<PORT>``, ``<ADDR>``, or ``:<PORT>``.
+
+``<ADDR>`` and ``<PORT>`` are respectively the address and the port on which to expose the metrics.
+By default, ``<ADDR>`` is ``localhost`` and ``<PORT>`` is ``9932``.
+
+.. code-block:: shell
+
+ tezos-node run --metrics-addr=<ADDR>:<PORT> …
+
+Note that it is possible to serve metrics on several addresses by using the option more than once.
+
+Configure
+~~~~~~~~~
+
+You can also add this configuration to your persistent configuration file through the command line:
+
+.. code-block:: shell
+
+ tezos-node config init --metrics-addr=<ADDR>:<PORT> ...
+
+ # Or, if the configuration file already exists:
+ tezos-node config update --metrics-addr=<ADDR>:<PORT> ...
+
+See :doc:`the documentation of the node configuration<./node-configuration>` for more information.
+
+A correct setup should write an entry in the logs similar to:
+
+::
+
+ - node.main: starting metrics server on <ADDR>:<PORT>
+
+Octez Metrics
+-------------
+
+This section focuses on access to the metrics and their uses.
+More details on the metrics specifications are available :doc:`here <../developer/openmetrics>`.
+
+Scraping Octez Metrics
+~~~~~~~~~~~~~~~~~~~~~~
+
+Once your node is correctly set up to export metrics, you can scrape them by querying the metrics server of your node with the request ``/metrics``.
+
+For example:
+
+.. code-block:: shell
+
+ curl http://<ADDR>:<PORT>/metrics
+
+You will be presented with the list of defined and computed metrics as follows:
+
+::
+
+ #HELP metric description
+ #TYPE metric type
+ octez_metric_name{label_name=label_value} x.x
+
+
+The metrics that can be exposed by the node can be listed with the command:
+
+.. code-block:: shell
+
+ tezos-node dump-metrics
+
+
+Version 14 of Octez exports metrics from various components of the node, namely:
+
+- :doc:`The p2p layer <../shell/p2p>`
+- :doc:`The store <../shell/storage>`
+- :doc:`The prevalidator <../shell/prevalidation>`
+- :ref:`The chain validator <chain_validator>`
+- :ref:`The block validator <block_validator>`
+- :ref:`The peer validator <peer_validator>`
+- The distributed database
+- :doc:`The RPC server <../shell/rpc>`
+- The node version
+
+Each exported metric has the following form::
+
+ octez_subsystem_metric{label_name=label_value,...} value
+
+Each metric name starts with ``octez`` as its namespace, followed by a subsystem name, which is the section of the node described by the metric.
+It follows the OpenMetrics specification described `here `__.
+
+A metric may provide labeled parameters which allow for different instances of the metric, with different label values.
+For instance, the metric ``octez_distributed_db_requester_table_length`` has a label named ``requester_kind``, which allows this metric to have one value for each kind of requester.
+
+::
+
+ octez_distributed_db_requester_table_length{requester_kind="block_header"} x
+ octez_distributed_db_requester_table_length{requester_kind="protocol"} y
+ ...
+
+Metrics provide information about the node in the form of a `gauge `_ that can increase or decrease (like the number of connections),
+a `counter `_ that can only increase (like the head level),
+or a `histogram `_ used to track the size of events and how long they usually take (e.g., the time taken by an RPC call).
+
+The label value is sometimes used to store information that can't be described by the metric value (which can only be a float). This is used for example by the ``octez_version`` metric that provides the version within the labels.
+
+.. note::
+
+ Most of the metrics are computed when scraped from the node. As there is no rate limiter, you should choose a reasonable scraping frequency and put a proxy in front of any public endpoint, to limit the impact on performance.
+
+.. _prometheus_server:
+
+Prometheus
+~~~~~~~~~~
+
+Scraping metrics gives you instant values of the metrics. For more effective monitoring, you should create a time series of these metrics.
+
+We suggest using `Prometheus `_ for that purpose.
+
+Once Prometheus is installed, you need to add the scraping job to its configuration file:
+
+::
+
+  - job_name: 'tezos-exporter'
+    scrape_interval: interval s
+    metrics_path: "/metrics"
+    static_configs:
+      - targets: ['addr:port']
+
+Prometheus is a service, so you need to start it. Note that Prometheus can also scrape metrics from several nodes!
+
+.. code-block:: shell
+
+ sudo systemctl start prometheus
+
+.. _hardware_metrics:
+
+Hardware metrics
+~~~~~~~~~~~~~~~~
+
+In addition to node metrics, you may want to gather other information and statistics for effective monitoring, such as hardware metrics.
+
+For that purpose, we suggest using `Netdata `_.
+
+To install Netdata:
+
+.. code-block:: shell
+
+ bash <(curl -Ss https://my-netdata.io/kickstart.sh)
+
+Add the following at the end of ``/etc/netdata/apps_groups.conf``:
+
+.. code-block:: shell
+
+ tezos: tezos-node tezos-validator
+
+.. _filecheck:
+
+Optionally, you can enable storage monitoring with ``filecheck``.
+
+To do so, create a ``filecheck.conf`` file in ``/etc/netdata/go.d/`` and add::
+
+  jobs:
+    - name: octez-data-dir-size
+      discovery_every: 30s
+      dirs:
+        collect_dir_size: yes
+        include:
+          - '/path/to/data/dir'
+
+    - name: octez-context-size
+      discovery_every: 30s
+      dirs:
+        collect_dir_size: yes
+        include:
+          - '/path/to/data/dir/context'
+
+    - name: octez-store-size
+      discovery_every: 30s
+      dirs:
+        collect_dir_size: yes
+        include:
+          - '/path/to/data/dir/store'
+
+
+Then, you need to make sure that the ``netdata`` user has the correct read/write/execute permissions.
+This can be achieved by adding this user to your user's group, or by defining custom rules.
+
+To check that the setup is correct::
+
+ # Switch to the netdata user
+ sudo -u netdata -s
+
+ # Go to the plugin directory
+ cd /usr/libexec/netdata/plugins.d/
+
+ # Run the plugin in debug mode for the filecheck module
+ ./go.d.plugin -d -m filecheck
+
+
+With a correct installation, you should see lines such as::
+
+ BEGIN 'filecheck_octez-data-dir-size.dir_size' 9999945
+ SET '/path/to/data/dir/' = 48585735837
+ END
+
+Note that if you use filecheck for storage monitoring, you need to configure your dashboards accordingly. More details are given in the :ref:`Grafazos configuration section <grafazos_configuration>`.
+
+Dashboards
+----------
+
+Dashboards will take your node monitoring to the next level, allowing you to visualize the raw data collected with pretty, colorful graphs.
+
+Grafana
+~~~~~~~
+
+Dashboards can be created and visualized with `Grafana `_. Grafana can be installed by following `these instructions `_.
+
+Once Grafana is installed and running, you should be able to reach the interface on port ``3000`` (you can change the port in the Grafana configuration file).
+
+Then you need to add the configured Prometheus server (see :ref:`Prometheus <prometheus_server>`) as a data source in ``Configuration/Data sources``.
+
+
+Grafazos
+~~~~~~~~
+
+You can interactively create your own dashboards to monitor your node, using the Grafana GUI. Alternatively, Grafana allows you to import dashboards from JSON files.
+
+`Grafazos `_ generates JSON files that you can import into the Grafana interface.
+
+This tool generates the following dashboards:
+
+- ``octez-compact``: A compact dashboard that gives a brief overview of the various node metrics on a single page.
+- ``octez-basic``: A basic dashboard with all the node metrics.
+- ``octez-with-logs``: Same as ``octez-basic``, but also displays the node's logs, using `Promtail `_ (for exporting the logs).
+- ``octez-full``: A full dashboard with the logs and hardware data. This dashboard should be used with `Netdata `_ (for supporting hardware data) in addition to Promtail.
+
+You can generate them from the sources with your own configuration, or you can use the JSON files compatible with your node version, found `here `_.
+
+.. _grafazos_configuration:
+
+The dashboards can be configured by setting environment variables before starting their generation (using ``make``).
+
+The available variables are:
+
+- ``BRANCH``: Used to specify the name of the branch of the node.
+- ``NODE_INSTANCE_LABEL``: Used to set the name of the node instance label in the metrics.
+- ``STORAGE_MODE``: To be set to ``filecheck`` if the :ref:`storage monitoring with filecheck <filecheck>` is enabled.