diff --git a/docs/developer/error_management.rst b/docs/developer/error_management.rst
new file mode 100644
index 0000000000000000000000000000000000000000..b8c15035dbae4a8edb64131d56f43183fde2ca15
--- /dev/null
+++ b/docs/developer/error_management.rst
@@ -0,0 +1,253 @@
+.. _error_management:
+
+Error management in the Tezos code-base
+=======================================
+
+Managing errors is difficult. Moreover, OCaml is somewhat agnostic about it:
+developers are free to adopt any of a number of different strategies. To
+complicate the matter further, some of these strategies are hidden from the
+type-system and the associated tooling.
+
+In Tezos, error management is further complicated by the sheer size of the
+project. This is to be expected from a project that executes a dynamically
+upgradable abstract protocol on top of a consensus algorithm distributed over a
+peer-to-peer network.
+
+In Tezos, error management is also complicated by the necessary concurrency.
+
+To handle the complexity inherent to such a large project with a high degree of
+concurrency, one simple tool is to provide guidelines.
+
+Below are guidelines about error management. They are intended for developers
+who implement new features, fix bugs, or improve the existing code. They are
+also intended for reviewers, especially as material to reference when
+discussing error management in any merge request.
+
+
+Errors, failures, problems
+--------------------------
+
+In a program there is the most common scenario of execution and then there are
+all the other possible cases. These other cases are more or less predictable,
+more or less obvious, and more or less easy to recover from. These other cases
+are errors, failures, problems, unforeseen events, complications, issues.
+
+There is not a set boundary between an execution with errors and an execution
+without errors.
+That is, the same behaviour can be, depending on the context and
+the point of view, either normal or an error, or somewhere in between.
+
+Similarly, there is a range of ways to handle all the abnormal behaviour
+occurrences.
+
+
+Exceptions, result, option
+--------------------------
+
+The OCaml language has native support for exceptions. You can raise an exception
+with the keyword `raise` and you can catch an exception with the
+`try`-`with` construction.
+
+```
+try
+  …
+  raise Not_found
+  …
+with
+  | Not_found -> …
+```
+
+Exceptions offer an efficient way to break the regular control-flow of programs.
+However, there is no support for exceptions in the type-system. Specifically, the
+type of functions in the signatures bears no indication of whether the function
+can raise exceptions or attempts to catch some.
+
+The OCaml community at large has adopted, in different projects, different
+strategies for handling errors without exceptions. One way is to use the
+`option` type when a function has no value to return. Typically, this applies to
+all the functions that query elements in a data structure.
+
+```
+val retrieve : 'a t -> key -> 'a option
+```
+
+Returning an option (rather than raising `Not_found`, `Invalid_argument`, or
+another exception) documents the possible failure mode and forces the caller to
+take it into account.
+
+The option type is `type 'a option = Some of 'a | None`. Failures carry no
+information. An alternative to option is the result type:
+`type ('a, 'b) result = Ok of 'a | Error of 'b`. When returning a value of this
+type, a function can attach a payload to carry information about the failure.
+
+Whilst the `result` type can be instantiated with any type of error payload,
+some types are often associated with it:
+
+- A `string` payload can carry an error message. Such an error is generally not
+  meant to be caught, just logged.
+- An `exn` (exception) payload can carry more structured information about the
+  error.
+  Note that it is not possible to infer, from the type information,
+  which exceptions might be transported by the `Error` variant.
+- An extensible variant is similar to an `exn` payload.
+- A polymorphic variant carries more information in the type, at the cost of
+  possibly too verbose type error messages.
+
+
+In Tezos we use an extensible variant.
+
+
+
+`error`
+-------
+
+The extensible variant `error` is defined in the Error Monad module. It is used
+throughout the code base as a serialisable error type to be carried by `result`
+values.
+
+When you need to report a failure to your caller, you have to
+first extend the `error` type and then declare a serialiser for the new
+constructor.
+
+```
+type Error_monad.error +=
+  Invalid_schmilblik_in_cold_storage of {
+    storage_file_path: string;
+    offset: int;
+  }
+
+let () =
+  Error_monad.register_error_kind
+    …
+```
+
+
+Traces
+------
+
+Errors are bundled into traces. This allows for more informative error messages
+that indicate the flow of control that the program went through when the error
+occurred.
+
+The most common trace construction is directly through the `error` and `fail`
+functions:
+
+```
+val error : error -> ('a, error trace) result
+
+val fail : error -> ('a, error trace) result Lwt.t
+```
+
+These functions build a trace containing a single error (the one passed as
+argument) and bundle it for use within the error monad. (Note that “error
+monad” is a misnomer in Tezos: the error monad is actually a trace-result
+monad.)
+
+In the rare case where you need to construct a trace by hand, you can use
+`TzTrace.make : 'error -> 'error trace`.
+
+When you want to enrich a trace you simply use the `trace` or `record_trace`
+functions.
+
+```
+TODO
+```
+
+When you do so, you push the previous errors down the trace and you place, on
+top, the fresh error passed as argument.
+
+You should do so at the boundaries of the software component you are writing or
+editing.
+E.g., when you call a function that is in a different module, you
+should wrap the call into a `trace` call.
+
+When you receive a trace from a callee, you generally should either forward it
+(this is achieved automatically with the monadic bind operators of the Error
+Monad) or enrich it with an additional error.
+
+There are some rare, specific cases where you might attempt to recover: to
+transform the current error into a success. Those cases are not numerous.
+Generally, if you need to recover from such an error, you should probably not be
+using error traces and should prefer a local solution instead.
+
+
+
+Result monad
+------------
+
+First, a short aside on monads. Monads are just a pattern of code that allows
+threading some specific values through a computation. In our case, we thread the
+result values through computation. We do so by replacing
+
+```
+let x = f () in
+let y = g () in
+h x y
+```
+
+with the following
+
+```
+f () >>= fun x ->
+g () >>= fun y ->
+h x y
+```
+
+As a first approximation, a monad is a slightly less compact way to bind values:
+instead of `let` binding on the left of the expression, there's a `>>= fun`
+binding on the right.
+
+TODO: link to a more complete introduction to monads.
+
+
+In the result monad, we thread `result` values. Those values can represent
+success (`Ok`) in which case the computation should continue or failure
+(`Error`) in which case the computation should stop and the failure should be
+returned as is.
+
+This is achieved through the clever definition of the binding operator `>>=`:
+
+```
+let ( >>= ) x f = match x with
+  | Ok v -> f v (* continue with f if Ok *)
+  | Error e -> Error e (* return error as is if Error *)
+```
+
+Consequently, using the example above, `g` will only be called if `f` returns an
+`Ok` value. And similarly `h` will only be called if both `f` and `g` have
+returned `Ok` values.
+
+The Error Monad has all the functions and operators you need to make full use of
+this monad.
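The short-circuiting described above can be checked with a small self-contained sketch. It uses only the standard `result` type and a local copy of `>>=`. The `calls` log and the `step`, `run` and `add` helpers are hypothetical names introduced for this illustration; none of them is part of the Error Monad.

```ocaml
(* Illustration only: a local bind on the standard [result] type, mirroring
   the definition of [>>=] given in the text. *)
let ( >>= ) x f = match x with Ok v -> f v | Error e -> Error e

(* Hypothetical log of the steps that actually ran, most recent first. *)
let calls : string list ref = ref []

(* [step name outcome ()] records [name] in the log, then returns [outcome]. *)
let step name outcome () =
  calls := name :: !calls ;
  outcome

(* Chains [f], [g] and [h] as in the text; returns the final result together
   with the names of the steps that ran, in execution order. *)
let run f g h =
  calls := [] ;
  let r = f () >>= fun x -> g () >>= fun y -> h x y in
  (r, List.rev !calls)

(* Plays the role of [h]: logs itself and succeeds with the sum. *)
let add x y = step "h" (Ok (x + y)) ()

(* Every step succeeds: all three run. *)
let all_ok : (int, string) result * string list =
  run (step "f" (Ok 1)) (step "g" (Ok 2)) add

(* [f] fails: neither [g] nor [h] is called and the error is returned as is. *)
let f_fails : (int, string) result * string list =
  run (step "f" (Error "f failed")) (step "g" (Ok 2)) add
```

In the second run, the log only contains `"f"`: the `Error` value bypasses the remaining binds untouched.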
+
+```
+val ok : 'a -> ('a, 'error trace) result
+```
+
+
+
+Best practices
+--------------
+
+Intended use vs historical baggage
+
+What to do?
+
+What exceptions are acceptable?
+
+
+
+Joint Lwt-result monad
+----------------------
+
+one monad, one other monad, one more monad
+
+
+
+Lwtreslib
+---------
+
+Basics
+
+Design principles
+
+Quick tour of the modules
+
+
diff --git a/docs/developer/error_monad_primer.rst b/docs/developer/error_monad_primer.rst
new file mode 100644
index 0000000000000000000000000000000000000000..7831aa5f47e3d05921094c550ad19f05ade7fafe
--- /dev/null
+++ b/docs/developer/error_monad_primer.rst
@@ -0,0 +1,238 @@
+.. _error_monad_primer:
+
+Cheatsheet for the error monad
+==============================
+
+The error monad is used in the whole of the Tezos codebase. Below are some
+basic explanations and a cheatsheet of the available operators that you, a
+developer of Tezos, are likely to use daily.
+
+This is not intended as an in-depth introduction to monads in general, the
+error-monad, or error management. Basic understanding of these topics is
+expected.
+
+Note that the ``Error_monad`` module is ``open``\ ed in most of the codebase. As
+a result, in all the identifiers below, the leading ``Error_monad.`` can be
+omitted. They are included here for technical precision.
+
+Result, Error, Trace, TzResult, Lwt, Result-Lwt, TzResult-Lwt
+-------------------------------------------------------------
+
+The error monad is not a monad: it is a tiered system of multiple monad-like
+abstractions. Specifically, there are:
+
+* Vanilla code not technically part of the error monad: code without any result
+  or Lwt in it.
+
+* The *result monad* which wraps values in the ``result`` type: ``Ok`` for
+  successful values that are carried through binds and maps, ``Error`` for
+  failures that interrupt the flow of computation.
+
+* The ``Error_monad.error`` extensible type is for values that are meant to
+  represent failures in the software.
+  New errors can be declared and it is your
+  responsibility to register them if you do so. It is intended to be included in
+  traces.
+
+* The ``'error Error_monad.trace`` type is for values that carry multiple
+  ``'error`` values. Traces can be used to represent structured information about
+  failures (e.g., there was a failure creating a file, which caused a failure to
+  save data, which caused a failure to cleanly tear down some process).
+  It is intended to be carried in the ``Error`` constructor of a ``result`` to
+  form a ``tzresult``.
+
+* The ``'a Error_monad.tzresult`` type is an alias for
+  ``('a, Error_monad.error Error_monad.trace) result``. It is meant to be used
+  like a normal result, but there are a few dedicated operators for this type
+  only.
+
+* The *Lwt monad* which wraps computations in promises that may or may not
+  eventually resolve to values. Promises resolve concurrently: some I/O
+  operations necessary to one promise may take time during which other promises
+  may make progress towards resolution.
+
+* The combined *result-Lwt monad* which wraps computations in promises that may
+  or may not eventually resolve to values in the ``result`` type.
+
+* The ``'a Error_monad.tzresult Lwt.t`` type is an alias for
+  ``('a, Error_monad.error Error_monad.trace) result Lwt.t``. It is meant to be
+  used like a normal combined result-Lwt promise, but there are a few dedicated
+  operators for this type only.
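The non-Lwt tiers listed above can be modelled with the standard library alone. The sketch below is a stand-in written for illustration, not the ``Error_monad`` definitions: it assumes a trace can be modelled as a plain list with the most recent error first, and ``make_trace``, ``error`` and ``record_trace`` are simplified local helpers that only borrow the names used in this document. The three error constructors are hypothetical and follow the file/save/teardown scenario given in the list.

```ocaml
(* Stand-ins for illustration; these are NOT the Error_monad definitions. *)

(* An extensible variant: constructors can be added after the fact. *)
type error = ..

(* Hypothetical errors matching the scenario described in the text. *)
type error += File_cannot_be_created of string
type error += Cannot_save_data
type error += Teardown_failed

(* Assumption: a trace is modelled as a list, most recent error first. *)
type 'error trace = 'error list

(* The alias described in the text: a result whose failures are traces. *)
type 'a tzresult = ('a, error trace) result

(* Builds a trace containing a single error. *)
let make_trace (e : 'error) : 'error trace = [e]

(* Fails with a fresh single-error trace. *)
let error (e : error) : 'a tzresult = Error (make_trace e)

(* Pushes [e] on top of the trace of a failed result; leaves [Ok] untouched. *)
let record_trace (e : error) (r : 'a tzresult) : 'a tzresult =
  match r with Ok _ -> r | Error t -> Error (e :: t)

(* Each layer adds its own error on top of the one reported by its callee. *)
let create_file () : unit tzresult =
  error (File_cannot_be_created "file already exists")

let save_data () : unit tzresult =
  record_trace Cannot_save_data (create_file ())

let teardown () : unit tzresult =
  record_trace Teardown_failed (save_data ())
```

With these definitions, ``teardown ()`` fails with a trace that lists the teardown failure first and the file-creation failure last, which is the structured information the ``trace`` bullet refers to.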
+
+
+Cheatsheet
+----------
+
+Errors:
+
+* Extend the type with
+  ``type Error_monad.error += Crashed_and_burnt of {temperature: int}``
+
+* **You must** register the error with::
+
+    let () =
+      Error_monad.register_error_kind
+        `Temporary (* or another category *)
+        (* id: must be short and unique (beware of registration within functors) *)
+        ~id:"crashed-and-burnt"
+        (* title: a few human-readable words *)
+        ~title:"Process crashed and machine burnt"
+        (* description: a human-readable sentence or two *)
+        ~description:"The process crashed unexpectedly and caused the machine \
+                      to burn down."
+        (* pretty-print: to human-readable form, use payload *)
+        ~pp:(fun ppf temperature ->
+          Format.fprintf
+            ppf
+            "Process crashed and machine burnt out at %d degrees"
+            temperature)
+        (* encoding: must be an obj, only include the payload *)
+        Data_encoding.(obj1 (req "temperature" int))
+        (* payload extraction *)
+        (function
+          | Crashed_and_burnt {temperature} -> Some temperature
+          | _ -> None)
+        (* reconstruction from payload *)
+        (fun temperature -> Crashed_and_burnt {temperature})
+
+Traces:
+
+* ``Error_monad.error`` and ``Error_monad.fail`` wrap the argument in a trace;
+  there is no need to construct one manually::
+
+    Error_monad.fail (File_cannot_be_created "file already exists")
+
+* Use ``Error_monad.record_trace`` and ``Error_monad.trace`` to enrich a trace
+  with a new error::
+
+    Error_monad.trace (Crashed_and_burnt {temperature = 10_000_000_000})
+    @@ (load_parameters () >>=? fun p ->
+        prime_parameters p >>=? fun () ->
+        magic ())
+
+* Do not match on the trace (only allowed for legacy support).
+
+The result monad:
+
+* The monad's ``return`` is ``Error_monad.ok`` or ``Result.ok``.
+
+  * The ``Result.Ok`` constructor can be used directly as well.
+
+* The monad's ``bind`` is ``Error_monad.( >>? )`` or ``Result.bind``.
+
+* The monad's ``map`` is ``Error_monad.( >|? )`` or ``Result.map``.
+
+* Failure within the monad is done via ``Result.error``.
+
+  * The ``Result.Error`` constructor can be used directly as well.
+
+* Mnemonic: ``?`` represents the uncertainty of possible failures.
+
+The tzresult monad:
+
+* Like the result monad, except:
+
+* Failure within the monad is done via ``Error_monad.error``.
+
+The Lwt monad:
+
+* The monad's ``return`` is ``Lwt.return``.
+
+* The monad's ``bind`` is ``Lwt.( >>= )`` or ``Lwt.bind``.
+
+* The monad's ``map`` is ``Lwt.( >|= )`` or ``Lwt.map``.
+
+* Mnemonic: ``=`` represents two promised values evaluated concurrently.
+
+The combined result-Lwt monad:
+
+* The monad's ``return`` is ``Error_monad.return`` or ``Lwt.return_ok``.
+
+* The monad's ``bind`` is ``Error_monad.( >>=? )``.
+
+* The monad's ``map`` is ``Error_monad.( >|=? )``.
+
+* Failure within the monad is done via ``Lwt.return_error``.
+
+  * ``Lwt.return (Result.Error _)`` can be used directly as well.
+
+The combined tzresult-Lwt monad:
+
+* The monad's ``return`` is ``Error_monad.return`` or ``Lwt.return_ok``.
+
+* The monad's ``bind`` is ``Error_monad.( >>=? )``.
+
+* The monad's ``map`` is ``Error_monad.( >|=? )``.
+
+* Failure within the monad is done via ``Error_monad.fail``.
+
+Lwtreslib:
+
+* ``List``, ``Map``, ``Set``, ``Seq``, etc. are shadowed
+
+* The shadows have traversors for the monads
+
+* ``iter_e`` is like ``iter`` but in the result or tzresult monads
+
+* ``iter_s`` is like ``iter`` but in the Lwt monad
+
+* ``iter_es`` is like ``iter`` but in the combined result-Lwt or tzresult-Lwt monad
+
+* ``iter_p`` and ``iter_ep`` for concurrent iteration
+
+* Mnemonic: ``e`` for error, ``s`` for sequential, ``p`` for parallel
+
+Switching monads:
+
+In the following examples, the function ``f`` returns non-Lwt, non-result
+values, ``f_s`` returns Lwt promises, ``f_e`` returns results, and ``f_es``
+returns a promised result.
+
+* vanilla+Lwt=Lwt: ``let x = f () in f_s x``
+
+* vanilla+result=result: ``let x = f () in f_e x``
+
+* vanilla+combined=combined: ``let x = f () in f_es x``
+
+* Lwt+vanilla=Lwt: ``f_s () >>= fun x -> Lwt.return (f x)``
+
+* Lwt+result=combined: ``f_s () >>= fun x -> Lwt.return (f_e x)``
+
+* Lwt+combined=combined: ``f_s () >>= fun x -> f_es x``
+
+* result+vanilla=result: ``f_e () >>? fun x -> Ok (f x)``
+
+* result+Lwt=combined: (special operator) ``f_e () >|?= fun x -> f_s x``
+
+* result+combined=combined: (special operator) ``f_e () >>?= fun x -> f_es x``
+
+* combined+vanilla=combined: ``f_es () >>=? fun x -> Error_monad.return (f x)``
+
+* combined+Lwt=combined: ``f_es () >>=? fun x -> (f_s x >>= fun y -> Error_monad.return y)``
+  or ``f_es () >>=? fun x -> (f_s x >|= Error_monad.ok)``
+  or ``f_es () >>=? fun x -> Lwt.map Error_monad.ok (f_s x)``
+
+* combined+result=combined: ``f_es () >>=? fun x -> Lwt.return (f_e x)``
+
+Getting out of result:
+
+It is possible to recover from failures of the (tz)result and combined monads.
+
+* result-to-vanilla: ``match f_e () with Ok x -> … | Error e -> …``
+
+* combined-to-Lwt: ``f_es () >>= function Ok x -> … | Error e -> …``
+
+* Avoid matching on the specific trace
+
+Exceptions:
+
+* Avoid exceptions
+
+* If you must use exceptions, read up on Lwt and exceptions
+
+Special case: the protocol
+--------------------------
+
+Each protocol has its own ``error`` and ``trace`` types entirely separate from
+the shell's. You can convert protocol errors (traces, tzresults) into shell
+errors (traces, tzresults) using ``wrap_tzerror`` (``wrap_tztrace``,
+``wrap_tzresult``).
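To make the idea of two separate error types more concrete, here is a toy model of the conversion. It is only a sketch under stated assumptions and not the actual implementation: ``Proto_sketch``, ``Shell_sketch``, the ``Balance_too_low`` and ``Wrapped_protocol_error`` constructors and the three ``wrap_*_sketch`` functions are all hypothetical names, and traces are modelled as plain lists.

```ocaml
(* Toy model, for illustration only: none of these definitions are the real
   shell or protocol ones. *)
module Proto_sketch = struct
  type error = ..

  (* Hypothetical protocol-side error. *)
  type error += Balance_too_low of int

  (* Assumption: a trace is modelled as a plain list of errors. *)
  type 'a tzresult = ('a, error list) result
end

module Shell_sketch = struct
  type error = ..

  (* Hypothetical constructor embedding a protocol error in a shell error. *)
  type error += Wrapped_protocol_error of Proto_sketch.error

  type 'a tzresult = ('a, error list) result
end

(* Converts a single protocol error into a shell error. *)
let wrap_error_sketch (e : Proto_sketch.error) : Shell_sketch.error =
  Shell_sketch.Wrapped_protocol_error e

(* Converts every error of a protocol trace, preserving their order. *)
let wrap_trace_sketch (t : Proto_sketch.error list) : Shell_sketch.error list =
  List.map wrap_error_sketch t

(* Converts a protocol result: [Ok] values pass through unchanged. *)
let wrap_result_sketch (r : 'a Proto_sketch.tzresult) : 'a Shell_sketch.tzresult
    =
  Result.map_error wrap_trace_sketch r

let proto_failure : unit Proto_sketch.tzresult =
  Error [Proto_sketch.Balance_too_low 3]

let shell_failure : unit Shell_sketch.tzresult =
  wrap_result_sketch proto_failure
```

Because the two ``error`` types are distinct, the type-checker rejects any attempt to mix a protocol trace with a shell trace until one of the wrapping functions has been applied.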