Keep alive flag is ignored when starting daemons until connection has been established

Environment (Mainnet, test network, build from source, ...)

Mainnet, testnet, pre-compiled binaries, built from source, etc.

Summary

When launching the baker/endorser/accuser daemons, you can optionally supply the -K or --keep-alive flag, which keeps the process alive and makes it keep trying to reconnect whenever it cannot reach the node. The flag works as intended when a previously established connection is lost. However, it is completely ignored at launch, before the daemon has established a connection for the first time.

Expected behavior

When I launch a daemon with the keep-alive flag, I expect the process to stay alive and keep retrying the connection to the node indefinitely.

This would let me launch the node and the baker/endorser daemons simultaneously without worrying about delays in the node's start-up, for example while it restores the store's consistency, which has become a more frequent problem lately (see !3515 (merged)).
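To make the expectation concrete, here is a minimal sketch in Python of the retry loop I have in mind. It is not the daemon's actual implementation: `wait_for_node`, `try_connect`, `max_retries`, and the other parameter names are hypothetical, and the defaults are only chosen to resemble the log output in this report.

```python
import time


def wait_for_node(try_connect, keep_alive, max_retries=5,
                  initial_delay=1.0, factor=1.5, sleep=time.sleep):
    """Call try_connect() until it returns True.

    Returns the number of failed attempts that preceded the success.
    Without keep_alive, raises ConnectionError once max_retries retries
    have all failed. All names and defaults here are assumptions made
    for illustration, not values taken from the Tezos source.
    """
    delay = initial_delay
    failures = 0
    while True:
        if try_connect():
            return failures
        failures += 1
        # Expected behavior: keep_alive disables the retry limit, even
        # when no connection has ever been established.
        if not keep_alive and failures > max_retries:
            raise ConnectionError("Unable to connect to the node")
        sleep(delay)
        delay *= factor
```

With `keep_alive=True` the loop only ends once the node answers, which is what would make it safe to start the node and the daemons at the same time.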

Actual behavior

If the daemon is not able to connect to the node upon launch, it will retry a few times (whether or not the --keep-alive flag is passed) and then give up if no connection is established.

If the daemon is already connected to the node and becomes disconnected, it will properly be kept alive and repeatedly attempt to reconnect.
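The delays printed in the Logs section are consistent with a start value of 1 second multiplied by 1.5 after each failed attempt. The sketch below reproduces that schedule and shows the difference the flag should make. The function name, parameters, and the factor of 1.5 are inferred from the log output, not taken from the source code.

```python
import itertools


def retry_delays(initial=1.0, factor=1.5, max_retries=5, keep_alive=False):
    """Yield the delay in seconds before each reconnection attempt.

    The defaults are assumptions inferred from the log output in this
    report. With keep_alive the schedule never ends; otherwise it stops
    after max_retries delays, at which point the caller gives up.
    """
    delay = initial
    attempts = itertools.count() if keep_alive else range(max_retries)
    for _ in attempts:
        yield delay
        delay *= factor


# Schedule without keep-alive: five delays, then the daemon exits.
observed = [f"{d:.2f}" for d in retry_delays()]

# Schedule with keep-alive: unbounded, so only a prefix is taken here.
expected_prefix = list(itertools.islice(retry_delays(keep_alive=True), 8))
```

Here `observed` holds the five delays seen before the error, while `expected_prefix` shows that the schedule would simply continue if the flag were honored at launch.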

Steps to reproduce

Launch the baker daemon with the -K flag while the node is not running. It retries a few times, then gives up. The other daemons behave similarly.

History mode

Rolling, full, archive

Logs

./tezos-baker-010-PtGRANADA run with local node $HOME/.tezos-node baker -K
Connection refused, retrying in 1.00 seconds...
Waiting for the node to be bootstrapped...
Connection refused, retrying in 1.50 seconds...
Connection refused, retrying in 2.25 seconds...
Connection refused, retrying in 3.38 seconds...
Connection refused, retrying in 5.06 seconds...
Error:
  Rpc request failed:
     - meth: GET
     - uri: http://localhost:8732/monitor/bootstrapped
     - error: Unable to connect to the node: "Unix.Unix_error(Unix.ECONNREFUSED, "connect", "")"