Baker: improve (or fix) keep alive behavior
What
Partially closes #7920 (closed)
The issue contains 2 problems:
1. Keep-alive is not supported when the baker checks the version of the node. However, this is not an issue in the agnostic baker, so I consider that we do not need to fix it in the protocol bakers; it will resolve itself. (As far as I know, it works in the agnostic baker because the retry is correctly handled by an RPC that precedes the version check.)
2. Keep-alive does not work while the node is bootstrapping: the baker exits as soon as the connection is lost.
Update: I'm afraid (1.) is not completely true. It holds when you start the agnostic baker, but I suspect it does not hold if the connection is lost when a new protocol daemon is spawned.
Why
Because we want this behavior to be correct?
How
The patch is fairly simple:
- Put `max_int` in retry if `keep_alive` is provided.
- Wrap the whole bootstrapped mechanism in the retry, and double check that the node is correctly bootstrapped. (The whole problem before this merge request is that if the connection is lost, the baker considers the node to be bootstrapped.)
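The two changes above can be modeled roughly as follows. This is a minimal Python sketch, not the baker's OCaml code: `retry`, `wait_until_bootstrapped`, `ConnectionLost`, `max_tries` and the injected `sleep` are hypothetical names, `sys.maxsize` stands in for `max_int`, and the 1.5 backoff factor is only inferred from the 1.00 and 1.50 second delays visible in the logs of this description.

```python
import sys
import time


class ConnectionLost(Exception):
    """Hypothetical stand-in for a lost RPC connection to the node."""


def retry(action, keep_alive, max_tries=5, delay=1.0, factor=1.5, sleep=time.sleep):
    """Run `action` until it succeeds, retrying whenever the connection is lost.

    With keep_alive the retry budget is effectively unlimited
    (sys.maxsize plays the role of max_int). max_tries is an assumed default.
    """
    tries = sys.maxsize if keep_alive else max_tries
    while True:
        try:
            return action()
        except ConnectionLost:
            tries -= 1
            if tries <= 0:
                raise
            print(f"Connection failed. Retrying in {delay:.2f} seconds...")
            sleep(delay)
            delay *= factor


def wait_until_bootstrapped(monitor, is_bootstrapped, keep_alive, sleep=time.sleep):
    """Wrap the whole bootstrap wait in the retry.

    `monitor` blocks until the bootstrap stream ends. A closed stream is not
    trusted on its own: the node is asked again, and a negative answer is
    treated like a lost connection so that the wait restarts.
    """
    def attempt():
        print("Waiting for the node to be bootstrapped...")
        monitor()
        if not is_bootstrapped():
            raise ConnectionLost("stream ended before the node was bootstrapped")

    retry(attempt, keep_alive, sleep=sleep)
```

The explicit second check is the important part of the sketch: without it, a stream that ends because the connection dropped is indistinguishable from a stream that ends because the node finished bootstrapping.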
Manually testing the MR
I failed to write an automatic tezt, probably because the retry mechanism prints on stdout and doesn't emit events, so it's hard to track errors. But this can be tested really easily locally.
## Node version check.
Do not run a node, but run your baker with:
./octez-baker-PsRiotum run with local node ~/.octez-node-ghostnet --liquidity-baking-toggle-vote off --without-dal --adaptive-issuance-vote off --keep-alive
Jul 16 10:43:29.900 WARN │ The `octez-baker` binary is now available. We recommend using it instead of `octez-baker-<protocol>`, as it automatically handles
Jul 16 10:43:29.900 WARN │ protocol switches.
Error:
Rpc request failed:
- meth: GET
- uri: http://localhost:8732/version
- error: Unable to connect to the node: "Unix.Unix_error(Unix.ECONNREFUSED, "connect", "")"
Then test with the agnostic baker:
./octez-baker run with local node ~/.octez-node-ghostnet --liquidity-baking-toggle-vote off tmp --without-dal --adaptive-issuance-vote off --keep-alive
Jul 16 10:43:55.834 NOTICE │ starting baker daemon
Jul 16 10:43:55.835 ERROR │ Cannot connect to node. Retrying in 1.00 seconds...
Jul 16 10:43:56.836 ERROR │ Cannot connect to node. Retrying in 1.50 seconds...
## Bootstrapped.
Run your node in private mode, so that it won't be considered as bootstrapped:
./octez-node run --data-dir ~/.octez-node-ghostnet --rpc-addr 127.0.0.1:8732 --allow-all-rpc 127.0.0.1:8732 --private-mode --no-bootstrap-peers
Run a baker:
./octez-baker run with local node ~/.octez-node-ghostnet --liquidity-baking-toggle-vote off tmp --without-dal --adaptive-issuance-vote off --keep-alive
Jul 16 10:44:59.898 NOTICE │ starting baker daemon
Jul 16 10:44:59.911 NOTICE │ starting baker for protocol PsRiotumaAMo
Jul 16 10:45:00.045 NOTICE │ baker for protocol PsRiotumaAMo is now running
Jul 16 10:45:00.047 WARN │ Adaptive issuance is now enabled, voting is no longer necessary. Please remove the argument from the CLI.
Jul 16 10:45:00.047 NOTICE │ read liquidity baking toggle vote = off
Jul 16 10:45:00.047 WARN │ No DAL node endpoint has been provided.
Jul 16 10:45:00.047 WARN │ Not running a DAL node might result in losing a share of the participation rewards.
Jul 16 10:45:00.047 WARN │ For
Jul 16 10:45:00.047 WARN │ instructions on how to run a DAL node, please visit https://docs.tezos.com/tutorials/join-dal-baker.
Jul 16 10:45:00.068 NOTICE │ new block (BMHVzWytNhpabLB8MoC9WYt7ZRviyC5DMJhwdPNLR1KtN2VNU8x) on proposal period (remaining period duration 10530)
Waiting for the node to be bootstrapped...
Connection failed. Retrying in 1.00 seconds...
Connection failed. Retrying in 1.50 seconds...
Waiting for the node to be bootstrapped...
And you'll see the difference with or without the patch.
Checklist
- Document the interface of any function added or modified (see the coding guidelines)
- Document any change to the user interface, including configuration parameters (see node configuration)
- Provide automatic testing (see the testing guide).
- For new features and bug fixes, add an item in the appropriate changelog (`docs/protocols/alpha.rst` for the protocol and the environment, `CHANGES.rst` at the root of the repository for everything else).
- Select suitable reviewers using the `Reviewers` field below.
- Select as `Assignee` the next person who should take action on that MR