Baker/operation_worker: improve RPC failure resilience
What
Fix a bug where the operation worker shuts down without also shutting down the baker.
Add a watchdog that detects inactivity on the monitor_operations RPC stream and refreshes the stream when the timeout expires.
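The watchdog can be sketched as follows. This is a minimal illustration in Python asyncio for brevity, not the OCaml/Lwt code in this MR. open_stream is a hypothetical stand-in for (re)opening the monitor_operations RPC stream, and inactivity_timeout and max_refreshes are hypothetical parameters. Each wait for the next item is bounded by the timeout. If nothing arrives in time, the stalled stream is closed and a fresh one is opened.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Tuple, TypeVar

T = TypeVar("T")


async def _close(stream) -> None:
    # Async generators expose aclose(); other async iterators may not.
    aclose = getattr(stream, "aclose", None)
    if aclose is not None:
        await aclose()


async def watch_stream(
    open_stream: Callable[[], Awaitable[AsyncIterator[T]]],  # hypothetical: reopens the RPC stream
    on_item: Callable[[T], None],
    inactivity_timeout: float,  # hypothetical: seconds without an item before refreshing
    max_refreshes: int,  # hypothetical: bound so that the sketch terminates
) -> int:
    """Consume a stream, reopening it whenever it stays silent for too long.

    Returns the number of refreshes performed."""
    refreshes = 0
    stream = await open_stream()
    while True:
        try:
            item = await asyncio.wait_for(stream.__anext__(), inactivity_timeout)
        except StopAsyncIteration:
            # The stream ended on its own: nothing left to watch.
            return refreshes
        except asyncio.TimeoutError:
            # No item arrived within the timeout: treat the stream as stalled.
            await _close(stream)
            if refreshes >= max_refreshes:
                return refreshes
            refreshes += 1
            stream = await open_stream()
            continue
        on_item(item)


async def demo() -> Tuple[List[int], int]:
    opened = 0

    async def open_stream() -> AsyncIterator[int]:
        nonlocal opened
        opened += 1
        attempt = opened

        async def gen():
            if attempt == 1:
                yield 1
                yield 2
                await asyncio.sleep(3600)  # the first stream stalls after two items
            else:
                yield 3  # the refreshed stream delivers one more item, then ends

        return gen()

    seen: List[int] = []
    refreshes = await watch_stream(open_stream, seen.append,
                                   inactivity_timeout=0.05, max_refreshes=3)
    return seen, refreshes


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ([1, 2, 3], 1)
```

In the demo, the first stream delivers two items and then stalls, so the watchdog refreshes it once and receives the next item from the new stream. A long-running baker would keep watching indefinitely; the max_refreshes bound only exists to make the sketch terminate.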
Note for reviewers: The proto_patch script failed to backport most of the changes, so I had to apply many of them manually. Please review carefully.
Why
How
Manually testing the MR
Checklist
- Document the interface of any function added or modified (see the coding guidelines)
- Document any change to the user interface, including configuration parameters (see node configuration)
- Provide automatic testing (see the testing guide).
- For new features and bug fixes, add an item in the appropriate changelog (docs/protocols/alpha.rst for the protocol and the environment, CHANGES.rst at the root of the repository for everything else).
- Select suitable reviewers using the Reviewers field below.
- Select as Assignee the next person who should take action on that MR
Edited by Adam