Open
Milestone
started on Nov 14, 2025
4BT: Improve baker resilience
Overview
This milestone aims to significantly increase the robustness, reliability, and fault-tolerance of the Octez baker. Several classes of issues have been identified that can lead to silent failures, blocking operations, or degraded performance. The focus of this milestone is to ensure that the baker reacts predictably to errors, avoids hanging on slow or unresponsive RPCs, and adopts a more modular, worker-based internal architecture.
Tasks
- Handle ignored errors properly
  - Audit the codebase for ignored or swallowed exceptions.
  - Replace them with explicit error-handling paths.
  - Add structured logging and metrics to make error origins observable.
- Add timeouts to all unbounded RPC calls
  - Identify all RPC calls lacking timeouts.
  - Introduce robust timeout wrappers with a clearly defined behavior on expiration (retry, fallback, or fail fast).
  - Add configuration parameters for RPC timeouts where relevant.
- Rationalize and harden RPC usage
  - Centralize RPC handling logic when possible.
- Tezos workerization
  - Identify components that would benefit from Tezos workerization.
  - Convert them into supervised workers with failure isolation.
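
As a rough illustration of the timeout task above, the three expiration behaviors (retry, fallback, fail fast) could be expressed as a single wrapper. This is only a language-neutral sketch written in Python for brevity. The baker itself is OCaml, and every name here (`call_with_timeout`, `TimeoutPolicy`, `RpcTimeout`, the `baker.rpc` logger) is hypothetical rather than part of Octez.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("baker.rpc")  # hypothetical logger name


class RpcTimeout(Exception):
    """Raised when every attempt (and the fallback, if any) hit its deadline."""


@dataclass(frozen=True)
class TimeoutPolicy:
    """Hypothetical per-RPC configuration."""
    timeout_s: float       # deadline for a single attempt, in seconds
    max_attempts: int = 1  # 1 means fail fast; more than 1 means retry


async def call_with_timeout(
    name: str,
    make_call: Callable[[], Awaitable[T]],
    policy: TimeoutPolicy,
    fallback: Optional[Callable[[], Awaitable[T]]] = None,
) -> T:
    """Run an RPC under a deadline and make the timeout outcome explicit."""
    for attempt in range(1, policy.max_attempts + 1):
        try:
            # wait_for cancels the pending call once the deadline passes.
            return await asyncio.wait_for(make_call(), policy.timeout_s)
        except asyncio.TimeoutError:
            # Log the timeout instead of swallowing it silently.
            log.warning("rpc %s timed out (attempt %d/%d)",
                        name, attempt, policy.max_attempts)
    if fallback is not None:
        try:
            # The fallback is bounded by the same deadline.
            return await asyncio.wait_for(fallback(), policy.timeout_s)
        except asyncio.TimeoutError:
            log.warning("rpc %s fallback timed out", name)
    raise RpcTimeout(name)
```

A caller would pick the behavior through the policy and the optional fallback: `TimeoutPolicy(timeout_s=2.0)` fails fast, `TimeoutPolicy(timeout_s=2.0, max_attempts=3)` retries, and passing `fallback=` supplies an alternative source once all attempts have expired. Routing every RPC through one such wrapper would also go some way toward the centralization task.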