Improve and consolidate RPC UX
This is a sub-milestone of %(2024Q2) - Layer 1 - Public RPC endpoint supporting average 1k RPS
This milestone aims to improve the RPC UX based on feedback from users, service providers, wallets, indexers, and anyone else in the community. The idea is to identify the issues raised by the community, resolve the straightforward requests, and specify solutions for future improvements. This is a preliminary step toward the following milestone (<INSERT_MILESTONE_ID> Design a wide RPC engine rehaul).
Because the workload of this milestone depends on community feedback and code inspection, it contains long-running steps that are meant to be tackled as background tasks throughout the project. Any tasks that remain will be treated as backlog.
Work breakdown
Community feedback
- (ETA: 2024-04-13) Inventory of the friction points encountered by users (indexers/wallets/bakers/community) @ryan.tan3
  - Collect feedback from the community: [WIP] RPC user feedback
  - Analyse former requests (Nicolas Ochem wish list)
  - Formulate potential RPC improvements
  - Ask for feedback from wallets (Simon McLoughlin, Jev Björsell)
  - Ask for feedback from indexers (Max Strebkov, Alexander Eichhorn)
  - Propose/organize discussion during #proto-ecosystem
    - #proto-ecosystem call analysis
Performance evaluation (aka RPS/OoST)
- (ETA: 2024-06-07) Define, estimate and consolidate the Requests Per Second indicator @julien.t @gabriel.moise
  - Define performance evaluation criteria: Defining performance evaluation of RPC engine
  - Implement tools to compute criteria
    - Out of Synchronisation time !13558 (merged)
      - Requests to benchmark (1 by 1 or scenarios)
      - Status to benchmark (Idle/bootstrapping)
      - RPS benchmark results
    - Requests per second !13557 (merged)
      - Requests to benchmark (1 by 1 or scenarios)
      - Status to benchmark (Idle/bootstrapping)
      - RPS benchmark results
  - Propose a reference configuration
    - Use the recommended Octez hardware: GCP n2-standard-4
  - Propose baseline values for OoST/RPS
  - Compare baseline vs. end of project OoST/RPS
  - Merge the tool in tezos/contrib for future use
  - Publish documentation (online doc) to reproduce the performance evaluation
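The Requests Per Second indicator above is not defined in this description, so the following is only a minimal sketch of one possible definition: the number of requests that completed inside a fixed wall-clock window, divided by the window length. The names `RpsSummary` and `summarize_rps` are hypothetical and are not taken from the tools in !13557 or !13558.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RpsSummary:
    completed: int    # requests that completed inside the window
    window_s: float   # window length in seconds
    rps: float        # completed / window_s


def summarize_rps(
    window_start_s: float,
    window_end_s: float,
    completion_times_s: Sequence[float],
) -> RpsSummary:
    """Average requests per second over the half-open window [start, end).

    Assumption: a request counts toward the window in which it completed.
    """
    window = window_end_s - window_start_s
    if window <= 0:
        raise ValueError("window_end_s must be greater than window_start_s")
    completed = sum(
        1 for t in completion_times_s if window_start_s <= t < window_end_s
    )
    return RpsSummary(completed=completed, window_s=window, rps=completed / window)
```

For instance, `summarize_rps(0.0, 2.0, [0.1, 0.5, 1.0, 1.9, 2.5])` counts four completions in a two-second window and reports 2.0 requests per second; the completion at 2.5 s falls outside the window and is ignored.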
File descriptor leak
- (ETA: 2024-06-07) Understand Rollup nodes file descriptor leaks @diana.savvatina @gabriel.moise, #7042 (closed)
  - Discuss with rollup operators (Chris P.) and developers (Alain M.) (see)
  - Compare to the formerly known issue from the Non blocking RPC project
  - Understand the issue and sketch a solution (document)
    - Found issues
      - EOF is not detected
        - fixed
      - RPC middleware is not notified by client exit
        - fixed
      - Cohttp is not closing connections
        - fixed
    - GitHub proposal for Cohttp: https://github.com/gabrielmoise17/ocaml-cohttp/pull/1
    - MRs for RPC leak: https://gitlab.com/tezos/tezos/-/merge_requests?scope=all&state=opened&label_name[]=rpc_leak
  - Release the fixes
    - Pre-vendor dependencies @diana.savvatina @gabriel.moise
      - Vendor resto (How-to)
      - Vendor prometheus (req for cohttp)
      - Vendor cohttp
      - Backport the fixes to cohttp in the monorepo
    - Release it on the cohttp repo (might be a long-running task depending on the cohttp team) @gabriel.moise
      - pre-PR for local review: https://github.com/gabrielmoise17/ocaml-cohttp/pull/1
      - mirage/cohttp PRs:
      - Get PRs merged (waiting for feedback from the cohttp maintainers)
  - Documentation @diana.savvatina
    - Improve online doc with materials from exploration document
    - Improve RPC stack (cohttp/resto) documentation
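A leak of the kind investigated above typically shows up as an open file descriptor count that keeps growing while clients connect and disconnect. The sketch below is one way to observe that from outside the process. It assumes a Linux host with `/proc` mounted; `count_open_fds` and `grows_steadily` are hypothetical helper names, and this is not the procedure used in #7042.

```python
import os
from typing import Optional, Sequence


def count_open_fds(pid: Optional[int] = None) -> int:
    """Count entries in /proc/<pid>/fd (Linux only; defaults to this process).

    When inspecting the current process, the directory listing itself holds
    one descriptor open, so compare counts rather than reading them as exact.
    """
    target = "self" if pid is None else str(pid)
    return len(os.listdir(f"/proc/{target}/fd"))


def grows_steadily(samples: Sequence[int], min_growth: int) -> bool:
    """Heuristic: the count never drops and rose by at least min_growth."""
    if len(samples) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_drops and samples[-1] - samples[0] >= min_growth
```

Sampling `count_open_fds(pid)` at regular intervals while replaying client connections, then passing the samples to `grows_steadily`, gives a rough signal only. A busy but healthy node can also hold many descriptors, so the threshold has to be chosen per deployment.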
User/community requested features
- (ETA: 2024-06-28) Conform to the RPC API standards that are considered low-hanging fruit
  - Identify low-hanging fruit based on the feedback/survey (summary)
  - Introduce HTTP header caching semantics (2024-06-07) @ryan.tan3
    - Implement the feature
    - Add config flag to enable/disable caching semantics
  - Finer monitoring of RPCs @julien.t @gabriel.moise
    - Create an issue with the "nice to have" metrics/monitoring #7296 (closed)
    - Implement the features from the above issue
      - output size !13733 (closed)
      - response codes
      - latency (check metric validity)
      - fix RPC metric missing aggregation !13757 (merged)
    - Reflect improvements in the default Grafana dashboards
  - Improve synchronization transparency (2024-06-21) @ryan.tan3
  - Introduce If-None-Match in headers (2024-06-21) @ryan.tan3
  - Enforce deprecation policy by cleaning deprecated RPCs @diana.savvatina
    - Write a short, basic paragraph on the deprecation policy !13723 (merged)
    - Advertise the deprecation in the #proto-ecosystem call
    - Advertise it in the doc-meeting to get feedback
    - Advertise it in the MT meeting to get feedback
    - Remove RPCs that have been deprecated for too long
  - Update and improve RPC documentation
    - Improve errors for typical use cases
- (ETA: 2024-06-28) Investigate the RPC API standards that cannot be met because of our current RPC implementation
  - Identify blockers based on the feedback/survey
  - Specify required improvements
  - RPC engine rehaul proposal – Roadmap
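The caching and If-None-Match items above rely on standard HTTP conditional-request semantics: if the entity tag sent by the client still matches the current representation, a GET can be answered with 304 Not Modified and no body. The sketch below illustrates that decision only. The function names are hypothetical, and how the RPC server actually derives its entity tags is not stated in this description.

```python
from typing import Optional


def _opaque(tag: str) -> str:
    """Drop the weak prefix: If-None-Match uses weak comparison."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def status_for_get(current_etag: str, if_none_match: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still current, else 200.

    Simplifications: GET/HEAD only, and the header is split on commas, which
    is enough for entity tags that do not themselves contain a comma.
    """
    if if_none_match is None:
        return 200
    candidates = [part.strip() for part in if_none_match.split(",") if part.strip()]
    if "*" in candidates:
        # "*" matches any current representation of the resource.
        return 304
    current = _opaque(current_etag)
    return 304 if any(_opaque(c) == current for c in candidates) else 200
```

With a current tag of `"v2"`, a request carrying `If-None-Match: W/"v2", "v1"` yields 304, while one carrying only `"v1"` yields 200 and gets the full response.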