
RPC: FIX Cohttp EOF read with Lwt_unix recv MSG_PEEK

What

Enhanced connection handling to detect client EOF. This prevents a connection from hanging when the callback supplied by the Cohttp user never resolves.

  • Introduced wait_eof_or_closed() to wait for EOF or a Closed status on the input channel.
  • Added sleep_fn() as an optional callback for periodic EOF checks.

Why

If the callback provided at server start never resolves, Cohttp does not detect the EOF on the input channel when the client dies or closes the connection. As a result, the close_conn callback is never called and the resources allocated in the server to handle the callback are never released.

How

A new optional sleep_fn callback is introduced to detect the EOF. If provided, sleep_fn is used for periodic checks for EOF from the client. If it is not provided, Cohttp will not detect an EOF received from the peer, nor notify its caller about it, while the caller's callback is still handling the connection. This can lead to a resource leak if the callback is designed to never resolve.
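The periodic check described above can be sketched as follows. This is only an illustration based on the MR title (Lwt_unix recv with MSG_PEEK), not the code from this MR: the names peer_closed, wait_eof and handle_with_eof_watch are hypothetical, the type unit -> unit Lwt.t for sleep_fn is an assumption, and the sketch works on a raw Lwt_unix.file_descr rather than on Cohttp's input channel.

```ocaml
(* Hypothetical sketch, not the Cohttp implementation. Requires lwt.unix. *)
open Lwt.Infix

(* Peek at one byte without consuming it. recv returning 0 means the peer
   closed its end of the connection (EOF). *)
let peer_closed (fd : Lwt_unix.file_descr) : bool Lwt.t =
  let buf = Bytes.create 1 in
  Lwt.catch
    (fun () ->
      Lwt_unix.recv fd buf 0 1 [ Unix.MSG_PEEK ] >|= fun n -> n = 0)
    (function
      (* A reset or an already closed descriptor is treated like EOF here. *)
      | Unix.Unix_error ((Unix.ECONNRESET | Unix.EPIPE | Unix.EBADF), _, _) ->
          Lwt.return_true
      | exn -> Lwt.fail exn)

(* Repeat the check until EOF is seen. sleep_fn (assumed to have type
   unit -> unit Lwt.t) sets the delay between checks, which matters when
   unread data is pending and the peek returns immediately. *)
let rec wait_eof fd (sleep_fn : unit -> unit Lwt.t) : unit Lwt.t =
  peer_closed fd >>= fun eof ->
  if eof then Lwt.return_unit
  else sleep_fn () >>= fun () -> wait_eof fd sleep_fn

(* Race a handler that may never resolve against EOF detection, and run the
   cleanup function in both cases. *)
let handle_with_eof_watch fd ~sleep_fn ~handler ~on_close =
  Lwt.finalize
    (fun () -> Lwt.pick [ handler (); wait_eof fd sleep_fn ])
    (fun () -> on_close (); Lwt.return_unit)
```

In this sketch, a caller would pass something like (fun () -> Lwt_unix.sleep 0.1) as sleep_fn. Because the handler is raced against wait_eof, on_close runs as soon as the peer disconnects, even if the handler itself never resolves.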

Manually testing the MR

In Tezos, as described in https://docs.google.com/document/d/1dqOec9Fm8D9AuZOg6-7UO4-7P26vpHxXiFFUJsoOEDg/edit#heading=h.rkbp5frwoc0y

In terminal 1:

TEZOS_LOG="*->debug" CONDUIT_TLS=native CONDUIT_DEBUG=true COHTTP_DEBUG=true DATA_DIR=/tmp/lapin ./src/bin_node/octez-sandboxed-node.sh 1 --connections 0  2>&1 | grep -v quota

If you are running it for the first time, stop it, edit the file /tmp/lapin/config.json as follows, then start it again:

-  "listen-addrs": ["127.0.0.1:18731"]
+  "external-listen-addrs": ["127.0.0.1:18731"]

If you run it multiple times and something weird happens, change the folder /tmp/lapin to something else, e.g. /tmp/lapin1.

When the external RPC process is started, it prints a trace:

starting RPC server on ::ffff:127.0.0.1:18731 (acl = AllowAll)

In terminal 2:

eval `./src/bin_client/octez-init-sandboxed-client.sh 1`
octez-activate-alpha
curl 'localhost:18731/monitor/heads/main'

Curl will wait for the new blocks until it is stopped.

In terminal 3:

lsof -a -U -c main.exe ; lsof -a -i -c main.exe | grep -v -e CLOSE_WAIT -e CLOSED ; lsof -a -i -c curl

Again in terminal 2:

Ctrl+C  // to stop curl

If the test passes, you will see traces in terminal 1 about the EOF and about the conn_closed() callbacks called by Cohttp, Resto and the RPC middleware. If you now run lsof again, you will see that the connections created for curl have been released in both the Tezos node and the external RPC process.

Another way to reproduce this is to run a test with the RPC toy.

Checklist

  • Document the interface of any function added or modified (see the coding guidelines)
  • Document any change to the user interface, including configuration parameters (see node configuration)
  • Provide automatic testing (see the testing guide).
  • For new features and bug fixes, add an item in the appropriate changelog (docs/protocols/alpha.rst for the protocol and the environment, CHANGES.rst at the root of the repository for everything else).
  • Select suitable reviewers using the Reviewers field below.
  • Select as Assignee the next person who should take action on that MR
Edited by Victor Allombert
