Raft: Allow cluster to serve requests

When the log replication feature is finished, each node of the cluster should be able to serve client requests.

For read operations, all nodes behave the same; their roles are interchangeable. However, a replica must be mindful of replication lag. Most of the time it is acceptable to serve a request with slightly stale data, since Git does not guarantee data consistency or access ordering. To keep things simple at this first stage, however, replicas could use linearizable reads (via the SyncRead API - https://gitlab.com/gitlab-org/gitaly/-/issues/6030#note_1923676436). This ensures all reads are synchronized with the leader.

For write operations, replicas should reject all writes. During a network partition there could be multiple self-proclaimed primaries, so we should be extremely cautious.
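A minimal sketch of this rejection path, assuming a hypothetical `node` type with a `leaderAddr` field (names are illustrative, not Gitaly's): non-leaders refuse the write and return an error naming the current leader so the client can re-route.

```go
package main

import "fmt"

// role of a node in the Raft group.
type role int

const (
	leader role = iota
	follower
)

// node is a minimal stand-in for a storage node; leaderAddr is
// hypothetical metadata pointing clients at the current leader.
type node struct {
	role       role
	leaderAddr string
}

// handleWrite rejects writes on replicas with an error that names the
// leader. Only the leader appends to the Raft log, which avoids
// accepting writes from a stale or partitioned primary.
func (n *node) handleWrite(req string) error {
	if n.role != leader {
		return fmt.Errorf("not leader, retry against %s", n.leaderAddr)
	}
	// ... append req to the Raft log ...
	return nil
}
```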

Both reads and writes should also be aware of the quiescing state. The Raft group is re-activated by any operation. Linearizable reads issue ReadIndex requests; a replica wakes up from the quiescing state when it receives a response from the leader.
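The wake-up behavior can be sketched like this; `quiescentGroup` and its methods are illustrative assumptions, not Gitaly's actual types. The point is simply that a quiesced group stops ticking (no heartbeats or election timers) and any operation, including a ReadIndex response arriving at a replica, re-activates it.

```go
package main

// quiescentGroup sketches a Raft group that goes to sleep when idle
// and is re-activated by any operation.
type quiescentGroup struct {
	quiesced bool
	ticks    int
}

// activate wakes the group so it resumes ticking (heartbeats,
// election timers, etc.).
func (g *quiescentGroup) activate() {
	g.quiesced = false
}

// onReadIndexResponse wakes a replica that was waiting on a
// linearizable read when the leader's ReadIndex response arrives.
func (g *quiescentGroup) onReadIndexResponse() {
	g.activate()
}

// tick advances the group only while it is active.
func (g *quiescentGroup) tick() {
	if !g.quiesced {
		g.ticks++
	}
}
```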

Within the scope of &13562, we could implement dead-simple client-side routing:

  • The client fetches the list of storages and their replicas via the Cluster.GetInfo RPC. This list is a map of the form {storage_a: [storage_b, storage_c]}.
  • Before performing a request, the client picks a storage in round-robin fashion and overrides the repository's storage (in Rails code, for example).
  • The server rejects the request if it is not eligible to serve it (mostly replicas receiving writes). The returned error includes a detailed message pointing to the correct node.
  • The list of storage replicas is highly cacheable (on the order of hours).
  • The leader is cacheable within the same session (web request, background job, etc.), so we expect at most one rejection per session.
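The routing steps above can be sketched as follows. The `router` type, `newRouter`, and `pickRead` are hypothetical names; the cached map mirrors the {storage_a: [storage_b, storage_c]} shape returned by Cluster.GetInfo.

```go
package main

// router sketches the client-side routing described above: a cached
// map of {storage: replicas} plus a round-robin cursor for reads.
type router struct {
	replicas map[string][]string // e.g. {"storage_a": {"storage_b", "storage_c"}}
	next     map[string]int      // round-robin cursor per storage
}

func newRouter(replicas map[string][]string) *router {
	return &router{replicas: replicas, next: map[string]int{}}
}

// pickRead returns the storage to read from, rotating through the
// primary and its replicas so read traffic is spread evenly.
func (r *router) pickRead(storage string) string {
	candidates := append([]string{storage}, r.replicas[storage]...)
	i := r.next[storage] % len(candidates)
	r.next[storage]++
	return candidates[i]
}
```

Since the replica map is highly cacheable, the client only needs to refresh it on the cache interval (or after a rejection that indicates the topology changed).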
Edited by Quang-Minh Nguyen