Raft: Move repository identification to Gitaly

In the original design and the detailed implementation research (this comment), the ideal is for the Gitaly cluster to manage all aspects of repositories, including repository identity and repository creation. Bridging the current state, where Rails controls repository identity, to that ideal state requires a huge amount of effort. Letting the Gitaly cluster control this aspect yields many positive outcomes in the long term, but it brings little benefit in the short term. So it makes sense to proceed iteratively:

In the scope of &13562, Rails declares and controls repository identity via the (storage, relative_path) tuple. The Gitaly cluster accommodates and preserves the current structure.
When log replication is in place, we'll continue to migrate repository identification to Gitaly if the benefits outweigh the cost.

Challenges of migrating repository identification to Gitaly

One of the main reasons for deferring this migration is that repository identification is deeply rooted in Rails. When a repository is created, Rails picks the destination storage using a weighted random sampler. The repository's relative_path is tied to its container (Project, Wiki, Snippet, etc.; more about it here). Depending on the type of container, the relative_path is slightly different. In general, it's derived from a deterministic hash (SHA2 at this point) of the container ID.

For example, a project with ID 4 always has the relative_path of @hashed/4e/c9/4ec9599fc203d176a301536c2e091a19bc852759b255bd6818810a42c5fed14a.git.
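The two decisions described above (weighted random storage selection and a deterministic relative_path) can be sketched as follows. This is a minimal illustration, not the Rails implementation: it assumes the hash input is the container ID rendered as a decimal string and that the first two byte pairs of the hex digest form the directory levels, as the example path suggests. The function names and storage names are hypothetical.

```python
import hashlib
import random


def hashed_relative_path(container_id):
    # Assumption: the SHA-256 input is the decimal string form of the ID.
    digest = hashlib.sha256(str(container_id).encode("utf-8")).hexdigest()
    # Layout mirrors the example path: @hashed/<aa>/<bb>/<full digest>.git
    return f"@hashed/{digest[0:2]}/{digest[2:4]}/{digest}.git"


def pick_storage(weights, rng):
    # Weighted random sampling over storage names.
    # random.choices raises ValueError if all weights are zero.
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def repository_identity(container_id, weights, rng):
    # The (storage, relative_path) tuple that Rails would send to Gitaly.
    return pick_storage(weights, rng), hashed_relative_path(container_id)


# Hypothetical usage: two storages, one of which receives no new repositories.
storage, relative_path = repository_identity(
    42, {"default": 1, "storage-2": 0}, random.Random()
)
```

The path is a pure function of the container ID, so Rails never needs to store it to find a repository again; only the storage choice is random.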

After Rails determines the destination of that repository, it sends a /gitaly.RepositoryService/CreateRepository request to Gitaly containing (storage, relative_path). Some other RPCs create repositories implicitly, including but not limited to:

/gitaly.ObjectPoolService/CreateObjectPool
/gitaly.RepositoryService/CreateFork
/gitaly.RepositoryService/CreateRepositoryFromURL
/gitaly.RepositoryService/CreateRepositoryFromBundle
/gitaly.RepositoryService/CreateRepositoryFromSnapshot
/gitaly.RepositoryService/ReplicateRepository

This approach allows quick translation from a container to the corresponding repository. The reverse translation is not so easy, because the hash cannot be inverted. Fortunately, Rails maintains reverse lookup tables: ProjectRepository, ProjectWikiRepository, SnippetRepository, etc. They are used for troubleshooting and Geo only; the normal flow doesn't query those tables at all, because we don't expose the underlying hashed repository to customers. The only way for them to access underlying repositories is via aliases.

For example:

git clone git@gitlab.com:gitlab-org/gitlab.git -> Project gitlab-org/gitlab (id 1234) -> repository hashed storage path.
git clone git@gitlab.com:gitlab-org/gitaly/snippets/2463399.git -> Snippet 2463399 -> snippet hashed storage path.
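The asymmetry between the two directions can be sketched as follows: the forward translation is a computation, while the reverse translation needs a stored mapping. This is only an illustration under the same hashing assumption as the example path (SHA-256 of the decimal ID). `ProjectRepositoryTable` is a hypothetical in-memory stand-in for a reverse lookup table such as ProjectRepository, not its real schema.

```python
import hashlib


def hashed_relative_path(project_id):
    # Forward direction: computed on demand, nothing needs to be stored.
    digest = hashlib.sha256(str(project_id).encode("utf-8")).hexdigest()
    return f"@hashed/{digest[0:2]}/{digest[2:4]}/{digest}.git"


class ProjectRepositoryTable:
    """Hypothetical reverse lookup: relative_path -> project ID."""

    def __init__(self):
        self._project_by_path = {}

    def record(self, project_id):
        # Called when a repository is created, so the mapping exists later.
        path = hashed_relative_path(project_id)
        self._project_by_path[path] = project_id
        return path

    def find_project(self, relative_path):
        # Reverse direction: only answerable for paths that were recorded.
        return self._project_by_path.get(relative_path)


table = ProjectRepositoryTable()
path = table.record(1234)
project_id = table.find_project(path)
```

If such a table became the single source of truth, every repository access would depend on it being complete, which is why its reliability comes first in the estimations below.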

Estimations

In summary, moving the repository identification system to the Gitaly cluster involves the following work:

Ensure complementary tables are reliable and able to act as SSOT.
Rework repository access flow to exclusively use complementary *Repository tables.
Finalize how to deal with repository storage weight. In theory, it should be controlled by the cluster because we would love to implement advanced data placement. However, data placement is another hard problem to solve. Initially, we might need a compromise in which the Gitaly cluster stores the weights and decides which storage will handle new repositories. This approach involves implementing /ClusterService/GetStorageWeights and /ClusterService/SetStorageWeights as well as updating UIs.
Re-implement the main CreateRepository RPC so that it takes the Raft cluster (weights, replication, etc.) into account. The Gitaly cluster picks the storage, creates the repository accordingly, and returns (storage, relative_path) to Rails.
Re-implement a portion of RPCs creating repositories implicitly.
Update Rails to handle such changes.
Implement tools or modules to let the Gitaly cluster handle existing repositories with different storage path projections.

This list doesn't include the implicit mental effort of detaching the container's identity from the repository's identity, or the work of updating documentation, training, incident guidelines, etc.

In general, the upsides of moving the repository identification system to the Gitaly cluster don't justify the cost of doing so.

Alternative solution

We could pick a middle ground:

Gitaly cluster controls storage (weights, data locality, etc.).
Rails controls the repository's identity (storage and relative_path).

When Rails needs to create a repository, it follows this flow:

Call the /ClusterService/SelectStorage RPC with the requested storage attributes (hdd/ssd, affinity, etc.). The Gitaly cluster returns a SelectStorageResponse that contains the name of the selected storage.
Rails follows the existing flow, sets the repository identity, and sends repository-creating requests to the selected storage.
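The two steps can be sketched with an in-memory stand-in for the cluster. Everything here is hypothetical: `FakeCluster`, the `disk` attribute, and the filter-then-weighted-sample selection policy are assumptions for illustration, since the actual SelectStorage request and response shapes are not defined yet.

```python
import hashlib
import random
from dataclasses import dataclass, field


@dataclass
class Storage:
    name: str
    weight: int
    attributes: dict = field(default_factory=dict)


@dataclass
class SelectStorageResponse:
    storage: str


class FakeCluster:
    """Hypothetical stand-in for the Gitaly cluster."""

    def __init__(self, storages, seed=None):
        self._storages = list(storages)
        self._rng = random.Random(seed)
        self.repositories = set()

    def select_storage(self, **requested):
        # Keep storages that accept new repositories and match every
        # requested attribute, then sample one by weight.
        candidates = [
            s for s in self._storages
            if s.weight > 0
            and all(s.attributes.get(k) == v for k, v in requested.items())
        ]
        if not candidates:
            raise LookupError("no storage matches the requested attributes")
        weights = [s.weight for s in candidates]
        chosen = self._rng.choices(candidates, weights=weights, k=1)[0]
        return SelectStorageResponse(storage=chosen.name)

    def create_repository(self, storage, relative_path):
        self.repositories.add((storage, relative_path))


def create_repository_for(cluster, container_id, **requested):
    # Step 1: the cluster decides where the repository lives.
    storage = cluster.select_storage(**requested).storage
    # Step 2: Rails keeps deciding what the repository is called.
    digest = hashlib.sha256(str(container_id).encode("utf-8")).hexdigest()
    relative_path = f"@hashed/{digest[:2]}/{digest[2:4]}/{digest}.git"
    cluster.create_repository(storage, relative_path)
    return storage, relative_path


cluster = FakeCluster([
    Storage("storage-hdd", 1, {"disk": "hdd"}),
    Storage("storage-ssd", 1, {"disk": "ssd"}),
])
identity = create_repository_for(cluster, 42, disk="ssd")
```

In this sketch the weights never leave the cluster, so Rails only has to replace its own sampler with one RPC call; the rest of the creation flow stays as it is.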

This approach cuts down the cost significantly while being flexible enough for us to implement advanced data placement. There are some implications of the new approach:

There is no exposed unique repository ID system. Relative paths act as the single source of truth. If two storages contain a repository at the same relative path, those repositories are replicas of each other.
It also eliminates the routing table. Because clients store each repository's storages, they can easily find the address of a storage as well as the replicas of that storage. This strengthens the design of the routing mechanism mentioned in #6040.
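A rough sketch of how a client could resolve a repository without a routing table, assuming it keeps two small maps: the storages recorded for each relative path, and the address of each storage. The data shapes, storage names, and addresses below are hypothetical.

```python
def replica_addresses(repository_storages, storage_addresses, relative_path):
    # Every storage recorded for this relative path holds a replica, so the
    # client can list all candidate addresses without asking the cluster.
    names = repository_storages.get(relative_path, [])
    return [storage_addresses[name] for name in names]


# Hypothetical client-side state.
repository_storages = {
    "@hashed/aa/bb/example.git": ["storage-1", "storage-2"],
}
storage_addresses = {
    "storage-1": "tcp://gitaly-1.internal:8075",
    "storage-2": "tcp://gitaly-2.internal:8075",
}

addresses = replica_addresses(
    repository_storages, storage_addresses, "@hashed/aa/bb/example.git"
)
```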

Edited Jun 04, 2024 by Quang-Minh Nguyen