
Proof of concept using git repack --filter to offload packfiles

Now that the git repack --filter patch series has been merged to master, let's build a proof of concept in which Gitaly housekeeping knows how to call git repack with --filter. We can add a new strategy option to OptimizeRepositoryRequest, the request message of the OptimizeRepository housekeeping RPC in Gitaly, so that Rails can pass in a parameter asking for certain blobs to be sent to a separate packfile.

To keep things simple, Gitaly could have a special hard-coded directory under which it puts the separate packfiles for each repository. Git's alternates mechanism will be used so that Git and Gitaly can still access these separate packfiles.

Design

Triggering a filtering repack

In proto/repository.proto, OptimizeRepositoryRequest, the request message for the OptimizeRepository RPC, contains a Strategy enum:

  // Strategy determines how the repository shall be optimized.
  enum Strategy {
    // STRATEGY_UNSPECIFIED indicates that the strategy has not been explicitly set by the
    // caller. The default will be STRATEGY_HEURISTICAL in that case.
    STRATEGY_UNSPECIFIED = 0;
    // STRATEGY_HEURISTICAL performs heuristical optimizations in the repository. The server will
    // decide on a set of heuristics which parts need optimization and which ones don't to avoid
    // performing unnecessary optimization tasks.
    STRATEGY_HEURISTICAL = 1;
    // STRATEGY_EAGER performs eager optimizations in the repository. The server will optimize all
    // data structures regardless of whether they are well-optimized already.
    STRATEGY_EAGER = 2;
  }

I think we could add another strategy here to say that we want blobs to be moved to separate storage, for example:

    // STRATEGY_MOVE_BLOBS performs eager optimizations in the repository, like STRATEGY_EAGER,
    // and also moves the blobs onto separate storage.
    STRATEGY_MOVE_BLOBS = 3;
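
As a rough illustration of how the server side might interpret the proposed value, here is a minimal Go sketch. The Strategy type, the constant names, the plan struct and planFor are all hypothetical stand-ins written for this example; the real enum would be generated from proto/repository.proto, and the actual housekeeping code may be organised quite differently.

```go
package main

import "fmt"

// Strategy mirrors the proto enum for illustration only. The real type would be
// generated from proto/repository.proto.
type Strategy int32

const (
	StrategyUnspecified Strategy = 0
	StrategyHeuristical Strategy = 1
	StrategyEager       Strategy = 2
	StrategyMoveBlobs   Strategy = 3 // proposed value, not part of the existing enum
)

// plan is a hypothetical summary of what housekeeping would do for a request.
type plan struct {
	Eager     bool // optimize everything regardless of current state
	MoveBlobs bool // additionally repack blobs into separate storage
}

// planFor maps a strategy to a plan. STRATEGY_UNSPECIFIED falls back to the
// heuristical behaviour, as described in the proto comments.
func planFor(s Strategy) (plan, error) {
	switch s {
	case StrategyUnspecified, StrategyHeuristical:
		return plan{}, nil
	case StrategyEager:
		return plan{Eager: true}, nil
	case StrategyMoveBlobs:
		// Behaves like the eager strategy and also moves blobs away.
		return plan{Eager: true, MoveBlobs: true}, nil
	default:
		return plan{}, fmt.Errorf("unsupported strategy: %d", s)
	}
}
```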

Perhaps in the future we will want additional strategies, but I think this one is enough for now.

Other code changes

In internal/git/housekeeping/objects.go, we will also need another RepackObjectsStrategy and associated code that calls performRepack() with --filter=blob:none and --filter-to=....
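
To make the intended invocation concrete, here is a minimal sketch of how the argument list could be assembled. The function name filteringRepackArgs is hypothetical and this is not how performRepack() is structured. The destination passed to --filter-to is left as an opaque parameter. Adding -a, -d and --no-write-bitmap-index is an assumption based on the recommendations in the git-repack documentation for --filter; whether that fits Gitaly's housekeeping would need checking.

```go
package main

import "errors"

// filteringRepackArgs returns the arguments for a `git repack` call that moves
// all blobs into a separate packfile. filterTo is passed through unchanged as
// the value of --filter-to; this sketch does not decide where that should be.
func filteringRepackArgs(filterTo string) ([]string, error) {
	if filterTo == "" {
		return nil, errors.New("filter-to destination must not be empty")
	}
	return []string{
		"repack",
		// Assumption: repack everything and drop redundant packs so that the
		// split is complete, as the git-repack documentation suggests.
		"-a", "-d",
		// Assumption: bitmaps expect a single pack containing all objects, so
		// bitmap writing is disabled for the filtering repack.
		"--no-write-bitmap-index",
		"--filter=blob:none",
		"--filter-to=" + filterTo,
	}, nil
}
```

The resulting slice could be handed to exec.Command("git", args...) with the repository as the working directory, or translated into whatever command abstraction the housekeeping code already uses.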

Code will also be needed to properly set up the alternates mechanism for the repo, so that it can still access the blobs that have been moved to the special hard-coded directory.

If the repo is later repacked using a different strategy, maybe we can also have code that removes the alternates setup for this repo.

Edited by Christian Couder