From 1ad9ce2e7bf8bbf42cc6266e24854bfc7495f5c6 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 8 Dec 2025 16:03:57 +0100 Subject: [PATCH 1/4] builtin/repack: avoid rewriting up-to-date MIDX Hi, this small patch series introduces logic to avoid rewriting the multi-pack index in case it's up-to-date already. This is especially relevant in the context of geometric repacking, where we may decide to not write any new packfiles, but we'd still rewrite the multi-pack index. This is a follow-up for the discussion that happened at [1]. Changes in v2: - Move the logic to skip writing updates into `write_midx_internal()`. We already had some logic there to skip no-op rewrites, so we only extend that logic now to handle the "--stdin-packs" option. This also has the added benefit that we know to strip bitmaps in case the write is a no-op. - I don't handle the case anymore where the preferred pack is changing. We didn't do so in the preexisting checks either, so I decided to drop this for now. This _can_ be considered as a bug, and if anyone thinks it is then I'll extend these checks. - Adapt the tests to use git-multi-pack-index(1) directly. - Link to v1: https://lore.kernel.org/r/20251208-pks-skip-noop-rewrite-v1-0-430d52dba9f0@pks.im Thanks! Patrick [1]: <20251025191550.GA279793@coredump.intra.peff.net> To: git@vger.kernel.org Cc: Taylor Blau Cc: Jeff King --- b4-submit-tracking --- # This section is used internally by b4 prep for tracking purposes. { "series": { "revision": 2, "change-id": "20251208-pks-skip-noop-rewrite-38d7f01c79c5", "prefixes": [], "history": { "v1": [ "20251208-pks-skip-noop-rewrite-v1-0-430d52dba9f0@pks.im" ] } } } -- GitLab From 88ba93d3a53636a160097e4acc76794ddec2e506 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 8 Dec 2025 10:48:38 +0100 Subject: [PATCH 2/4] midx: fix `BUG()` when getting preferred pack without a reverse index The function `midx_preferred_pack()` returns the preferred pack for a given multi-pack index. To compute the preferred pack we: 1. Take the first position indexed by the MIDX in pseudo-pack order. 2. Convert this pseudo-pack position into the MIDX position. 3. We then look up the pack that corresponds to this MIDX position. This reliably returns the preferred pack given that all of its contained objects will be up front in pseudo-pack order. The second step that turns the pseudo-pack order into MIDX order requires the reverse index though, which may not exist for example when the MIDX does not have a bitmap. And in that case one may easily hit a bug: BUG: ../pack-revindex.c:491: pack_pos_to_midx: reverse index not yet loaded In theory, `midx_preferred_pack()` already knows to handle the case where no reverse index exists, as it calls `load_midx_revindex()` before calling into `midx_preferred_pack()`. But we only check for negative return values there, even though the function returns a positive error code in case the reverse index does not exist. Fix the issue by testing for a non-zero return value instead, same as all the other callers of this function already do. While at it, document the return value of `load_midx_revindex()`. Signed-off-by: Patrick Steinhardt --- midx.c | 2 +- pack-revindex.h | 3 ++- t/t5319-multi-pack-index.sh | 13 +++++++++++++ 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/midx.c b/midx.c index 24e1e72175..b681b18fc1 100644 --- a/midx.c +++ b/midx.c @@ -686,7 +686,7 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id) { if (m->preferred_pack_idx == -1) { uint32_t midx_pos; - if (load_midx_revindex(m) < 0) { + if (load_midx_revindex(m)) { m->preferred_pack_idx = -2; return -1; } diff --git a/pack-revindex.h b/pack-revindex.h index 422c2487ae..0042892091 100644 --- a/pack-revindex.h +++ b/pack-revindex.h @@ -72,7 +72,8 @@ int verify_pack_revindex(struct packed_git *p); * multi-pack index by mmap-ing it and assigning pointers in the * multi_pack_index to point at it. * - * A negative number is returned on error. + * A negative number is returned on error. A positive number is returned in + * case the multi-pack-index does not have a reverse index. */ int load_midx_revindex(struct multi_pack_index *m); diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 93f319a4b2..9492a9737b 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -350,7 +350,20 @@ test_expect_success 'preferred pack from existing MIDX without bitmaps' ' # the new MIDX git multi-pack-index write --preferred-pack=pack-$pack.pack ) +' +test_expect_success 'preferred pack cannot be determined without bitmap' ' + test_when_finished "rm -fr preferred-can-be-queried" && + git init preferred-can-be-queried && + ( + cd preferred-can-be-queried && + test_commit initial && + git repack -Adl --write-midx --no-write-bitmap-index && + test_must_fail test-tool read-midx --preferred-pack .git/objects 2>err && + test_grep "could not determine MIDX preferred pack" err && + git repack -Adl --write-midx --write-bitmap-index && + test-tool read-midx --preferred-pack .git/objects + ) ' test_expect_success 'verify multi-pack-index success' ' -- GitLab From 9bd320b2cf65efce5d9028631b7e7bb878263f6f Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Wed, 10 Dec 2025 08:54:58 +0100 Subject: [PATCH 3/4] midx-write: extract function to test whether MIDX needs updating In `write_midx_internal()` we know to skip writing the new multi-pack index in case it would be the same as the existing one. This logic does not handle the `--stdin-packs` option yet though, so we end up always rewriting the MIDX if that option is passed to us. Extract the logic to decide whether or not to rewrite the MIDX into a separate function. This will allow us to extend that feature in the next commit to address the above issue. Signed-off-by: Patrick Steinhardt --- midx-write.c | 39 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 36 insertions(+), 3 deletions(-) diff --git a/midx-write.c b/midx-write.c index e3e9be6d03..78bc8a65b8 100644 --- a/midx-write.c +++ b/midx-write.c @@ -1014,6 +1014,41 @@ static void clear_midx_files(struct odb_source *source, strbuf_release(&buf); } +static bool midx_needs_update(struct write_midx_context *ctx) +{ + struct multi_pack_index *midx = ctx->m; + bool needed = true; + + /* + * Ignore incremental updates for now. The assumption is that any + * incremental update would be either empty (in which case we will bail + * out later) or it would actually cover at least one new pack. + */ + if (ctx->incremental) + goto out; + + /* + * If there is no MIDX then either it doesn't exist, or we're doing a + * geometric repack. We cannot (yet) determine whether we need to + * update the multi-pack index in the second case. + */ + if (!midx) + goto out; + + /* + * Otherwise, we need to verify that the packs covered by the existing + * MIDX match the packs that we already have. This test is somewhat + * lenient and will be fixed. + */ + if (ctx->nr != midx->num_packs + midx->num_packs_in_base) + goto out; + + needed = false; + +out: + return needed; +} + static int write_midx_internal(struct odb_source *source, struct string_list *packs_to_include, struct string_list *packs_to_drop, @@ -1111,9 +1146,7 @@ static int write_midx_internal(struct odb_source *source, for_each_file_in_pack_dir(source->path, add_pack_to_midx, &ctx); stop_progress(&ctx.progress); - if ((ctx.m && ctx.nr == ctx.m->num_packs + ctx.m->num_packs_in_base) && - !ctx.incremental && - !(packs_to_include || packs_to_drop)) { + if (!packs_to_include && !packs_to_drop && !midx_needs_update(&ctx)) { struct bitmap_index *bitmap_git; int bitmap_exists; int want_bitmap = flags & MIDX_WRITE_BITMAP; -- GitLab From f594865c127cd406ed4759c7bc949bc5f3c20f33 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Mon, 8 Dec 2025 09:41:29 +0100 Subject: [PATCH 4/4] midx-write: skip rewriting MIDX with `--stdin-packs` unless needed In `write_midx_internal()` we know to skip rewriting the multi-pack index in case the existing one already covers all packs. This logic does not know to handle `git multi-pack-index write --stdin-packs` though, so we end up always rewriting the MIDX in this case even if the MIDX would not change. With our default maintenance strategy this isn't really much of a problem, as git-gc(1) does not use the "--stdin-packs" option. But that is changing with geometric repacking, where "--stdin-packs" is used to explicitly select the packfiles part of the geometric sequence. This issue can be demonstrated trivially with a benchmark in the Git repository: executing `git repack --geometric=2 --write-midx -d` in the Git repository takes more than 3 seconds only to end up with the same multi-pack index as we already had before. The logic that decides if we need to rewrite the MIDX only checks whether the number of packfiles covered will change. That check is of course too lenient for "--stdin-packs", as it could happen that we want to cover a different-but-same-size set of packfiles. But there is no inherent reason why we cannot handle "--stdin-packs". Improve the logic to not only check for the number of packs, but to also verify that we are asked to generate a MIDX for the _same_ packs. This allows us to also skip no-op rewrites for "--stdin-packs". Helped-by: Taylor Blau Signed-off-by: Patrick Steinhardt --- midx-write.c | 100 +++++++++++++++++++++++++----------- t/t5319-multi-pack-index.sh | 51 ++++++++++++++++++ t/t7703-repack-geometric.sh | 35 +++++++++++++ 3 files changed, 156 insertions(+), 30 deletions(-) diff --git a/midx-write.c b/midx-write.c index 78bc8a65b8..ce459b02c3 100644 --- a/midx-write.c +++ b/midx-write.c @@ -1014,9 +1014,10 @@ static void clear_midx_files(struct odb_source *source, strbuf_release(&buf); } -static bool midx_needs_update(struct write_midx_context *ctx) +static bool midx_needs_update(struct multi_pack_index *midx, struct write_midx_context *ctx) { - struct multi_pack_index *midx = ctx->m; + struct strset packs = STRSET_INIT; + struct strbuf buf = STRBUF_INIT; bool needed = true; /* @@ -1027,25 +1028,48 @@ static bool midx_needs_update(struct write_midx_context *ctx) if (ctx->incremental) goto out; - /* - * If there is no MIDX then either it doesn't exist, or we're doing a - * geometric repack. We cannot (yet) determine whether we need to - * update the multi-pack index in the second case. - */ - if (!midx) - goto out; - /* * Otherwise, we need to verify that the packs covered by the existing - * MIDX match the packs that we already have. This test is somewhat - * lenient and will be fixed. + * MIDX match the packs that we already have. The logic to do so is way + * more complicated than it has any right to be. This is because: + * + * - We cannot assume any ordering. + * + * - The MIDX packs may not be loaded at all, and loading them would + * be wasteful. So we need to use the pack names tracked by the + * MIDX itself. + * + * - The MIDX pack names are tracking the ".idx" files, whereas the + * packs themselves are tracking the ".pack" files. So we need to + * strip suffixes. */ if (ctx->nr != midx->num_packs + midx->num_packs_in_base) goto out; + for (uint32_t i = 0; i < ctx->nr; i++) { + strbuf_reset(&buf); + strbuf_addstr(&buf, pack_basename(ctx->info[i].p)); + strbuf_strip_suffix(&buf, ".pack"); + + if (!strset_add(&packs, buf.buf)) + BUG("same pack added twice?"); + } + + for (uint32_t i = 0; i < ctx->nr; i++) { + strbuf_reset(&buf); + strbuf_addstr(&buf, midx->pack_names[i]); + strbuf_strip_suffix(&buf, ".idx"); + + if (!strset_contains(&packs, buf.buf)) + goto out; + strset_remove(&packs, buf.buf); + } + needed = false; out: + strbuf_release(&buf); + strset_clear(&packs); return needed; } @@ -1066,6 +1090,7 @@ static int write_midx_internal(struct odb_source *source, struct write_midx_context ctx = { .preferred_pack_idx = NO_PREFERRED_PACK, }; + struct multi_pack_index *midx_to_free = NULL; int bitmapped_packs_concat_len = 0; int pack_name_concat_len = 0; int dropped_packs = 0; @@ -1146,25 +1171,39 @@ static int write_midx_internal(struct odb_source *source, for_each_file_in_pack_dir(source->path, add_pack_to_midx, &ctx); stop_progress(&ctx.progress); - if (!packs_to_include && !packs_to_drop && !midx_needs_update(&ctx)) { - struct bitmap_index *bitmap_git; - int bitmap_exists; - int want_bitmap = flags & MIDX_WRITE_BITMAP; - - bitmap_git = prepare_midx_bitmap_git(ctx.m); - bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git); - free_bitmap_index(bitmap_git); - - if (bitmap_exists || !want_bitmap) { - /* - * The correct MIDX already exists, and so does a - * corresponding bitmap (or one wasn't requested). - */ - if (!want_bitmap) - clear_midx_files_ext(source, "bitmap", NULL); - result = 0; - goto cleanup; + if (!packs_to_drop) { + /* + * If there is no MIDX then either it doesn't exist, or we're + * doing a geometric repack. Try to load it from the source to + * tell these two cases apart. + */ + struct multi_pack_index *midx = ctx.m; + if (!midx) + midx = midx_to_free = load_multi_pack_index(ctx.source); + + if (midx && !midx_needs_update(midx, &ctx)) { + struct bitmap_index *bitmap_git; + int bitmap_exists; + int want_bitmap = flags & MIDX_WRITE_BITMAP; + + bitmap_git = prepare_midx_bitmap_git(midx); + bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git); + free_bitmap_index(bitmap_git); + + if (bitmap_exists || !want_bitmap) { + /* + * The correct MIDX already exists, and so does a + * corresponding bitmap (or one wasn't requested). + */ + if (!want_bitmap) + clear_midx_files_ext(source, "bitmap", NULL); + result = 0; + goto cleanup; + } } + + close_midx(midx_to_free); + midx_to_free = NULL; } if (ctx.incremental && !ctx.nr) { @@ -1520,6 +1559,7 @@ static int write_midx_internal(struct odb_source *source, free(keep_hashes); } strbuf_release(&midx_name); + close_midx(midx_to_free); trace2_region_leave("midx", "write_midx_internal", r); diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 9492a9737b..794f8b5ab4 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -366,6 +366,57 @@ test_expect_success 'preferred pack cannot be determined without bitmap' ' ) ' +test_midx_is_retained () { + test-tool chmtime =0 .git/objects/pack/multi-pack-index && + ls -l .git/objects/pack/multi-pack-index >expect && + git multi-pack-index write "$@" && + ls -l .git/objects/pack/multi-pack-index >actual && + test_cmp expect actual +} + +test_midx_is_rewritten () { + test-tool chmtime =0 .git/objects/pack/multi-pack-index && + ls -l .git/objects/pack/multi-pack-index >expect && + git multi-pack-index write "$@" && + ls -l .git/objects/pack/multi-pack-index >actual && + ! test_cmp expect actual +} + +test_expect_success 'up-to-date multi-pack-index is retained' ' + test_when_finished "rm -fr midx-up-to-date" && + git init midx-up-to-date && + ( + cd midx-up-to-date && + + # Write the initial pack that contains the most objects. + test_commit first && + test_commit second && + git repack -Ad --write-midx && + test_midx_is_retained && + + # Writing a new bitmap index should cause us to regenerate the MIDX. + test_midx_is_rewritten --bitmap && + test_midx_is_retained --bitmap && + + # Ensure that writing a new packfile causes us to rewrite the index. + test_commit incremental && + git repack -d && + test_midx_is_rewritten && + test_midx_is_retained && + + for pack in .git/objects/pack/*.idx + do + basename "$pack" || exit 1 + done >stdin && + test_line_count = 2 stdin && + test_midx_is_retained --stdin-packs stdin.trimmed && + test_midx_is_rewritten --stdin-packs expect && + git repack --geometric=2 --write-midx --no-write-bitmap-index && + ls -l .git/objects/pack/ >actual && + test_cmp expect actual + ) +' + +test_expect_success '--geometric --write-midx retains up-to-date MIDX with bitmap index' ' + test_when_finished "rm -fr repo" && + git init repo && + test_commit -C repo initial && + + test_path_is_missing repo/.git/objects/pack/multi-pack-index && + git -C repo repack --geometric=2 --write-midx --write-bitmap-index && + test_path_is_file repo/.git/objects/pack/multi-pack-index && + test-tool chmtime =0 repo/.git/objects/pack/multi-pack-index && + + ls -l repo/.git/objects/pack/ >expect && + git -C repo repack --geometric=2 --write-midx --write-bitmap-index && + ls -l repo/.git/objects/pack/ >actual && + test_cmp expect actual +' + test_expect_success '--geometric --write-midx with packfiles in main and alternate ODB' ' test_when_finished "rm -fr shared member" && -- GitLab