From ade14bffd71b75974492cf7afc525b8feb8a70b2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=98ystein=20Walle?= Date: Tue, 10 Jun 2025 00:10:55 +0200 Subject: [PATCH 01/19] rebase: write script before initializing state If rebase.instructionFormat is invalid the repository is left in a strange state when the interactive rebase fails. `git status` outputs boths the same as it would in the normal case *and* something related to interactive rebase: $ git -c rebase.instructionFormat=blah rebase -i fatal: invalid --pretty format: blah $ git status On branch master Your branch is ahead of 'upstream/master' by 1 commit. (use "git push" to publish your local commits) git-rebase-todo is missing. No commands done. No commands remaining. You are currently editing a commit while rebasing branch 'master' on '8db3019401'. (use "git commit --amend" to amend the current commit) (use "git rebase --continue" once you are satisfied with your changes) By attempting to write the rebase script before initializing the state this potential scenario is avoided. Signed-off-by: Junio C Hamano --- builtin/rebase.c | 42 ++++++++++++++++++------------------ t/t3415-rebase-autosquash.sh | 10 +++++++++ 2 files changed, 31 insertions(+), 21 deletions(-) diff --git a/builtin/rebase.c b/builtin/rebase.c index d4715ed35d77ed..06da8e4f367098 100644 --- a/builtin/rebase.c +++ b/builtin/rebase.c @@ -292,15 +292,6 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags) &revisions, &shortrevisions)) goto cleanup; - if (init_basic_state(&replay, - opts->head_name ? opts->head_name : "detached HEAD", - opts->onto, &opts->orig_head->object.oid)) - goto cleanup; - - if (!opts->upstream && opts->squash_onto) - write_file(path_squash_onto(), "%s\n", - oid_to_hex(opts->squash_onto)); - strvec_pushl(&make_script_args, "", revisions, NULL); if (opts->restrict_revision) strvec_pushf(&make_script_args, "^%s", @@ -309,21 +300,30 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags) ret = sequencer_make_script(the_repository, &todo_list.buf, make_script_args.nr, make_script_args.v, flags); - - if (ret) + if (ret) { error(_("could not generate todo list")); - else { - discard_index(the_repository->index); - if (todo_list_parse_insn_buffer(the_repository, &replay, - todo_list.buf.buf, &todo_list)) - BUG("unusable todo list"); - - ret = complete_action(the_repository, &replay, flags, - shortrevisions, opts->onto_name, opts->onto, - &opts->orig_head->object.oid, &opts->exec, - opts->autosquash, opts->update_refs, &todo_list); + goto cleanup; } + if (init_basic_state(&replay, + opts->head_name ? opts->head_name : "detached HEAD", + opts->onto, &opts->orig_head->object.oid)) + goto cleanup; + + if (!opts->upstream && opts->squash_onto) + write_file(path_squash_onto(), "%s\n", + oid_to_hex(opts->squash_onto)); + + discard_index(the_repository->index); + if (todo_list_parse_insn_buffer(the_repository, &replay, + todo_list.buf.buf, &todo_list)) + BUG("unusable todo list"); + + ret = complete_action(the_repository, &replay, flags, + shortrevisions, opts->onto_name, opts->onto, + &opts->orig_head->object.oid, &opts->exec, + opts->autosquash, opts->update_refs, &todo_list); + cleanup: replay_opts_release(&replay); free(revisions); diff --git a/t/t3415-rebase-autosquash.sh b/t/t3415-rebase-autosquash.sh index fcc40d6fe1fd5b..d4c624b841effc 100755 --- a/t/t3415-rebase-autosquash.sh +++ b/t/t3415-rebase-autosquash.sh @@ -394,6 +394,16 @@ test_expect_success 'autosquash with empty custom instructionFormat' ' ) ' +test_expect_success 'autosquash with invalid custom instructionFormat' ' + git reset --hard base && + test_commit invalid-instructionFormat-test && + ( + test_must_fail git -c rebase.instructionFormat=blah \ + rebase --autosquash --force-rebase -i HEAD^ && + test_path_is_missing .git/rebase-merge + ) +' + set_backup_editor () { write_script backup-editor.sh <<-\EOF cp "$1" .git/backup-"$(basename "$1")" From 6bde5d43b769509141d7ebc0b2476bc2035bc673 Mon Sep 17 00:00:00 2001 From: Karthik Nayak Date: Tue, 15 Jul 2025 13:28:26 +0200 Subject: [PATCH 02/19] refs: expose `ref_iterator` via 'refs.h' The `ref_iterator` is an internal structure to the 'refs/' sub-directory, which allows iteration over refs. All reference iteration is built on top of these iterators. External clients of the 'refs' subsystem use the various 'refs_for_each...()' functions to iterate over refs. However since these are wrapper functions, each combination of functionality requires a new wrapper function. This is not feasible as the functions pile up with the increase in requirements. Expose the internal reference iterator, so advanced users can mix and match options as needed. Signed-off-by: Karthik Nayak Signed-off-by: Junio C Hamano --- refs.h | 147 +++++++++++++++++++++++++++++++++++++++++++ refs/refs-internal.h | 145 +----------------------------------------- 2 files changed, 149 insertions(+), 143 deletions(-) diff --git a/refs.h b/refs.h index 46a6008e07f262..7c21aaef3db414 100644 --- a/refs.h +++ b/refs.h @@ -1190,4 +1190,151 @@ int repo_migrate_ref_storage_format(struct repository *repo, unsigned int flags, struct strbuf *err); +/* + * Reference iterators + * + * A reference iterator encapsulates the state of an in-progress + * iteration over references. Create an instance of `struct + * ref_iterator` via one of the functions in this module. + * + * A freshly-created ref_iterator doesn't yet point at a reference. To + * advance the iterator, call ref_iterator_advance(). If successful, + * this sets the iterator's refname, oid, and flags fields to describe + * the next reference and returns ITER_OK. The data pointed at by + * refname and oid belong to the iterator; if you want to retain them + * after calling ref_iterator_advance() again or calling + * ref_iterator_free(), you must make a copy. When the iteration has + * been exhausted, ref_iterator_advance() releases any resources + * associated with the iteration, frees the ref_iterator object, and + * returns ITER_DONE. If you want to abort the iteration early, call + * ref_iterator_free(), which also frees the ref_iterator object and + * any associated resources. If there was an internal error advancing + * to the next entry, ref_iterator_advance() aborts the iteration, + * frees the ref_iterator, and returns ITER_ERROR. + * + * The reference currently being looked at can be peeled by calling + * ref_iterator_peel(). This function is often faster than peel_ref(), + * so it should be preferred when iterating over references. + * + * Putting it all together, a typical iteration looks like this: + * + * int ok; + * struct ref_iterator *iter = ...; + * + * while ((ok = ref_iterator_advance(iter)) == ITER_OK) { + * if (want_to_stop_iteration()) { + * ok = ITER_DONE; + * break; + * } + * + * // Access information about the current reference: + * if (!(iter->flags & REF_ISSYMREF)) + * printf("%s is %s\n", iter->refname, oid_to_hex(iter->oid)); + * + * // If you need to peel the reference: + * ref_iterator_peel(iter, &oid); + * } + * + * if (ok != ITER_DONE) + * handle_error(); + * ref_iterator_free(iter); + */ +struct ref_iterator; + +/* + * These flags are passed to refs_ref_iterator_begin() (and do_for_each_ref(), + * which feeds it). + */ +enum do_for_each_ref_flags { + /* + * Include broken references in a do_for_each_ref*() iteration, which + * would normally be omitted. This includes both refs that point to + * missing objects (a true repository corruption), ones with illegal + * names (which we prefer not to expose to callers), as well as + * dangling symbolic refs (i.e., those that point to a non-existent + * ref; this is not a corruption, but as they have no valid oid, we + * omit them from normal iteration results). + */ + DO_FOR_EACH_INCLUDE_BROKEN = (1 << 0), + + /* + * Only include per-worktree refs in a do_for_each_ref*() iteration. + * Normally this will be used with a files ref_store, since that's + * where all reference backends will presumably store their + * per-worktree refs. + */ + DO_FOR_EACH_PER_WORKTREE_ONLY = (1 << 1), + + /* + * Omit dangling symrefs from output; this only has an effect with + * INCLUDE_BROKEN, since they are otherwise not included at all. + */ + DO_FOR_EACH_OMIT_DANGLING_SYMREFS = (1 << 2), + + /* + * Include root refs i.e. HEAD and pseudorefs along with the regular + * refs. + */ + DO_FOR_EACH_INCLUDE_ROOT_REFS = (1 << 3), +}; + +/* + * Return an iterator that goes over each reference in `refs` for + * which the refname begins with prefix. If trim is non-zero, then + * trim that many characters off the beginning of each refname. + * The output is ordered by refname. + */ +struct ref_iterator *refs_ref_iterator_begin( + struct ref_store *refs, + const char *prefix, const char **exclude_patterns, + int trim, enum do_for_each_ref_flags flags); + +/* + * Advance the iterator to the first or next item and return ITER_OK. + * If the iteration is exhausted, free the resources associated with + * the ref_iterator and return ITER_DONE. On errors, free the iterator + * resources and return ITER_ERROR. It is a bug to use ref_iterator or + * call this function again after it has returned ITER_DONE or + * ITER_ERROR. + */ +int ref_iterator_advance(struct ref_iterator *ref_iterator); + +/* + * Seek the iterator to the first reference with the given prefix. + * The prefix is matched as a literal string, without regard for path + * separators. If prefix is NULL or the empty string, seek the iterator to the + * first reference again. + * + * This function is expected to behave as if a new ref iterator with the same + * prefix had been created, but allows reuse of iterators and thus may allow + * the backend to optimize. Parameters other than the prefix that have been + * passed when creating the iterator will remain unchanged. + * + * Returns 0 on success, a negative error code otherwise. + */ +int ref_iterator_seek(struct ref_iterator *ref_iterator, + const char *prefix); + +/* + * If possible, peel the reference currently being viewed by the + * iterator. Return 0 on success. + */ +int ref_iterator_peel(struct ref_iterator *ref_iterator, + struct object_id *peeled); + +/* Free the reference iterator and any associated resources. */ +void ref_iterator_free(struct ref_iterator *ref_iterator); + +/* + * The common backend for the for_each_*ref* functions. Call fn for + * each reference in iter. If the iterator itself ever returns + * ITER_ERROR, return -1. If fn ever returns a non-zero value, stop + * the iteration and return that value. Otherwise, return 0. In any + * case, free the iterator when done. This function is basically an + * adapter between the callback style of reference iteration and the + * iterator style. + */ +int do_for_each_ref_iterator(struct ref_iterator *iter, + each_ref_fn fn, void *cb_data); + #endif /* REFS_H */ diff --git a/refs/refs-internal.h b/refs/refs-internal.h index f86887085191e8..03f5df04d590a2 100644 --- a/refs/refs-internal.h +++ b/refs/refs-internal.h @@ -244,90 +244,8 @@ const char *find_descendant_ref(const char *dirname, #define SYMREF_MAXDEPTH 5 /* - * These flags are passed to refs_ref_iterator_begin() (and do_for_each_ref(), - * which feeds it). - */ -enum do_for_each_ref_flags { - /* - * Include broken references in a do_for_each_ref*() iteration, which - * would normally be omitted. This includes both refs that point to - * missing objects (a true repository corruption), ones with illegal - * names (which we prefer not to expose to callers), as well as - * dangling symbolic refs (i.e., those that point to a non-existent - * ref; this is not a corruption, but as they have no valid oid, we - * omit them from normal iteration results). - */ - DO_FOR_EACH_INCLUDE_BROKEN = (1 << 0), - - /* - * Only include per-worktree refs in a do_for_each_ref*() iteration. - * Normally this will be used with a files ref_store, since that's - * where all reference backends will presumably store their - * per-worktree refs. - */ - DO_FOR_EACH_PER_WORKTREE_ONLY = (1 << 1), - - /* - * Omit dangling symrefs from output; this only has an effect with - * INCLUDE_BROKEN, since they are otherwise not included at all. - */ - DO_FOR_EACH_OMIT_DANGLING_SYMREFS = (1 << 2), - - /* - * Include root refs i.e. HEAD and pseudorefs along with the regular - * refs. - */ - DO_FOR_EACH_INCLUDE_ROOT_REFS = (1 << 3), -}; - -/* - * Reference iterators - * - * A reference iterator encapsulates the state of an in-progress - * iteration over references. Create an instance of `struct - * ref_iterator` via one of the functions in this module. - * - * A freshly-created ref_iterator doesn't yet point at a reference. To - * advance the iterator, call ref_iterator_advance(). If successful, - * this sets the iterator's refname, oid, and flags fields to describe - * the next reference and returns ITER_OK. The data pointed at by - * refname and oid belong to the iterator; if you want to retain them - * after calling ref_iterator_advance() again or calling - * ref_iterator_free(), you must make a copy. When the iteration has - * been exhausted, ref_iterator_advance() releases any resources - * associated with the iteration, frees the ref_iterator object, and - * returns ITER_DONE. If you want to abort the iteration early, call - * ref_iterator_free(), which also frees the ref_iterator object and - * any associated resources. If there was an internal error advancing - * to the next entry, ref_iterator_advance() aborts the iteration, - * frees the ref_iterator, and returns ITER_ERROR. - * - * The reference currently being looked at can be peeled by calling - * ref_iterator_peel(). This function is often faster than peel_ref(), - * so it should be preferred when iterating over references. - * - * Putting it all together, a typical iteration looks like this: - * - * int ok; - * struct ref_iterator *iter = ...; - * - * while ((ok = ref_iterator_advance(iter)) == ITER_OK) { - * if (want_to_stop_iteration()) { - * ok = ITER_DONE; - * break; - * } - * - * // Access information about the current reference: - * if (!(iter->flags & REF_ISSYMREF)) - * printf("%s is %s\n", iter->refname, oid_to_hex(iter->oid)); - * - * // If you need to peel the reference: - * ref_iterator_peel(iter, &oid); - * } - * - * if (ok != ITER_DONE) - * handle_error(); - * ref_iterator_free(iter); + * Data structure for holding a reference iterator. See refs.h for + * more details and usage instructions. */ struct ref_iterator { struct ref_iterator_vtable *vtable; @@ -337,42 +255,6 @@ struct ref_iterator { unsigned int flags; }; -/* - * Advance the iterator to the first or next item and return ITER_OK. - * If the iteration is exhausted, free the resources associated with - * the ref_iterator and return ITER_DONE. On errors, free the iterator - * resources and return ITER_ERROR. It is a bug to use ref_iterator or - * call this function again after it has returned ITER_DONE or - * ITER_ERROR. - */ -int ref_iterator_advance(struct ref_iterator *ref_iterator); - -/* - * Seek the iterator to the first reference with the given prefix. - * The prefix is matched as a literal string, without regard for path - * separators. If prefix is NULL or the empty string, seek the iterator to the - * first reference again. - * - * This function is expected to behave as if a new ref iterator with the same - * prefix had been created, but allows reuse of iterators and thus may allow - * the backend to optimize. Parameters other than the prefix that have been - * passed when creating the iterator will remain unchanged. - * - * Returns 0 on success, a negative error code otherwise. - */ -int ref_iterator_seek(struct ref_iterator *ref_iterator, - const char *prefix); - -/* - * If possible, peel the reference currently being viewed by the - * iterator. Return 0 on success. - */ -int ref_iterator_peel(struct ref_iterator *ref_iterator, - struct object_id *peeled); - -/* Free the reference iterator and any associated resources. */ -void ref_iterator_free(struct ref_iterator *ref_iterator); - /* * An iterator over nothing (its first ref_iterator_advance() call * returns ITER_DONE). @@ -384,17 +266,6 @@ struct ref_iterator *empty_ref_iterator_begin(void); */ int is_empty_ref_iterator(struct ref_iterator *ref_iterator); -/* - * Return an iterator that goes over each reference in `refs` for - * which the refname begins with prefix. If trim is non-zero, then - * trim that many characters off the beginning of each refname. - * The output is ordered by refname. - */ -struct ref_iterator *refs_ref_iterator_begin( - struct ref_store *refs, - const char *prefix, const char **exclude_patterns, - int trim, enum do_for_each_ref_flags flags); - /* * A callback function used to instruct merge_ref_iterator how to * interleave the entries from iter0 and iter1. The function should @@ -520,18 +391,6 @@ struct ref_iterator_vtable { */ extern struct ref_iterator *current_ref_iter; -/* - * The common backend for the for_each_*ref* functions. Call fn for - * each reference in iter. If the iterator itself ever returns - * ITER_ERROR, return -1. If fn ever returns a non-zero value, stop - * the iteration and return that value. Otherwise, return 0. In any - * case, free the iterator when done. This function is basically an - * adapter between the callback style of reference iteration and the - * iterator style. - */ -int do_for_each_ref_iterator(struct ref_iterator *iter, - each_ref_fn fn, void *cb_data); - struct ref_store; /* refs backends */ From 883a7ea0543426d0ecb172c72c8f706348961605 Mon Sep 17 00:00:00 2001 From: Karthik Nayak Date: Tue, 15 Jul 2025 13:28:27 +0200 Subject: [PATCH 03/19] ref-cache: remove unused function 'find_ref_entry()' The 'find_ref_entry' function is no longer used, so remove it. Signed-off-by: Karthik Nayak Signed-off-by: Junio C Hamano --- refs/ref-cache.c | 14 -------------- refs/ref-cache.h | 7 ------- 2 files changed, 21 deletions(-) diff --git a/refs/ref-cache.c b/refs/ref-cache.c index c1f1bab1d502dc..8aaffa8c6b81f0 100644 --- a/refs/ref-cache.c +++ b/refs/ref-cache.c @@ -194,20 +194,6 @@ static struct ref_dir *find_containing_dir(struct ref_dir *dir, return dir; } -struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname) -{ - int entry_index; - struct ref_entry *entry; - dir = find_containing_dir(dir, refname); - if (!dir) - return NULL; - entry_index = search_ref_dir(dir, refname, strlen(refname)); - if (entry_index == -1) - return NULL; - entry = dir->entries[entry_index]; - return (entry->flag & REF_DIR) ? NULL : entry; -} - /* * Emit a warning and return true iff ref1 and ref2 have the same name * and the same oid. Die if they have the same name but different diff --git a/refs/ref-cache.h b/refs/ref-cache.h index 5f04e518c3796d..f635d2d82436de 100644 --- a/refs/ref-cache.h +++ b/refs/ref-cache.h @@ -201,13 +201,6 @@ void free_ref_cache(struct ref_cache *cache); */ void add_entry_to_dir(struct ref_dir *dir, struct ref_entry *entry); -/* - * Find the value entry with the given name in dir, sorting ref_dirs - * and recursing into subdirectories as necessary. If the name is not - * found or it corresponds to a directory entry, return NULL. - */ -struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname); - /* * Start iterating over references in `cache`. If `prefix` is * specified, only include references whose names start with that From 2b4648b9190b552e942d93363482bd617a510fc1 Mon Sep 17 00:00:00 2001 From: Karthik Nayak Date: Tue, 15 Jul 2025 13:28:28 +0200 Subject: [PATCH 04/19] refs: selectively set prefix in the seek functions The ref iterator exposes a `ref_iterator_seek()` function. The name suggests that this would seek the iterator to a specific reference in some ways similar to how `fseek()` works for the filesystem. However, the function actually sets the prefix for refs iteration. So further iteration would only yield references which match the particular prefix. This is a bit confusing. Let's add a 'flags' field to the function, which when set with the 'REF_ITERATOR_SEEK_SET_PREFIX' flag, will set the prefix for the iteration in-line with the existing behavior. Otherwise, the reference backends will simply seek to the specified reference and clears any previously set prefix. This allows users to start iteration from a specific reference. In the packed and reftable backend, since references are available in a sorted list, the changes are simply setting the prefix if needed. The changes on the files-backend are a little more involved, since the files backend uses the 'ref-cache' mechanism. We move out the existing logic within `cache_ref_iterator_seek()` to `cache_ref_iterator_set_prefix()` which is called when the 'REF_ITERATOR_SEEK_SET_PREFIX' flag is set. We then parse the provided seek string and set the required levels and their indexes to ensure that seeking is possible. Helped-by: Patrick Steinhardt Signed-off-by: Karthik Nayak Signed-off-by: Junio C Hamano --- refs.c | 6 +-- refs.h | 26 ++++++++----- refs/debug.c | 7 ++-- refs/files-backend.c | 7 ++-- refs/iterator.c | 26 +++++++------ refs/packed-backend.c | 17 ++++++--- refs/ref-cache.c | 85 ++++++++++++++++++++++++++++++++++++++--- refs/refs-internal.h | 7 ++-- refs/reftable-backend.c | 21 ++++++---- 9 files changed, 152 insertions(+), 50 deletions(-) diff --git a/refs.c b/refs.c index dce5c49ca2ba65..243e6898b80a9d 100644 --- a/refs.c +++ b/refs.c @@ -2666,12 +2666,12 @@ enum ref_transaction_error refs_verify_refnames_available(struct ref_store *refs if (!initial_transaction) { int ok; - if (!iter) { + if (!iter) iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0, DO_FOR_EACH_INCLUDE_BROKEN); - } else if (ref_iterator_seek(iter, dirname.buf) < 0) { + else if (ref_iterator_seek(iter, dirname.buf, + REF_ITERATOR_SEEK_SET_PREFIX) < 0) goto cleanup; - } while ((ok = ref_iterator_advance(iter)) == ITER_OK) { if (skip && diff --git a/refs.h b/refs.h index 7c21aaef3db414..e6780a8848bfc0 100644 --- a/refs.h +++ b/refs.h @@ -1299,21 +1299,29 @@ struct ref_iterator *refs_ref_iterator_begin( */ int ref_iterator_advance(struct ref_iterator *ref_iterator); +enum ref_iterator_seek_flag { + /* + * When the REF_ITERATOR_SEEK_SET_PREFIX flag is set, the iterator's prefix is + * updated to match the provided string, affecting all subsequent iterations. If + * not, the iterator seeks to the specified reference and clears any previously + * set prefix. + */ + REF_ITERATOR_SEEK_SET_PREFIX = (1 << 0), +}; + /* - * Seek the iterator to the first reference with the given prefix. - * The prefix is matched as a literal string, without regard for path - * separators. If prefix is NULL or the empty string, seek the iterator to the + * Seek the iterator to the first reference matching the given seek string. + * The seek string is matched as a literal string, without regard for path + * separators. If seek is NULL or the empty string, seek the iterator to the * first reference again. * - * This function is expected to behave as if a new ref iterator with the same - * prefix had been created, but allows reuse of iterators and thus may allow - * the backend to optimize. Parameters other than the prefix that have been - * passed when creating the iterator will remain unchanged. + * This function is expected to behave as if a new ref iterator has been + * created, but allows reuse of existing iterators for optimization. * * Returns 0 on success, a negative error code otherwise. */ -int ref_iterator_seek(struct ref_iterator *ref_iterator, - const char *prefix); +int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *refname, + unsigned int flags); /* * If possible, peel the reference currently being viewed by the diff --git a/refs/debug.c b/refs/debug.c index 485e3079d7a3a7..da300efaf30973 100644 --- a/refs/debug.c +++ b/refs/debug.c @@ -170,12 +170,13 @@ static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator) } static int debug_ref_iterator_seek(struct ref_iterator *ref_iterator, - const char *prefix) + const char *refname, unsigned int flags) { struct debug_ref_iterator *diter = (struct debug_ref_iterator *)ref_iterator; - int res = diter->iter->vtable->seek(diter->iter, prefix); - trace_printf_key(&trace_refs, "iterator_seek: %s: %d\n", prefix ? prefix : "", res); + int res = diter->iter->vtable->seek(diter->iter, refname, flags); + trace_printf_key(&trace_refs, "iterator_seek: %s flags: %d: %d\n", + refname ? refname : "", flags, res); return res; } diff --git a/refs/files-backend.c b/refs/files-backend.c index bf6f89b1d1938b..8b282f2a60a466 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -929,11 +929,11 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator) } static int files_ref_iterator_seek(struct ref_iterator *ref_iterator, - const char *prefix) + const char *refname, unsigned int flags) { struct files_ref_iterator *iter = (struct files_ref_iterator *)ref_iterator; - return ref_iterator_seek(iter->iter0, prefix); + return ref_iterator_seek(iter->iter0, refname, flags); } static int files_ref_iterator_peel(struct ref_iterator *ref_iterator, @@ -2316,7 +2316,8 @@ static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator) } static int files_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED, - const char *prefix UNUSED) + const char *refname UNUSED, + unsigned int flags UNUSED) { BUG("ref_iterator_seek() called for reflog_iterator"); } diff --git a/refs/iterator.c b/refs/iterator.c index 766d96e795c9b9..17ef841d8a3013 100644 --- a/refs/iterator.c +++ b/refs/iterator.c @@ -15,10 +15,10 @@ int ref_iterator_advance(struct ref_iterator *ref_iterator) return ref_iterator->vtable->advance(ref_iterator); } -int ref_iterator_seek(struct ref_iterator *ref_iterator, - const char *prefix) +int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *refname, + unsigned int flags) { - return ref_iterator->vtable->seek(ref_iterator, prefix); + return ref_iterator->vtable->seek(ref_iterator, refname, flags); } int ref_iterator_peel(struct ref_iterator *ref_iterator, @@ -57,7 +57,8 @@ static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator UNUSED) } static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED, - const char *prefix UNUSED) + const char *refname UNUSED, + unsigned int flags UNUSED) { return 0; } @@ -224,7 +225,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator) } static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator, - const char *prefix) + const char *refname, unsigned int flags) { struct merge_ref_iterator *iter = (struct merge_ref_iterator *)ref_iterator; @@ -234,11 +235,11 @@ static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator, iter->iter0 = iter->iter0_owned; iter->iter1 = iter->iter1_owned; - ret = ref_iterator_seek(iter->iter0, prefix); + ret = ref_iterator_seek(iter->iter0, refname, flags); if (ret < 0) return ret; - ret = ref_iterator_seek(iter->iter1, prefix); + ret = ref_iterator_seek(iter->iter1, refname, flags); if (ret < 0) return ret; @@ -407,13 +408,16 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator) } static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator, - const char *prefix) + const char *refname, unsigned int flags) { struct prefix_ref_iterator *iter = (struct prefix_ref_iterator *)ref_iterator; - free(iter->prefix); - iter->prefix = xstrdup_or_null(prefix); - return ref_iterator_seek(iter->iter0, prefix); + + if (flags & REF_ITERATOR_SEEK_SET_PREFIX) { + free(iter->prefix); + iter->prefix = xstrdup_or_null(refname); + } + return ref_iterator_seek(iter->iter0, refname, flags); } static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator, diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 7fd73a0e6da3b5..5fa4ae6655ea0f 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -1004,19 +1004,23 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator) } static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator, - const char *prefix) + const char *refname, unsigned int flags) { struct packed_ref_iterator *iter = (struct packed_ref_iterator *)ref_iterator; const char *start; - if (prefix && *prefix) - start = find_reference_location(iter->snapshot, prefix, 0); + if (refname && *refname) + start = find_reference_location(iter->snapshot, refname, 0); else start = iter->snapshot->start; - free(iter->prefix); - iter->prefix = xstrdup_or_null(prefix); + /* Unset any previously set prefix */ + FREE_AND_NULL(iter->prefix); + + if (flags & REF_ITERATOR_SEEK_SET_PREFIX) + iter->prefix = xstrdup_or_null(refname); + iter->pos = start; iter->eof = iter->snapshot->eof; @@ -1194,7 +1198,8 @@ static struct ref_iterator *packed_ref_iterator_begin( iter->repo = ref_store->repo; iter->flags = flags; - if (packed_ref_iterator_seek(&iter->base, prefix) < 0) { + if (packed_ref_iterator_seek(&iter->base, prefix, + REF_ITERATOR_SEEK_SET_PREFIX) < 0) { ref_iterator_free(&iter->base); return NULL; } diff --git a/refs/ref-cache.c b/refs/ref-cache.c index 8aaffa8c6b81f0..1d95b56d4000bc 100644 --- a/refs/ref-cache.c +++ b/refs/ref-cache.c @@ -434,11 +434,9 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator) } } -static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator, - const char *prefix) +static int cache_ref_iterator_set_prefix(struct cache_ref_iterator *iter, + const char *prefix) { - struct cache_ref_iterator *iter = - (struct cache_ref_iterator *)ref_iterator; struct cache_ref_iterator_level *level; struct ref_dir *dir; @@ -469,6 +467,82 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator, return 0; } +static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator, + const char *refname, unsigned int flags) +{ + struct cache_ref_iterator *iter = + (struct cache_ref_iterator *)ref_iterator; + + if (flags & REF_ITERATOR_SEEK_SET_PREFIX) { + return cache_ref_iterator_set_prefix(iter, refname); + } else if (refname && *refname) { + struct cache_ref_iterator_level *level; + const char *slash = refname; + struct ref_dir *dir; + + dir = get_ref_dir(iter->cache->root); + + if (iter->prime_dir) + prime_ref_dir(dir, refname); + + iter->levels_nr = 1; + level = &iter->levels[0]; + level->index = -1; + level->dir = dir; + + /* Unset any previously set prefix */ + FREE_AND_NULL(iter->prefix); + + /* + * Breakdown the provided seek path and assign the correct + * indexing to each level as needed. + */ + do { + int len, idx; + int cmp = 0; + + sort_ref_dir(dir); + + slash = strchr(slash, '/'); + len = slash ? slash - refname : (int)strlen(refname); + + for (idx = 0; idx < dir->nr; idx++) { + cmp = strncmp(refname, dir->entries[idx]->name, len); + if (cmp <= 0) + break; + } + /* don't overflow the index */ + idx = idx >= dir->nr ? dir->nr - 1 : idx; + + if (slash) + slash = slash + 1; + + level->index = idx; + if (dir->entries[idx]->flag & REF_DIR) { + /* push down a level */ + dir = get_ref_dir(dir->entries[idx]); + + ALLOC_GROW(iter->levels, iter->levels_nr + 1, + iter->levels_alloc); + level = &iter->levels[iter->levels_nr++]; + level->dir = dir; + level->index = -1; + } else { + /* reduce the index so the leaf node is iterated over */ + if (cmp <= 0 && !slash) + level->index = idx - 1; + /* + * while the seek path may not be exhausted, our + * match is exhausted at a leaf node. + */ + break; + } + } while (slash); + } + + return 0; +} + static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator, struct object_id *peeled) { @@ -509,7 +583,8 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache, iter->cache = cache; iter->prime_dir = prime_dir; - if (cache_ref_iterator_seek(&iter->base, prefix) < 0) { + if (cache_ref_iterator_seek(&iter->base, prefix, + REF_ITERATOR_SEEK_SET_PREFIX) < 0) { ref_iterator_free(&iter->base); return NULL; } diff --git a/refs/refs-internal.h b/refs/refs-internal.h index 03f5df04d590a2..40c1c0f93dc8be 100644 --- a/refs/refs-internal.h +++ b/refs/refs-internal.h @@ -353,11 +353,12 @@ void base_ref_iterator_init(struct ref_iterator *iter, typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator); /* - * Seek the iterator to the first reference matching the given prefix. Should - * behave the same as if a new iterator was created with the same prefix. + * Seek the iterator to the first matching reference. If the + * REF_ITERATOR_SEEK_SET_PREFIX flag is set, it would behave the same as if a + * new iterator was created with the provided refname as prefix. */ typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator, - const char *prefix); + const char *refname, unsigned int flags); /* * Peels the current ref, returning 0 for success or -1 for failure. diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c index 4c3817f4ec1a88..c3d48cc4120fe2 100644 --- a/refs/reftable-backend.c +++ b/refs/reftable-backend.c @@ -719,15 +719,20 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator) } static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator, - const char *prefix) + const char *refname, unsigned int flags) { struct reftable_ref_iterator *iter = (struct reftable_ref_iterator *)ref_iterator; - free(iter->prefix); - iter->prefix = xstrdup_or_null(prefix); - iter->prefix_len = prefix ? strlen(prefix) : 0; - iter->err = reftable_iterator_seek_ref(&iter->iter, prefix); + /* Unset any previously set prefix */ + FREE_AND_NULL(iter->prefix); + iter->prefix_len = 0; + + if (flags & REF_ITERATOR_SEEK_SET_PREFIX) { + iter->prefix = xstrdup_or_null(refname); + iter->prefix_len = refname ? strlen(refname) : 0; + } + iter->err = reftable_iterator_seek_ref(&iter->iter, refname); return iter->err; } @@ -839,7 +844,8 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_ if (ret) goto done; - ret = reftable_ref_iterator_seek(&iter->base, prefix); + ret = reftable_ref_iterator_seek(&iter->base, prefix, + REF_ITERATOR_SEEK_SET_PREFIX); if (ret) goto done; @@ -2042,7 +2048,8 @@ static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator) } static int reftable_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED, - const char *prefix UNUSED) + const char *refname UNUSED, + unsigned int flags UNUSED) { BUG("reftable reflog iterator cannot be seeked"); return -1; From 526530a16a3c345643afb324107b4a216b8d37ff Mon Sep 17 00:00:00 2001 From: Karthik Nayak Date: Tue, 15 Jul 2025 13:28:29 +0200 Subject: [PATCH 05/19] ref-filter: remove unnecessary else clause In 'ref-filter.c', there is an 'else' clause within `do_filter_refs()`. This is unnecessary since the 'if' clause calls `die()`, which would exit the program. So let's remove the unnecessary 'else' clause. This improves readability since the indentation is also reduced and flow is simpler. Signed-off-by: Karthik Nayak Signed-off-by: Junio C Hamano --- ref-filter.c | 60 ++++++++++++++++++++++++++-------------------------- 1 file changed, 30 insertions(+), 30 deletions(-) diff --git a/ref-filter.c b/ref-filter.c index 7a274633cfc7e7..da663c7ac894ca 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -3199,37 +3199,37 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref /* Simple per-ref filtering */ if (!filter->kind) die("filter_refs: invalid type"); - else { - /* - * For common cases where we need only branches or remotes or tags, - * we only iterate through those refs. If a mix of refs is needed, - * we iterate over all refs and filter out required refs with the help - * of filter_ref_kind(). - */ - if (filter->kind == FILTER_REFS_BRANCHES) - ret = refs_for_each_fullref_in(get_main_ref_store(the_repository), - "refs/heads/", NULL, - fn, cb_data); - else if (filter->kind == FILTER_REFS_REMOTES) - ret = refs_for_each_fullref_in(get_main_ref_store(the_repository), - "refs/remotes/", NULL, - fn, cb_data); - else if (filter->kind == FILTER_REFS_TAGS) - ret = refs_for_each_fullref_in(get_main_ref_store(the_repository), - "refs/tags/", NULL, fn, - cb_data); - else if (filter->kind & FILTER_REFS_REGULAR) - ret = for_each_fullref_in_pattern(filter, fn, cb_data); - /* - * When printing all ref types, HEAD is already included, - * so we don't want to print HEAD again. - */ - if (!ret && !(filter->kind & FILTER_REFS_ROOT_REFS) && - (filter->kind & FILTER_REFS_DETACHED_HEAD)) - refs_head_ref(get_main_ref_store(the_repository), fn, - cb_data); - } + /* + * For common cases where we need only branches or remotes or tags, + * we only iterate through those refs. If a mix of refs is needed, + * we iterate over all refs and filter out required refs with the help + * of filter_ref_kind(). + */ + if (filter->kind == FILTER_REFS_BRANCHES) + ret = refs_for_each_fullref_in(get_main_ref_store(the_repository), + "refs/heads/", NULL, + fn, cb_data); + else if (filter->kind == FILTER_REFS_REMOTES) + ret = refs_for_each_fullref_in(get_main_ref_store(the_repository), + "refs/remotes/", NULL, + fn, cb_data); + else if (filter->kind == FILTER_REFS_TAGS) + ret = refs_for_each_fullref_in(get_main_ref_store(the_repository), + "refs/tags/", NULL, fn, + cb_data); + else if (filter->kind & FILTER_REFS_REGULAR) + ret = for_each_fullref_in_pattern(filter, fn, cb_data); + + /* + * When printing all ref types, HEAD is already included, + * so we don't want to print HEAD again. + */ + if (!ret && !(filter->kind & FILTER_REFS_ROOT_REFS) && + (filter->kind & FILTER_REFS_DETACHED_HEAD)) + refs_head_ref(get_main_ref_store(the_repository), fn, + cb_data); + clear_contains_cache(&filter->internal.contains_cache); clear_contains_cache(&filter->internal.no_contains_cache); From dabecb9db2b25a32d72c90832f338849b665fdf9 Mon Sep 17 00:00:00 2001 From: Karthik Nayak Date: Tue, 15 Jul 2025 13:28:30 +0200 Subject: [PATCH 06/19] for-each-ref: introduce a '--start-after' option The `git-for-each-ref(1)` command is used to iterate over references present in a repository. In large repositories with millions of references, it would be optimal to paginate this output such that we can start iteration from a given reference. This would avoid having to iterate over all references from the beginning each time when paginating through results. The previous commit added 'seek' functionality to the reference backends. Utilize this and expose a '--start-after' option in 'git-for-each-ref(1)'. When used, the reference iteration seeks to the lexicographically next reference and iterates from there onward. This enables efficient pagination workflows, where the calling script can remember the last provided reference and use that as the starting point for the next set of references: git for-each-ref --count=100 git for-each-ref --count=100 --start-after=refs/heads/branch-100 git for-each-ref --count=100 --start-after=refs/heads/branch-200 Since the reference iterators only allow seeking to a specified marker via the `ref_iterator_seek()`, we introduce a helper function `start_ref_iterator_after()`, which seeks to next reference by simply adding (char) 1 to the marker. We must note that pagination always continues from the provided marker, as such any concurrent reference updates lexicographically behind the marker will not be output. Document the same. Signed-off-by: Karthik Nayak Signed-off-by: Junio C Hamano --- Documentation/git-for-each-ref.adoc | 10 +- builtin/for-each-ref.c | 8 ++ ref-filter.c | 78 ++++++++--- ref-filter.h | 1 + t/t6302-for-each-ref-filter.sh | 194 ++++++++++++++++++++++++++++ 5 files changed, 272 insertions(+), 19 deletions(-) diff --git a/Documentation/git-for-each-ref.adoc b/Documentation/git-for-each-ref.adoc index 5ef89fc0fe3c9d..ae61ba642a6389 100644 --- a/Documentation/git-for-each-ref.adoc +++ b/Documentation/git-for-each-ref.adoc @@ -14,7 +14,7 @@ SYNOPSIS [--points-at=] [--merged[=]] [--no-merged[=]] [--contains[=]] [--no-contains[=]] - [--exclude= ...] + [--exclude= ...] [--start-after=] DESCRIPTION ----------- @@ -108,6 +108,14 @@ TAB %(refname)`. --include-root-refs:: List root refs (HEAD and pseudorefs) apart from regular refs. +--start-after=:: + Allows paginating the output by skipping references up to and including the + specified marker. When paging, it should be noted that references may be + deleted, modified or added between invocations. Output will only yield those + references which follow the marker lexicographically. Output begins from the + first reference that would come after the marker alphabetically. Cannot be + used with general pattern matching or custom sort options. + FIELD NAMES ----------- diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c index 3d2207ec7733b2..3f21598046a758 100644 --- a/builtin/for-each-ref.c +++ b/builtin/for-each-ref.c @@ -13,6 +13,7 @@ static char const * const for_each_ref_usage[] = { N_("git for-each-ref [--points-at ]"), N_("git for-each-ref [--merged []] [--no-merged []]"), N_("git for-each-ref [--contains []] [--no-contains []]"), + N_("git for-each-ref [--start-after ]"), NULL }; @@ -44,6 +45,7 @@ int cmd_for_each_ref(int argc, OPT_GROUP(""), OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only matched refs")), OPT_STRING( 0 , "format", &format.format, N_("format"), N_("format to use for the output")), + OPT_STRING( 0 , "start-after", &filter.start_after, N_("start-start"), N_("start iteration after the provided marker")), OPT__COLOR(&format.use_color, N_("respect format colors")), OPT_REF_FILTER_EXCLUDE(&filter), OPT_REF_SORT(&sorting_options), @@ -79,6 +81,9 @@ int cmd_for_each_ref(int argc, if (verify_ref_format(&format)) usage_with_options(for_each_ref_usage, opts); + if (filter.start_after && sorting_options.nr > 1) + die(_("cannot use --start-after with custom sort options")); + sorting = ref_sorting_options(&sorting_options); ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase); filter.ignore_case = icase; @@ -100,6 +105,9 @@ int cmd_for_each_ref(int argc, filter.name_patterns = argv; } + if (filter.start_after && filter.name_patterns && filter.name_patterns[0]) + die(_("cannot use --start-after with patterns")); + if (include_root_refs) flags |= FILTER_REFS_ROOT_REFS | FILTER_REFS_DETACHED_HEAD; diff --git a/ref-filter.c b/ref-filter.c index da663c7ac894ca..c8a6b7f1aff470 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -2683,6 +2683,41 @@ static int filter_exclude_match(struct ref_filter *filter, const char *refname) return match_pattern(filter->exclude.v, refname, filter->ignore_case); } +/* + * We need to seek to the reference right after a given marker but excluding any + * matching references. So we seek to the lexicographically next reference. + */ +static int start_ref_iterator_after(struct ref_iterator *iter, const char *marker) +{ + struct strbuf sb = STRBUF_INIT; + int ret; + + strbuf_addstr(&sb, marker); + strbuf_addch(&sb, 1); + + ret = ref_iterator_seek(iter, sb.buf, 0); + + strbuf_release(&sb); + return ret; +} + +static int for_each_fullref_with_seek(struct ref_filter *filter, each_ref_fn cb, + void *cb_data, unsigned int flags) +{ + struct ref_iterator *iter; + int ret = 0; + + iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "", + NULL, 0, flags); + if (filter->start_after) + ret = start_ref_iterator_after(iter, filter->start_after); + + if (ret) + return ret; + + return do_for_each_ref_iterator(iter, cb, cb_data); +} + /* * This is the same as for_each_fullref_in(), but it tries to iterate * only over the patterns we'll care about. Note that it _doesn't_ do a full @@ -2694,8 +2729,8 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter, { if (filter->kind & FILTER_REFS_ROOT_REFS) { /* In this case, we want to print all refs including root refs. */ - return refs_for_each_include_root_refs(get_main_ref_store(the_repository), - cb, cb_data); + return for_each_fullref_with_seek(filter, cb, cb_data, + DO_FOR_EACH_INCLUDE_ROOT_REFS); } if (!filter->match_as_path) { @@ -2704,8 +2739,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter, * prefixes like "refs/heads/" etc. are stripped off, * so we have to look at everything: */ - return refs_for_each_fullref_in(get_main_ref_store(the_repository), - "", NULL, cb, cb_data); + return for_each_fullref_with_seek(filter, cb, cb_data, 0); } if (filter->ignore_case) { @@ -2714,14 +2748,12 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter, * so just return everything and let the caller * sort it out. */ - return refs_for_each_fullref_in(get_main_ref_store(the_repository), - "", NULL, cb, cb_data); + return for_each_fullref_with_seek(filter, cb, cb_data, 0); } if (!filter->name_patterns[0]) { /* no patterns; we have to look at everything */ - return refs_for_each_fullref_in(get_main_ref_store(the_repository), - "", filter->exclude.v, cb, cb_data); + return for_each_fullref_with_seek(filter, cb, cb_data, 0); } return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository), @@ -3189,6 +3221,7 @@ void filter_is_base(struct repository *r, static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data) { + const char *prefix = NULL; int ret = 0; filter->kind = type & FILTER_REFS_KIND_MASK; @@ -3207,19 +3240,28 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref * of filter_ref_kind(). */ if (filter->kind == FILTER_REFS_BRANCHES) - ret = refs_for_each_fullref_in(get_main_ref_store(the_repository), - "refs/heads/", NULL, - fn, cb_data); + prefix = "refs/heads/"; else if (filter->kind == FILTER_REFS_REMOTES) - ret = refs_for_each_fullref_in(get_main_ref_store(the_repository), - "refs/remotes/", NULL, - fn, cb_data); + prefix = "refs/remotes/"; else if (filter->kind == FILTER_REFS_TAGS) - ret = refs_for_each_fullref_in(get_main_ref_store(the_repository), - "refs/tags/", NULL, fn, - cb_data); - else if (filter->kind & FILTER_REFS_REGULAR) + prefix = "refs/tags/"; + + if (prefix) { + struct ref_iterator *iter; + + iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), + "", NULL, 0, 0); + + if (filter->start_after) + ret = start_ref_iterator_after(iter, filter->start_after); + else if (prefix) + ret = ref_iterator_seek(iter, prefix, 1); + + if (!ret) + ret = do_for_each_ref_iterator(iter, fn, cb_data); + } else if (filter->kind & FILTER_REFS_REGULAR) { ret = for_each_fullref_in_pattern(filter, fn, cb_data); + } /* * When printing all ref types, HEAD is already included, diff --git a/ref-filter.h b/ref-filter.h index c98c4fbd4c150d..f22ca94b49df7c 100644 --- a/ref-filter.h +++ b/ref-filter.h @@ -64,6 +64,7 @@ struct ref_array { struct ref_filter { const char **name_patterns; + const char *start_after; struct strvec exclude; struct oid_array points_at; struct commit_list *with_commit; diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh index bb02b86c163559..e097db6b02f0c4 100755 --- a/t/t6302-for-each-ref-filter.sh +++ b/t/t6302-for-each-ref-filter.sh @@ -541,4 +541,198 @@ test_expect_success 'validate worktree atom' ' test_cmp expect actual ' +test_expect_success 'start after with empty value' ' + cat >expect <<-\EOF && + refs/heads/main + refs/heads/main_worktree + refs/heads/side + refs/odd/spot + refs/tags/annotated-tag + refs/tags/doubly-annotated-tag + refs/tags/doubly-signed-tag + refs/tags/foo1.10 + refs/tags/foo1.3 + refs/tags/foo1.6 + refs/tags/four + refs/tags/one + refs/tags/signed-tag + refs/tags/three + refs/tags/two + EOF + git for-each-ref --format="%(refname)" --start-after="" >actual && + test_cmp expect actual +' + +test_expect_success 'start after a specific reference' ' + cat >expect <<-\EOF && + refs/tags/annotated-tag + refs/tags/doubly-annotated-tag + refs/tags/doubly-signed-tag + refs/tags/foo1.10 + refs/tags/foo1.3 + refs/tags/foo1.6 + refs/tags/four + refs/tags/one + refs/tags/signed-tag + refs/tags/three + refs/tags/two + EOF + git for-each-ref --format="%(refname)" --start-after=refs/odd/spot >actual && + test_cmp expect actual +' + +test_expect_success 'start after a specific reference with partial match' ' + cat >expect <<-\EOF && + refs/odd/spot + refs/tags/annotated-tag + refs/tags/doubly-annotated-tag + refs/tags/doubly-signed-tag + refs/tags/foo1.10 + refs/tags/foo1.3 + refs/tags/foo1.6 + refs/tags/four + refs/tags/one + refs/tags/signed-tag + refs/tags/three + refs/tags/two + EOF + git for-each-ref --format="%(refname)" --start-after=refs/odd/sp >actual && + test_cmp expect actual +' + +test_expect_success 'start after, just behind a specific reference' ' + cat >expect <<-\EOF && + refs/odd/spot + refs/tags/annotated-tag + refs/tags/doubly-annotated-tag + refs/tags/doubly-signed-tag + refs/tags/foo1.10 + refs/tags/foo1.3 + refs/tags/foo1.6 + refs/tags/four + refs/tags/one + refs/tags/signed-tag + refs/tags/three + refs/tags/two + EOF + git for-each-ref --format="%(refname)" --start-after=refs/odd/parrot >actual && + test_cmp expect actual +' + +test_expect_success 'start after with specific directory match' ' + cat >expect <<-\EOF && + refs/odd/spot + refs/tags/annotated-tag + refs/tags/doubly-annotated-tag + refs/tags/doubly-signed-tag + refs/tags/foo1.10 + refs/tags/foo1.3 + refs/tags/foo1.6 + refs/tags/four + refs/tags/one + refs/tags/signed-tag + refs/tags/three + refs/tags/two + EOF + git for-each-ref --format="%(refname)" --start-after=refs/odd >actual && + test_cmp expect actual +' + +test_expect_success 'start after with specific directory and trailing slash' ' + cat >expect <<-\EOF && + refs/odd/spot + refs/tags/annotated-tag + refs/tags/doubly-annotated-tag + refs/tags/doubly-signed-tag + refs/tags/foo1.10 + refs/tags/foo1.3 + refs/tags/foo1.6 + refs/tags/four + refs/tags/one + refs/tags/signed-tag + refs/tags/three + refs/tags/two + EOF + git for-each-ref --format="%(refname)" --start-after=refs/odd/ >actual && + test_cmp expect actual +' + +test_expect_success 'start after, just behind a specific directory' ' + cat >expect <<-\EOF && + refs/odd/spot + refs/tags/annotated-tag + refs/tags/doubly-annotated-tag + refs/tags/doubly-signed-tag + refs/tags/foo1.10 + refs/tags/foo1.3 + refs/tags/foo1.6 + refs/tags/four + refs/tags/one + refs/tags/signed-tag + refs/tags/three + refs/tags/two + EOF + git for-each-ref --format="%(refname)" --start-after=refs/lost >actual && + test_cmp expect actual +' + +test_expect_success 'start after, overflow specific reference length' ' + cat >expect <<-\EOF && + refs/tags/annotated-tag + refs/tags/doubly-annotated-tag + refs/tags/doubly-signed-tag + refs/tags/foo1.10 + refs/tags/foo1.3 + refs/tags/foo1.6 + refs/tags/four + refs/tags/one + refs/tags/signed-tag + refs/tags/three + refs/tags/two + EOF + git for-each-ref --format="%(refname)" --start-after=refs/odd/spotnew >actual && + test_cmp expect actual +' + +test_expect_success 'start after, overflow specific reference path' ' + cat >expect <<-\EOF && + refs/tags/annotated-tag + refs/tags/doubly-annotated-tag + refs/tags/doubly-signed-tag + refs/tags/foo1.10 + refs/tags/foo1.3 + refs/tags/foo1.6 + refs/tags/four + refs/tags/one + refs/tags/signed-tag + refs/tags/three + refs/tags/two + EOF + git for-each-ref --format="%(refname)" --start-after=refs/odd/spot/new >actual && + test_cmp expect actual +' + +test_expect_success 'start after, last reference' ' + cat >expect <<-\EOF && + EOF + git for-each-ref --format="%(refname)" --start-after=refs/tags/two >actual && + test_cmp expect actual +' + +test_expect_success 'start after used with a pattern' ' + cat >expect <<-\EOF && + fatal: cannot use --start-after with patterns + EOF + test_must_fail git for-each-ref --format="%(refname)" --start-after=refs/odd/spot refs/tags 2>actual && + test_cmp expect actual +' + +test_expect_success 'start after used with custom sort order' ' + cat >expect <<-\EOF && + fatal: cannot use --start-after with custom sort options + EOF + test_must_fail git for-each-ref --format="%(refname)" --start-after=refs/odd/spot --sort=author 2>actual && + test_cmp expect actual +' + test_done From 4d8be89d973b69a826911385c5cf3d40d347394b Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Tue, 15 Jul 2025 13:29:18 +0200 Subject: [PATCH 07/19] midx: start tracking per object database source Multi-pack indices are tracked via `struct multi_pack_index`. This data structure is stored as a linked list inside `struct object_database`, which is the global database that spans across all of the object sources. This layout causes two problems: - Object databases consist of multiple object sources (e.g. one source per alternate object directory), where each multi-pack index is specific to one of those sources. Regardless of that though, the MIDX is not tracked per source, but tracked globally for the whole object database. This creates a mismatch between the on-disk layout and how things are organized in the object database subsystems and makes some parts, like figuring out whether a source has an MIDX, quite awkward. - Multi-pack indices are an implementation detail of how efficient access for packfiles work. As such, they are neither relevant in the context of loose objects, nor in a potential future where we have pluggable backends. Refactor `prepare_multi_pack_index_one()` so that it works on a specific source, which allows us to easily store a pointer to the multi-pack index inside of it. For now, this pointer exists next to the existing linked list we have in the object database. Users will be adjusted in subsequent patches to instead use the per-source pointers. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- midx.c | 19 +++++++++++-------- midx.h | 3 ++- odb.h | 9 ++++++++- packfile.c | 4 +++- 4 files changed, 24 insertions(+), 11 deletions(-) diff --git a/midx.c b/midx.c index 3c5bc8217300ca..2f64c26058fa48 100644 --- a/midx.c +++ b/midx.c @@ -724,28 +724,29 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id) return 0; } -int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local) +int prepare_multi_pack_index_one(struct odb_source *source, int local) { + struct repository *r = source->odb->repo; struct multi_pack_index *m; - struct multi_pack_index *m_search; prepare_repo_settings(r); if (!r->settings.core_multi_pack_index) return 0; - for (m_search = r->objects->multi_pack_index; m_search; m_search = m_search->next) - if (!strcmp(object_dir, m_search->object_dir)) - return 1; - - m = load_multi_pack_index(r, object_dir, local); + if (source->midx) + return 1; + m = load_multi_pack_index(r, source->path, local); if (m) { struct multi_pack_index *mp = r->objects->multi_pack_index; if (mp) { m->next = mp->next; mp->next = m; - } else + } else { r->objects->multi_pack_index = m; + } + source->midx = m; + return 1; } @@ -837,6 +838,8 @@ void clear_midx_file(struct repository *r) if (r->objects && r->objects->multi_pack_index) { close_midx(r->objects->multi_pack_index); r->objects->multi_pack_index = NULL; + for (struct odb_source *source = r->objects->sources; source; source = source->next) + source->midx = NULL; } if (remove_path(midx.buf)) diff --git a/midx.h b/midx.h index 9d1374cbd58d01..639a6f50e45d95 100644 --- a/midx.h +++ b/midx.h @@ -8,6 +8,7 @@ struct pack_entry; struct repository; struct bitmapped_pack; struct git_hash_algo; +struct odb_source; #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */ #define MIDX_VERSION 1 @@ -123,7 +124,7 @@ int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pa int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name); int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id); -int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local); +int prepare_multi_pack_index_one(struct odb_source *source, int local); /* * Variant of write_midx_file which writes a MIDX containing only the packs diff --git a/odb.h b/odb.h index e922f256802a0c..f09dba1fe1dd3a 100644 --- a/odb.h +++ b/odb.h @@ -13,6 +13,7 @@ struct oidmap; struct oidtree; struct strbuf; struct repository; +struct multi_pack_index; /* * Compute the exact path an alternate is at and returns it. In case of @@ -55,6 +56,13 @@ struct odb_source { /* Map between object IDs for loose objects. */ struct loose_object_map *loose_map; + /* + * private data + * + * should only be accessed directly by packfile.c and midx.c + */ + struct multi_pack_index *midx; + /* * This is a temporary object store created by the tmp_objdir * facility. Disable ref updates since the objects in the store @@ -75,7 +83,6 @@ struct odb_source { }; struct packed_git; -struct multi_pack_index; struct cached_object_entry; /* diff --git a/packfile.c b/packfile.c index af9ccfdba624ab..8bdd85fc7e7441 100644 --- a/packfile.c +++ b/packfile.c @@ -372,6 +372,8 @@ void close_object_store(struct object_database *o) if (o->multi_pack_index) { close_midx(o->multi_pack_index); o->multi_pack_index = NULL; + for (struct odb_source *source = o->sources; source; source = source->next) + source->midx = NULL; } close_commit_graph(o); @@ -1037,7 +1039,7 @@ static void prepare_packed_git(struct repository *r) odb_prepare_alternates(r->objects); for (source = r->objects->sources; source; source = source->next) { int local = (source == r->objects->sources); - prepare_multi_pack_index_one(r, source->path, local); + prepare_multi_pack_index_one(source, local); prepare_packed_git_one(r, source->path, local); } rearrange_packed_git(r); From ec4380f446b76e931002481f3e4be520c77058b6 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Tue, 15 Jul 2025 13:29:19 +0200 Subject: [PATCH 08/19] packfile: refactor `prepare_packed_git_one()` to work on sources In the preceding commit we refactored how we load multi-pack indices to take a corresponding "source" as input. As part of this refactoring we started to store a pointer to the MIDX in `struct odb_source` itself. Refactor loading of packfiles in the same way: instead of passing in the object directory, we now pass in the source from which we want to load packfiles. This allows us to simplify the code because we don't have to search for a corresponding MIDX anymore, but we can instead directly use the MIDX that we have already prepared beforehand. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- packfile.c | 23 +++++++++-------------- 1 file changed, 9 insertions(+), 14 deletions(-) diff --git a/packfile.c b/packfile.c index 8bdd85fc7e7441..0b3142973b6252 100644 --- a/packfile.c +++ b/packfile.c @@ -935,22 +935,17 @@ static void prepare_pack(const char *full_name, size_t full_name_len, report_garbage(PACKDIR_FILE_GARBAGE, full_name); } -static void prepare_packed_git_one(struct repository *r, char *objdir, int local) +static void prepare_packed_git_one(struct odb_source *source, int local) { - struct prepare_pack_data data; struct string_list garbage = STRING_LIST_INIT_DUP; + struct prepare_pack_data data = { + .m = source->midx, + .r = source->odb->repo, + .garbage = &garbage, + .local = local, + }; - data.m = r->objects->multi_pack_index; - - /* look for the multi-pack-index for this object directory */ - while (data.m && strcmp(data.m->object_dir, objdir)) - data.m = data.m->next; - - data.r = r; - data.garbage = &garbage; - data.local = local; - - for_each_file_in_pack_dir(objdir, prepare_pack, &data); + for_each_file_in_pack_dir(source->path, prepare_pack, &data); report_pack_garbage(data.garbage); string_list_clear(data.garbage, 0); @@ -1040,7 +1035,7 @@ static void prepare_packed_git(struct repository *r) for (source = r->objects->sources; source; source = source->next) { int local = (source == r->objects->sources); prepare_multi_pack_index_one(source, local); - prepare_packed_git_one(r, source->path, local); + prepare_packed_git_one(source, local); } rearrange_packed_git(r); From 6567432ab4f93f63cd2111197c91651a9b04c517 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Tue, 15 Jul 2025 13:29:20 +0200 Subject: [PATCH 09/19] midx: stop using linked list when closing MIDX When calling `close_midx()` we not only close the multi-pack index for one object source, but instead we iterate through the whole linked list of MIDXs to close all of them. This linked list is about to go away in favor of using the new per-source pointer to its respective MIDX. Refactor the function to iterate through sources instead. Note that after this patch, there's a couple of callsites left that continue to use `close_midx()` without iterating through all sources. These are all cases where we don't care about the MIDX from other sources though, so it's fine to keep them as-is. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- midx.c | 13 ++++++++----- packfile.c | 11 ++++++----- 2 files changed, 14 insertions(+), 10 deletions(-) diff --git a/midx.c b/midx.c index 2f64c26058fa48..472d6bf17ab922 100644 --- a/midx.c +++ b/midx.c @@ -401,7 +401,6 @@ void close_midx(struct multi_pack_index *m) if (!m) return; - close_midx(m->next); close_midx(m->base_midx); munmap((unsigned char *)m->data, m->data_len); @@ -835,11 +834,15 @@ void clear_midx_file(struct repository *r) get_midx_filename(r->hash_algo, &midx, r->objects->sources->path); - if (r->objects && r->objects->multi_pack_index) { - close_midx(r->objects->multi_pack_index); - r->objects->multi_pack_index = NULL; - for (struct odb_source *source = r->objects->sources; source; source = source->next) + if (r->objects) { + struct odb_source *source; + + for (source = r->objects->sources; source; source = source->next) { + if (source->midx) + close_midx(source->midx); source->midx = NULL; + } + r->objects->multi_pack_index = NULL; } if (remove_path(midx.buf)) diff --git a/packfile.c b/packfile.c index 0b3142973b6252..7b350f018ca78c 100644 --- a/packfile.c +++ b/packfile.c @@ -361,6 +361,7 @@ void close_pack(struct packed_git *p) void close_object_store(struct object_database *o) { + struct odb_source *source; struct packed_git *p; for (p = o->packed_git; p; p = p->next) @@ -369,12 +370,12 @@ void close_object_store(struct object_database *o) else close_pack(p); - if (o->multi_pack_index) { - close_midx(o->multi_pack_index); - o->multi_pack_index = NULL; - for (struct odb_source *source = o->sources; source; source = source->next) - source->midx = NULL; + for (source = o->sources; source; source = source->next) { + if (source->midx) + close_midx(source->midx); + source->midx = NULL; } + o->multi_pack_index = NULL; close_commit_graph(o); } From 736bb725ebcd37d567455db2ac50524dea11223c Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Tue, 15 Jul 2025 13:29:21 +0200 Subject: [PATCH 10/19] packfile: refactor `get_multi_pack_index()` to work on sources The function `get_multi_pack_index()` loads multi-pack indices via `prepare_packed_git()` and then returns the linked list of multi-pack indices that is stored in `struct object_database`. That list is in the process of being removed though in favor of storing the MIDX as part of the object database source it belongs to. Refactor `get_multi_pack_index()` so that it returns the multi-pack index for a single object source. Callers are now expected to call this function for each source they are interested in. This requires them to iterate through alternates, so we have to prepare alternate object sources before doing so. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- builtin/pack-objects.c | 10 +++++++--- builtin/repack.c | 10 +++++----- midx-write.c | 22 ++-------------------- object-name.c | 22 +++++++++++++++------- pack-bitmap.c | 21 +++++++++++++++------ packfile.c | 31 ++++++++++++------------------- packfile.h | 3 +-- 7 files changed, 57 insertions(+), 62 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 067b9e322a9db4..3dd84495b869e0 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1706,8 +1706,8 @@ static int want_object_in_pack_mtime(const struct object_id *oid, uint32_t found_mtime) { int want; + struct odb_source *source; struct list_head *pos; - struct multi_pack_index *m; if (!exclude && local && has_loose_object_nonlocal(oid)) return 0; @@ -1727,9 +1727,13 @@ static int want_object_in_pack_mtime(const struct object_id *oid, *found_offset = 0; } - for (m = get_multi_pack_index(the_repository); m; m = m->next) { + odb_prepare_alternates(the_repository->objects); + + for (source = the_repository->objects->sources; source; source = source->next) { + struct multi_pack_index *m = get_multi_pack_index(source); struct pack_entry e; - if (fill_midx_entry(the_repository, oid, &e, m)) { + + if (m && fill_midx_entry(the_repository, oid, &e, m)) { want = want_object_in_pack_one(e.p, oid, exclude, found_pack, found_offset, found_mtime); if (want != -1) return want; diff --git a/builtin/repack.c b/builtin/repack.c index 5e89d96df1373d..d63e1a9fec2fd6 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -223,9 +223,9 @@ static void mark_packs_for_deletion(struct existing_packs *existing, static void remove_redundant_pack(const char *dir_name, const char *base_name) { struct strbuf buf = STRBUF_INIT; - struct multi_pack_index *m = get_local_multi_pack_index(the_repository); + struct multi_pack_index *m = get_multi_pack_index(the_repository->objects->sources); strbuf_addf(&buf, "%s.pack", base_name); - if (m && midx_contains_pack(m, buf.buf)) + if (m && m->local && midx_contains_pack(m, buf.buf)) clear_midx_file(the_repository); strbuf_insertf(&buf, 0, "%s/", dir_name); unlink_pack_path(buf.buf, 1); @@ -1531,7 +1531,7 @@ int cmd_repack(int argc, * midx_has_unknown_packs() will make the decision for * us. */ - if (!get_local_multi_pack_index(the_repository)) + if (!get_multi_pack_index(the_repository->objects->sources)) midx_must_contain_cruft = 1; } @@ -1614,9 +1614,9 @@ int cmd_repack(int argc, string_list_sort(&names); - if (get_local_multi_pack_index(the_repository)) { + if (get_multi_pack_index(the_repository->objects->sources)) { struct multi_pack_index *m = - get_local_multi_pack_index(the_repository); + get_multi_pack_index(the_repository->objects->sources); ALLOC_ARRAY(midx_pack_names, m->num_packs + m->num_packs_in_base); diff --git a/midx-write.c b/midx-write.c index f2cfb85476ed78..c1ae62d3549425 100644 --- a/midx-write.c +++ b/midx-write.c @@ -916,26 +916,8 @@ static int write_midx_bitmap(struct write_midx_context *ctx, static struct multi_pack_index *lookup_multi_pack_index(struct repository *r, const char *object_dir) { - struct multi_pack_index *result = NULL; - struct multi_pack_index *cur; - char *obj_dir_real = real_pathdup(object_dir, 1); - struct strbuf cur_path_real = STRBUF_INIT; - - /* Ensure the given object_dir is local, or a known alternate. */ - odb_find_source(r->objects, obj_dir_real); - - for (cur = get_multi_pack_index(r); cur; cur = cur->next) { - strbuf_realpath(&cur_path_real, cur->object_dir, 1); - if (!strcmp(obj_dir_real, cur_path_real.buf)) { - result = cur; - goto cleanup; - } - } - -cleanup: - free(obj_dir_real); - strbuf_release(&cur_path_real); - return result; + struct odb_source *source = odb_find_source(r->objects, object_dir); + return get_multi_pack_index(source); } static int fill_packs_from_midx(struct write_midx_context *ctx, diff --git a/object-name.c b/object-name.c index ddafe7f9b13a96..27138b55b47556 100644 --- a/object-name.c +++ b/object-name.c @@ -198,16 +198,20 @@ static void unique_in_pack(struct packed_git *p, static void find_short_packed_object(struct disambiguate_state *ds) { - struct multi_pack_index *m; + struct odb_source *source; struct packed_git *p; /* Skip, unless oids from the storage hash algorithm are wanted */ if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo)) return; - for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous; - m = m->next) - unique_in_midx(m, ds); + odb_prepare_alternates(ds->repo->objects); + for (source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next) { + struct multi_pack_index *m = get_multi_pack_index(source); + if (m) + unique_in_midx(m, ds); + } + for (p = get_packed_git(ds->repo); p && !ds->ambiguous; p = p->next) unique_in_pack(p, ds); @@ -792,11 +796,15 @@ static void find_abbrev_len_for_pack(struct packed_git *p, static void find_abbrev_len_packed(struct min_abbrev_data *mad) { - struct multi_pack_index *m; struct packed_git *p; - for (m = get_multi_pack_index(mad->repo); m; m = m->next) - find_abbrev_len_for_midx(m, mad); + odb_prepare_alternates(mad->repo->objects); + for (struct odb_source *source = mad->repo->objects->sources; source; source = source->next) { + struct multi_pack_index *m = get_multi_pack_index(source); + if (m) + find_abbrev_len_for_midx(m, mad); + } + for (p = get_packed_git(mad->repo); p; p = p->next) find_abbrev_len_for_pack(p, mad); } diff --git a/pack-bitmap.c b/pack-bitmap.c index 0a4af199c05776..64278e2acf7604 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -691,13 +691,15 @@ static int open_pack_bitmap(struct repository *r, static int open_midx_bitmap(struct repository *r, struct bitmap_index *bitmap_git) { + struct odb_source *source; int ret = -1; - struct multi_pack_index *midx; assert(!bitmap_git->map); - for (midx = get_multi_pack_index(r); midx; midx = midx->next) { - if (!open_midx_bitmap_1(bitmap_git, midx)) + odb_prepare_alternates(r->objects); + for (source = r->objects->sources; source; source = source->next) { + struct multi_pack_index *midx = get_multi_pack_index(source); + if (midx && !open_midx_bitmap_1(bitmap_git, midx)) ret = 0; } return ret; @@ -3305,11 +3307,18 @@ static int verify_bitmap_file(const struct git_hash_algo *algop, int verify_bitmap_files(struct repository *r) { + struct odb_source *source; int res = 0; - for (struct multi_pack_index *m = get_multi_pack_index(r); - m; m = m->next) { - char *midx_bitmap_name = midx_bitmap_filename(m); + odb_prepare_alternates(r->objects); + for (source = r->objects->sources; source; source = source->next) { + struct multi_pack_index *m = get_multi_pack_index(source); + char *midx_bitmap_name; + + if (!m) + continue; + + midx_bitmap_name = midx_bitmap_filename(m); res |= verify_bitmap_file(r->hash_algo, midx_bitmap_name); free(midx_bitmap_name); } diff --git a/packfile.c b/packfile.c index 7b350f018ca78c..d0f38a020354ab 100644 --- a/packfile.c +++ b/packfile.c @@ -963,14 +963,18 @@ static void prepare_packed_git(struct repository *r); unsigned long repo_approximate_object_count(struct repository *r) { if (!r->objects->approximate_object_count_valid) { - unsigned long count; - struct multi_pack_index *m; + struct odb_source *source; + unsigned long count = 0; struct packed_git *p; prepare_packed_git(r); - count = 0; - for (m = get_multi_pack_index(r); m; m = m->next) - count += m->num_objects; + + for (source = r->objects->sources; source; source = source->next) { + struct multi_pack_index *m = get_multi_pack_index(source); + if (m) + count += m->num_objects; + } + for (p = r->objects->packed_git; p; p = p->next) { if (open_pack_index(p)) continue; @@ -1074,21 +1078,10 @@ struct packed_git *get_packed_git(struct repository *r) return r->objects->packed_git; } -struct multi_pack_index *get_multi_pack_index(struct repository *r) -{ - prepare_packed_git(r); - return r->objects->multi_pack_index; -} - -struct multi_pack_index *get_local_multi_pack_index(struct repository *r) +struct multi_pack_index *get_multi_pack_index(struct odb_source *source) { - struct multi_pack_index *m = get_multi_pack_index(r); - - /* no need to iterate; we always put the local one first (if any) */ - if (m && m->local) - return m; - - return NULL; + prepare_packed_git(source->odb->repo); + return source->midx; } struct packed_git *get_all_packs(struct repository *r) diff --git a/packfile.h b/packfile.h index 53c3b7d3b43cba..f16753f2a9bb4c 100644 --- a/packfile.h +++ b/packfile.h @@ -147,8 +147,7 @@ void install_packed_git(struct repository *r, struct packed_git *pack); struct packed_git *get_packed_git(struct repository *r); struct list_head *get_packed_git_mru(struct repository *r); -struct multi_pack_index *get_multi_pack_index(struct repository *r); -struct multi_pack_index *get_local_multi_pack_index(struct repository *r); +struct multi_pack_index *get_multi_pack_index(struct odb_source *source); struct packed_git *get_all_packs(struct repository *r); /* From 7fc19983926af6aea1eb393a5deb7bb2fc3cd495 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Tue, 15 Jul 2025 13:29:22 +0200 Subject: [PATCH 11/19] packfile: stop using linked MIDX list in `find_pack_entry()` Refactor `find_pack_entry()` so that we stop using the linked list of multi-pack indices. Note that there is no need to explicitly prepare alternates, and neither do we have to use `get_multi_pack_index()`, because `prepare_packed_git()` already takes care of populating all data structures for us. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- packfile.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/packfile.c b/packfile.c index d0f38a020354ab..2d19c53ea96a7e 100644 --- a/packfile.c +++ b/packfile.c @@ -2074,16 +2074,15 @@ static int fill_pack_entry(const struct object_id *oid, int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e) { struct list_head *pos; - struct multi_pack_index *m; prepare_packed_git(r); - if (!r->objects->packed_git && !r->objects->multi_pack_index) - return 0; - for (m = r->objects->multi_pack_index; m; m = m->next) { - if (fill_midx_entry(r, oid, e, m)) + for (struct odb_source *source = r->objects->sources; source; source = source->next) + if (source->midx && fill_midx_entry(r, oid, e, source->midx)) return 1; - } + + if (!r->objects->packed_git) + return 0; list_for_each(pos, &r->objects->packed_git_mru) { struct packed_git *p = list_entry(pos, struct packed_git, mru); From c620586fccf5a62e36a2d6cc96d0427f93f123fc Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Tue, 15 Jul 2025 13:29:23 +0200 Subject: [PATCH 12/19] packfile: stop using linked MIDX list in `get_all_packs()` Refactor `get_all_packs()` so that we stop using the linked list of multi-pack indices. Note that there is no need to explicitly prepare alternates, and neither do we have to use `get_multi_pack_index()`, because `prepare_packed_git()` already takes care of populating all data structures for us. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- packfile.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/packfile.c b/packfile.c index 2d19c53ea96a7e..ff33692f4b5996 100644 --- a/packfile.c +++ b/packfile.c @@ -1086,12 +1086,13 @@ struct multi_pack_index *get_multi_pack_index(struct odb_source *source) struct packed_git *get_all_packs(struct repository *r) { - struct multi_pack_index *m; - prepare_packed_git(r); - for (m = r->objects->multi_pack_index; m; m = m->next) { - uint32_t i; - for (i = 0; i < m->num_packs + m->num_packs_in_base; i++) + + for (struct odb_source *source = r->objects->sources; source; source = source->next) { + struct multi_pack_index *m = source->midx; + if (!m) + continue; + for (uint32_t i = 0; i < m->num_packs + m->num_packs_in_base; i++) prepare_midx_pack(r, m, i); } From ec865d94d4615c00fbf9ac50f4274b1d3fbf73a6 Mon Sep 17 00:00:00 2001 From: Patrick Steinhardt Date: Tue, 15 Jul 2025 13:29:24 +0200 Subject: [PATCH 13/19] midx: remove now-unused linked list of multi-pack indices In the preceding commits we have migrated all users of the linked list of multi-pack indices to instead use those stored in the object database sources. Remove those now-unused pointers. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano --- midx.c | 18 ++---------------- midx.h | 2 -- odb.h | 7 ------- packfile.c | 1 - 4 files changed, 2 insertions(+), 26 deletions(-) diff --git a/midx.c b/midx.c index 472d6bf17ab922..7d407682e60a6f 100644 --- a/midx.c +++ b/midx.c @@ -726,7 +726,6 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id) int prepare_multi_pack_index_one(struct odb_source *source, int local) { struct repository *r = source->odb->repo; - struct multi_pack_index *m; prepare_repo_settings(r); if (!r->settings.core_multi_pack_index) @@ -735,21 +734,9 @@ int prepare_multi_pack_index_one(struct odb_source *source, int local) if (source->midx) return 1; - m = load_multi_pack_index(r, source->path, local); - if (m) { - struct multi_pack_index *mp = r->objects->multi_pack_index; - if (mp) { - m->next = mp->next; - mp->next = m; - } else { - r->objects->multi_pack_index = m; - } - source->midx = m; + source->midx = load_multi_pack_index(r, source->path, local); - return 1; - } - - return 0; + return !!source->midx; } int midx_checksum_valid(struct multi_pack_index *m) @@ -842,7 +829,6 @@ void clear_midx_file(struct repository *r) close_midx(source->midx); source->midx = NULL; } - r->objects->multi_pack_index = NULL; } if (remove_path(midx.buf)) diff --git a/midx.h b/midx.h index 639a6f50e45d95..076382de8acd26 100644 --- a/midx.h +++ b/midx.h @@ -35,8 +35,6 @@ struct odb_source; "GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL" struct multi_pack_index { - struct multi_pack_index *next; - const unsigned char *data; size_t data_len; diff --git a/odb.h b/odb.h index f09dba1fe1dd3a..09177bf430dc38 100644 --- a/odb.h +++ b/odb.h @@ -123,13 +123,6 @@ struct object_database { struct commit_graph *commit_graph; unsigned commit_graph_attempted : 1; /* if loading has been attempted */ - /* - * private data - * - * should only be accessed directly by packfile.c and midx.c - */ - struct multi_pack_index *multi_pack_index; - /* * private data * diff --git a/packfile.c b/packfile.c index ff33692f4b5996..5d73932f50ce68 100644 --- a/packfile.c +++ b/packfile.c @@ -375,7 +375,6 @@ void close_object_store(struct object_database *o) close_midx(source->midx); source->midx = NULL; } - o->multi_pack_index = NULL; close_commit_graph(o); } From 8cc19250b324c90f4283bcad488c8bbc756145c4 Mon Sep 17 00:00:00 2001 From: Hoyoung Lee Date: Tue, 22 Jul 2025 17:41:01 +0000 Subject: [PATCH 14/19] t/helper/test-truncate: close file descriptor after truncation Fix a resource leak where the file descriptor was not closed after truncating a file in t/helper/test-truncate.c. Signed-off-by: Hoyoung Lee Signed-off-by: Junio C Hamano --- t/helper/test-truncate.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/t/helper/test-truncate.c b/t/helper/test-truncate.c index 3931deaec7dcda..2820cc7ed701cd 100644 --- a/t/helper/test-truncate.c +++ b/t/helper/test-truncate.c @@ -21,5 +21,8 @@ int cmd__truncate(int argc, const char **argv) if (ftruncate(fd, (off_t) sz) < 0) die_errno("failed to truncate file"); + + close(fd); + return 0; } From bc235a68c87d92dc15e2d656a004e9b20042405f Mon Sep 17 00:00:00 2001 From: Jeff King Date: Wed, 23 Jul 2025 20:00:56 -0400 Subject: [PATCH 15/19] test-delta: handle errors with die() This is a short test helper that does all of its work in the main function. When we encounter an error, we try to clean up memory and descriptors and then jump to an error return, which exits the program. We can get the same effect by just calling die(), which means we do not have to bother with cleaning up. This simplifies the code, and also removes some inconsistencies where a few code paths forgot to clean up descriptors (though in practice it was not a big deal since we were exiting anyway). In addition to die() and die_errno(), we'll also use a few of our usual helpers like xopen() and usage() that make things more ergonomic. This does change the exit code in these cases from 1 to 128, but I don't think it matters (and arguably is better, as we'd already exit 128 for other errors like xmalloc() failure). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/helper/test-delta.c | 55 ++++++++++++++----------------------------- 1 file changed, 18 insertions(+), 37 deletions(-) diff --git a/t/helper/test-delta.c b/t/helper/test-delta.c index 6bc787a4743be0..4495b32b491983 100644 --- a/t/helper/test-delta.c +++ b/t/helper/test-delta.c @@ -21,39 +21,26 @@ int cmd__delta(int argc, const char **argv) struct stat st; void *from_buf = NULL, *data_buf = NULL, *out_buf = NULL; unsigned long from_size, data_size, out_size; - int ret = 1; - if (argc != 5 || (strcmp(argv[1], "-d") && strcmp(argv[1], "-p"))) { - fprintf(stderr, "usage: %s\n", usage_str); - return 1; - } + if (argc != 5 || (strcmp(argv[1], "-d") && strcmp(argv[1], "-p"))) + usage(usage_str); - fd = open(argv[2], O_RDONLY); - if (fd < 0 || fstat(fd, &st)) { - perror(argv[2]); - return 1; - } + fd = xopen(argv[2], O_RDONLY); + if (fstat(fd, &st) < 0) + die_errno("fstat(%s)", argv[2]); from_size = st.st_size; from_buf = xmalloc(from_size); - if (read_in_full(fd, from_buf, from_size) < 0) { - perror(argv[2]); - close(fd); - goto cleanup; - } + if (read_in_full(fd, from_buf, from_size) < 0) + die_errno("read(%s)", argv[2]); close(fd); - fd = open(argv[3], O_RDONLY); - if (fd < 0 || fstat(fd, &st)) { - perror(argv[3]); - goto cleanup; - } + fd = xopen(argv[3], O_RDONLY); + if (fstat(fd, &st) < 0) + die_errno("fstat(%s)", argv[3]); data_size = st.st_size; data_buf = xmalloc(data_size); - if (read_in_full(fd, data_buf, data_size) < 0) { - perror(argv[3]); - close(fd); - goto cleanup; - } + if (read_in_full(fd, data_buf, data_size) < 0) + die_errno("read(%s)", argv[3]); close(fd); if (argv[1][1] == 'd') @@ -64,22 +51,16 @@ int cmd__delta(int argc, const char **argv) out_buf = patch_delta(from_buf, from_size, data_buf, data_size, &out_size); - if (!out_buf) { - fprintf(stderr, "delta operation failed (returned NULL)\n"); - goto cleanup; - } + if (!out_buf) + die("delta operation failed (returned NULL)"); - fd = open (argv[4], O_WRONLY|O_CREAT|O_TRUNC, 0666); - if (fd < 0 || write_in_full(fd, out_buf, out_size) < 0) { - perror(argv[4]); - goto cleanup; - } + fd = xopen(argv[4], O_WRONLY|O_CREAT|O_TRUNC, 0666); + if (write_in_full(fd, out_buf, out_size) < 0) + die_errno("write(%s)", argv[4]); - ret = 0; -cleanup: free(from_buf); free(data_buf); free(out_buf); - return ret; + return 0; } From 760dd804bb40b613396d25a0905db7fe4a52b00f Mon Sep 17 00:00:00 2001 From: Jeff King Date: Wed, 23 Jul 2025 20:02:11 -0400 Subject: [PATCH 16/19] test-delta: use strbufs to hold input files We want to read the whole contents of two files into memory. If we switch from raw ptr/len pairs to strbufs, we can use strbuf_read_file() to shorten the code. This incidentally fixes two small bugs: 1. We stat() the files and allocate our buffers based on st.st_size. But that is an off_t which may be larger than the size_t we'd use to allocate. We should use xsize_t() to do a checked conversion. Otherwise integer truncation (on a file >4GB) could cause us to under-allocate (though in practice this does not result in a buffer overflow because the same truncation happens when read_in_full() also takes a size_t). 2. We get the size from st.st_size, and then try to read_in_full() that many bytes. But it may return fewer bytes than expected (if the file changed racily and we get an early EOF), leading us to read uninitialized bytes in the allocated buffer. We don't notice because we only check the value for error, not that we got the expected number of bytes. The strbuf code doesn't run into this, because it just reads to EOF, expanding the buffer dynamically as necessary. Neither bug is a big deal for a test helper, but fixing them is a nice bonus on top of simplifying the code. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/helper/test-delta.c | 40 ++++++++++++++-------------------------- 1 file changed, 14 insertions(+), 26 deletions(-) diff --git a/t/helper/test-delta.c b/t/helper/test-delta.c index 4495b32b491983..7945793078c4c7 100644 --- a/t/helper/test-delta.c +++ b/t/helper/test-delta.c @@ -11,6 +11,7 @@ #include "test-tool.h" #include "git-compat-util.h" #include "delta.h" +#include "strbuf.h" static const char usage_str[] = "test-tool delta (-d|-p) "; @@ -18,38 +19,25 @@ static const char usage_str[] = int cmd__delta(int argc, const char **argv) { int fd; - struct stat st; - void *from_buf = NULL, *data_buf = NULL, *out_buf = NULL; - unsigned long from_size, data_size, out_size; + struct strbuf from = STRBUF_INIT, data = STRBUF_INIT; + char *out_buf; + unsigned long out_size; if (argc != 5 || (strcmp(argv[1], "-d") && strcmp(argv[1], "-p"))) usage(usage_str); - fd = xopen(argv[2], O_RDONLY); - if (fstat(fd, &st) < 0) - die_errno("fstat(%s)", argv[2]); - from_size = st.st_size; - from_buf = xmalloc(from_size); - if (read_in_full(fd, from_buf, from_size) < 0) - die_errno("read(%s)", argv[2]); - close(fd); - - fd = xopen(argv[3], O_RDONLY); - if (fstat(fd, &st) < 0) - die_errno("fstat(%s)", argv[3]); - data_size = st.st_size; - data_buf = xmalloc(data_size); - if (read_in_full(fd, data_buf, data_size) < 0) - die_errno("read(%s)", argv[3]); - close(fd); + if (strbuf_read_file(&from, argv[2], 0) < 0) + die_errno("unable to read '%s'", argv[2]); + if (strbuf_read_file(&data, argv[3], 0) < 0) + die_errno("unable to read '%s'", argv[3]); if (argv[1][1] == 'd') - out_buf = diff_delta(from_buf, from_size, - data_buf, data_size, + out_buf = diff_delta(from.buf, from.len, + data.buf, data.len, &out_size, 0); else - out_buf = patch_delta(from_buf, from_size, - data_buf, data_size, + out_buf = patch_delta(from.buf, from.len, + data.buf, data.len, &out_size); if (!out_buf) die("delta operation failed (returned NULL)"); @@ -58,8 +46,8 @@ int cmd__delta(int argc, const char **argv) if (write_in_full(fd, out_buf, out_size) < 0) die_errno("write(%s)", argv[4]); - free(from_buf); - free(data_buf); + strbuf_release(&from); + strbuf_release(&data); free(out_buf); return 0; From 0f1b33815b553dd457f8e38e3768b73cf9227082 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Wed, 23 Jul 2025 20:03:08 -0400 Subject: [PATCH 17/19] test-delta: close output descriptor after use After we write to the output file, the program exits. This naturally closes the descriptor. But we should do an explicit close for two reasons: 1. It's possible to hit an error on close(), which we should detect and report via our exit code. 2. Leaking descriptors is a bad practice in general. Even if it isn't meaningful here, it sets a bad example. It is tempting to write: if (write_in_full(fd, ...) < 0 || close(fd) < 0) die_errno(...); But that pattern contains a subtle problem that has resulted in descriptor leaks before. If write_in_full() fails, we'll short-circuit and never call close(), leaking the descriptor. That's not a problem here, since our error path dies instead of returning up the stack. But since we're trying to set a good example, let's write it out as two separate conditions. As a bonus, that lets us produce a slightly more specific error message. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/helper/test-delta.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/t/helper/test-delta.c b/t/helper/test-delta.c index 7945793078c4c7..52ea00c93718c7 100644 --- a/t/helper/test-delta.c +++ b/t/helper/test-delta.c @@ -45,6 +45,8 @@ int cmd__delta(int argc, const char **argv) fd = xopen(argv[4], O_WRONLY|O_CREAT|O_TRUNC, 0666); if (write_in_full(fd, out_buf, out_size) < 0) die_errno("write(%s)", argv[4]); + if (close(fd) < 0) + die_errno("close(%s)", argv[4]); strbuf_release(&from); strbuf_release(&data); From 9201261a70b5d325b344036de15901a4064d66c0 Mon Sep 17 00:00:00 2001 From: Karthik Nayak Date: Fri, 25 Jul 2025 00:11:36 +0200 Subject: [PATCH 18/19] ref-cache: set prefix_state when seeking In 090eb5336c (refs: selectively set prefix in the seek functions, 2025-07-15) we separated the seeking functionality of reference iterators from the functionality to set prefix to an iterator. This allows users of ref iterators to seek to a particular reference to provide pagination support. The files-backend, uses the ref-cache iterator to iterate over loose refs. The iterator tracks directories and entries already processed via a stack of levels. Each level corresponds to a directory under the files backend. New levels are added to the stack, and when all entries from a level is yielded, the corresponding level is popped from the stack. To accommodate seeking, we need to populate and traverse the levels to stop the requested seek marker at the appropriate level and its entry index. Each level also contains a 'prefix_state' which is used for prefix matching, this allows the iterator to skip levels/entries which don't match a prefix. The default value of 'prefix_state' is PREFIX_CONTAINS_DIR, which yields all entries within a level. When purely seeking without prefix matching, we want to yield all entries. The commit however, skips setting the value explicitly. This causes the MemorySanitizer to issue a 'use-of-uninitialized-value' error when running 't/t6302-for-each-ref-filter'. Set the value explicitly to avoid to fix the issue. Reported-by: Kyle Lippincott Helped-by: Kyle Lippincott Helped-by: Jeff King Signed-off-by: Karthik Nayak Signed-off-by: Junio C Hamano --- refs/ref-cache.c | 1 + 1 file changed, 1 insertion(+) diff --git a/refs/ref-cache.c b/refs/ref-cache.c index 1d95b56d4000bc..ceef3a20088267 100644 --- a/refs/ref-cache.c +++ b/refs/ref-cache.c @@ -527,6 +527,7 @@ static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator, level = &iter->levels[iter->levels_nr++]; level->dir = dir; level->index = -1; + level->prefix_state = PREFIX_CONTAINS_DIR; } else { /* reduce the index so the leaf node is iterated over */ if (cmp <= 0 && !slash) From e0753259271b76f6e53b3b170b4bc08cca793bca Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Sun, 3 Aug 2025 18:44:07 -0700 Subject: [PATCH 19/19] The seventeenth batch, just before -rc0 Signed-off-by: Junio C Hamano --- Documentation/RelNotes/2.51.0.adoc | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/Documentation/RelNotes/2.51.0.adoc b/Documentation/RelNotes/2.51.0.adoc index 5e254541b5f182..78b4918533f29e 100644 --- a/Documentation/RelNotes/2.51.0.adoc +++ b/Documentation/RelNotes/2.51.0.adoc @@ -61,6 +61,9 @@ UI, Workflows & Features * "git pull" learned to pay attention to pull.autostash configuration variable, which overrides rebase/merge.autostash. + * "git for-each-ref" learns "--start-after" option to help + applications that want to page its output. + Performance, Internal Implementation, Development Support etc. -------------------------------------------------------------- @@ -125,6 +128,11 @@ Performance, Internal Implementation, Development Support etc. * The reftable unit tests are now ported to the "clar" unit testing framework. + * Redefine where the multi-pack-index sits in the object subsystem, + which recently was restructured to allow multiple backends that + support a single object source that belongs to one repository. A + midx does span mulitple "object sources". + Fixes since v2.50 ----------------- @@ -252,6 +260,15 @@ including security updates, are included in this release. core.commentstring is set to 'auto'. (merge 92b7c7c9f5 ac/auto-comment-char-fix later to maint). + * "git rebase -i" with bogus rebase.instructionFormat configuration + failed to produce the todo file after recording the state files, + leading to confused "git status"; this has been corrected. + (merge ade14bffd7 ow/rebase-verify-insn-fmt-before-initializing-state later to maint). + + * A few file descriptors left unclosed upon program completion in a + few test helper programs are now closed. + (merge 0f1b33815b hl/test-helper-fd-close later to maint). + * Other code cleanup, docfix, build fix, etc. (merge b257adb571 lo/my-first-ow-doc-update later to maint). (merge 8b34b6a220 ly/sequencer-update-squash-is-fixup-only later to maint).