Skip to content

[pull] master from git:master#85

Merged
pull[bot] merged 25 commits intoturkdevops:masterfrom
git:master
Aug 4, 2025
Merged

[pull] master from git:master#85
pull[bot] merged 25 commits intoturkdevops:masterfrom
git:master

Conversation

@pull
Copy link
Copy Markdown

@pull pull bot commented Aug 4, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.3)

Can you help keep this open source service alive? 💖 Please sponsor : )

Osse and others added 25 commits June 9, 2025 15:56
If rebase.instructionFormat is invalid the repository is left in a
strange state when the interactive rebase fails. `git status` outputs
boths the same as it would in the normal case *and* something related to
interactive rebase:

    $ git -c rebase.instructionFormat=blah rebase -i
    fatal: invalid --pretty format: blah
    $ git status
    On branch master
    Your branch is ahead of 'upstream/master' by 1 commit.
      (use "git push" to publish your local commits)

    git-rebase-todo is missing.
    No commands done.
    No commands remaining.
    You are currently editing a commit while rebasing branch 'master' on '8db3019401'.
      (use "git commit --amend" to amend the current commit)
      (use "git rebase --continue" once you are satisfied with your changes)

By attempting to write the rebase script before initializing the state
this potential scenario is avoided.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/object-store:
  odb: rename `read_object_with_reference()`
  odb: rename `pretend_object_file()`
  odb: rename `has_object()`
  odb: rename `repo_read_object_file()`
  odb: rename `oid_object_info()`
  odb: trivial refactorings to get rid of `the_repository`
  odb: get rid of `the_repository` when handling submodule sources
  odb: get rid of `the_repository` when handling the primary source
  odb: get rid of `the_repository` in `for_each()` functions
  odb: get rid of `the_repository` when handling alternates
  odb: get rid of `the_repository` in `odb_mkstemp()`
  odb: get rid of `the_repository` in `assert_oid_type()`
  odb: get rid of `the_repository` in `find_odb()`
  odb: introduce parent pointers
  object-store: rename files to "odb.{c,h}"
  object-store: rename `object_directory` to `odb_source`
  object-store: rename `raw_object_store` to `object_database`
The `ref_iterator` is an internal structure to the 'refs/'
sub-directory, which allows iteration over refs. All reference iteration
is built on top of these iterators.

External clients of the 'refs' subsystem use the various
'refs_for_each...()' functions to iterate over refs. However since these
are wrapper functions, each combination of functionality requires a new
wrapper function. This is not feasible as the functions pile up with the
increase in requirements. Expose the internal reference iterator, so
advanced users can mix and match options as needed.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'find_ref_entry' function is no longer used, so remove it.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref iterator exposes a `ref_iterator_seek()` function. The name
suggests that this would seek the iterator to a specific reference in
some ways similar to how `fseek()` works for the filesystem.

However, the function actually sets the prefix for refs iteration. So
further iteration would only yield references which match the particular
prefix. This is a bit confusing.

Let's add a 'flags' field to the function, which when set with the
'REF_ITERATOR_SEEK_SET_PREFIX' flag, will set the prefix for the
iteration in-line with the existing behavior. Otherwise, the reference
backends will simply seek to the specified reference and clears any
previously set prefix. This allows users to start iteration from a
specific reference.

In the packed and reftable backend, since references are available in a
sorted list, the changes are simply setting the prefix if needed. The
changes on the files-backend are a little more involved, since the files
backend uses the 'ref-cache' mechanism. We move out the existing logic
within `cache_ref_iterator_seek()` to `cache_ref_iterator_set_prefix()`
which is called when the 'REF_ITERATOR_SEEK_SET_PREFIX' flag is set. We
then parse the provided seek string and set the required levels and
their indexes to ensure that seeking is possible.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 'ref-filter.c', there is an 'else' clause within `do_filter_refs()`.
This is unnecessary since the 'if' clause calls `die()`, which would
exit the program. So let's remove the unnecessary 'else' clause. This
improves readability since the indentation is also reduced and flow is
simpler.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git-for-each-ref(1)` command is used to iterate over references
present in a repository. In large repositories with millions of
references, it would be optimal to paginate this output such that we
can start iteration from a given reference. This would avoid having to
iterate over all references from the beginning each time when paginating
through results.

The previous commit added 'seek' functionality to the reference
backends. Utilize this and expose a '--start-after' option in
'git-for-each-ref(1)'. When used, the reference iteration seeks to the
lexicographically next reference and iterates from there onward.

This enables efficient pagination workflows, where the calling script
can remember the last provided reference and use that as the starting
point for the next set of references:
    git for-each-ref --count=100
    git for-each-ref --count=100 --start-after=refs/heads/branch-100
    git for-each-ref --count=100 --start-after=refs/heads/branch-200

Since the reference iterators only allow seeking to a specified marker
via the `ref_iterator_seek()`, we introduce a helper function
`start_ref_iterator_after()`, which seeks to next reference by simply
adding (char) 1 to the marker.

We must note that pagination always continues from the provided marker,
as such any concurrent reference updates lexicographically behind the
marker will not be output. Document the same.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tb/midx-avoid-cruft-packs:
  repack: exclude cruft pack(s) from the MIDX where possible
  pack-objects: introduce '--stdin-packs=follow'
  pack-objects: swap 'show_{object,commit}_pack_hint'
  pack-objects: fix typo in 'show_object_pack_hint()'
  pack-objects: perform name-hash traversal for unpacked objects
  pack-objects: declare 'rev_info' for '--stdin-packs' earlier
  pack-objects: factor out handling '--stdin-packs'
  pack-objects: limit scope in 'add_object_entry_from_pack()'
  pack-objects: use standard option incompatibility functions
Multi-pack indices are tracked via `struct multi_pack_index`. This data
structure is stored as a linked list inside `struct object_database`,
which is the global database that spans across all of the object
sources.

This layout causes two problems:

  - Object databases consist of multiple object sources (e.g. one source
    per alternate object directory), where each multi-pack index is
    specific to one of those sources. Regardless of that though, the
    MIDX is not tracked per source, but tracked globally for the whole
    object database. This creates a mismatch between the on-disk layout
    and how things are organized in the object database subsystems and
    makes some parts, like figuring out whether a source has an MIDX,
    quite awkward.

  - Multi-pack indices are an implementation detail of how efficient
    access for packfiles work. As such, they are neither relevant in the
    context of loose objects, nor in a potential future where we have
    pluggable backends.

Refactor `prepare_multi_pack_index_one()` so that it works on a specific
source, which allows us to easily store a pointer to the multi-pack
index inside of it. For now, this pointer exists next to the existing
linked list we have in the object database. Users will be adjusted in
subsequent patches to instead use the per-source pointers.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commit we refactored how we load multi-pack indices to
take a corresponding "source" as input. As part of this refactoring we
started to store a pointer to the MIDX in `struct odb_source` itself.

Refactor loading of packfiles in the same way: instead of passing in the
object directory, we now pass in the source from which we want to load
packfiles. This allows us to simplify the code because we don't have to
search for a corresponding MIDX anymore, but we can instead directly use
the MIDX that we have already prepared beforehand.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When calling `close_midx()` we not only close the multi-pack index for
one object source, but instead we iterate through the whole linked list
of MIDXs to close all of them. This linked list is about to go away in
favor of using the new per-source pointer to its respective MIDX.

Refactor the function to iterate through sources instead.

Note that after this patch, there's a couple of callsites left that
continue to use `close_midx()` without iterating through all sources.
These are all cases where we don't care about the MIDX from other
sources though, so it's fine to keep them as-is.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `get_multi_pack_index()` loads multi-pack indices via
`prepare_packed_git()` and then returns the linked list of multi-pack
indices that is stored in `struct object_database`. That list is in the
process of being removed though in favor of storing the MIDX as part of
the object database source it belongs to.

Refactor `get_multi_pack_index()` so that it returns the multi-pack
index for a single object source. Callers are now expected to call this
function for each source they are interested in. This requires them to
iterate through alternates, so we have to prepare alternate object
sources before doing so.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor `find_pack_entry()` so that we stop using the linked list of
multi-pack indices. Note that there is no need to explicitly prepare
alternates, and neither do we have to use `get_multi_pack_index()`,
because `prepare_packed_git()` already takes care of populating all data
structures for us.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor `get_all_packs()` so that we stop using the linked list of
multi-pack indices. Note that there is no need to explicitly prepare
alternates, and neither do we have to use `get_multi_pack_index()`,
because `prepare_packed_git()` already takes care of populating all data
structures for us.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we have migrated all users of the linked list
of multi-pack indices to instead use those stored in the object database
sources. Remove those now-unused pointers.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a resource leak where the file descriptor was not closed after
truncating a file in t/helper/test-truncate.c.

Signed-off-by: Hoyoung Lee <lhywkd22@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a short test helper that does all of its work in the main
function. When we encounter an error, we try to clean up memory and
descriptors and then jump to an error return, which exits the program.

We can get the same effect by just calling die(), which means we do not
have to bother with cleaning up. This simplifies the code, and also
removes some inconsistencies where a few code paths forgot to clean up
descriptors (though in practice it was not a big deal since we were
exiting anyway).

In addition to die() and die_errno(), we'll also use a few of our usual
helpers like xopen() and usage() that make things more ergonomic.

This does change the exit code in these cases from 1 to 128, but I
don't think it matters (and arguably is better, as we'd already exit 128
for other errors like xmalloc() failure).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to read the whole contents of two files into memory. If we
switch from raw ptr/len pairs to strbufs, we can use strbuf_read_file()
to shorten the code.

This incidentally fixes two small bugs:

  1. We stat() the files and allocate our buffers based on st.st_size.
     But that is an off_t which may be larger than the size_t we'd use
     to allocate. We should use xsize_t() to do a checked conversion.
     Otherwise integer truncation (on a file >4GB) could cause us to
     under-allocate (though in practice this does not result in a buffer
     overflow because the same truncation happens when read_in_full()
     also takes a size_t).

  2. We get the size from st.st_size, and then try to read_in_full()
     that many bytes. But it may return fewer bytes than expected (if
     the file changed racily and we get an early EOF), leading us to
     read uninitialized bytes in the allocated buffer. We don't notice
     because we only check the value for error, not that we got the
     expected number of bytes.

The strbuf code doesn't run into this, because it just reads to EOF,
expanding the buffer dynamically as necessary. Neither bug is a big deal
for a test helper, but fixing them is a nice bonus on top of simplifying
the code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After we write to the output file, the program exits. This naturally
closes the descriptor. But we should do an explicit close for two
reasons:

  1. It's possible to hit an error on close(), which we should detect
     and report via our exit code.

  2. Leaking descriptors is a bad practice in general. Even if it isn't
     meaningful here, it sets a bad example.

It is tempting to write:

  if (write_in_full(fd, ...) < 0 || close(fd) < 0)
        die_errno(...);

But that pattern contains a subtle problem that has resulted in
descriptor leaks before. If write_in_full() fails, we'll short-circuit
and never call close(), leaking the descriptor.

That's not a problem here, since our error path dies instead of
returning up the stack. But since we're trying to set a good example,
let's write it out as two separate conditions. As a bonus, that lets us
produce a slightly more specific error message.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 090eb5336c (refs: selectively set prefix in the seek functions,
2025-07-15) we separated the seeking functionality of reference
iterators from the functionality to set prefix to an iterator. This
allows users of ref iterators to seek to a particular reference to
provide pagination support.

The files-backend, uses the ref-cache iterator to iterate over loose
refs. The iterator tracks directories and entries already processed via
a stack of levels. Each level corresponds to a directory under the files
backend. New levels are added to the stack, and when all entries from a
level is yielded, the corresponding level is popped from the stack.

To accommodate seeking, we need to populate and traverse the levels to
stop the requested seek marker at the appropriate level and its entry
index. Each level also contains a 'prefix_state' which is used for
prefix matching, this allows the iterator to skip levels/entries which
don't match a prefix. The default value of 'prefix_state' is
PREFIX_CONTAINS_DIR, which yields all entries within a level. When
purely seeking without prefix matching, we want to yield all entries.
The commit however, skips setting the value explicitly. This causes the
MemorySanitizer to issue a 'use-of-uninitialized-value' error when
running 't/t6302-for-each-ref-filter'.

Set the value explicitly to avoid to fix the issue.

Reported-by: Kyle Lippincott <spectral@google.com>
Helped-by: Kyle Lippincott <spectral@google.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git for-each-ref" learns "--start-after" option to help
applications that want to page its output.

* kn/for-each-ref-skip:
  ref-cache: set prefix_state when seeking
  for-each-ref: introduce a '--start-after' option
  ref-filter: remove unnecessary else clause
  refs: selectively set prefix in the seek functions
  ref-cache: remove unused function 'find_ref_entry()'
  refs: expose `ref_iterator` via 'refs.h'
Redefine where the multi-pack-index sits in the object subsystem,
which recently was restructured to allow multiple backends that
support a single object source that belongs to one repository.  A
midx does span mulitple "object sources".

* ps/object-store-midx:
  midx: remove now-unused linked list of multi-pack indices
  packfile: stop using linked MIDX list in `get_all_packs()`
  packfile: stop using linked MIDX list in `find_pack_entry()`
  packfile: refactor `get_multi_pack_index()` to work on sources
  midx: stop using linked list when closing MIDX
  packfile: refactor `prepare_packed_git_one()` to work on sources
  midx: start tracking per object database source
"git rebase -i" with bogus rebase.instructionFormat configuration
failed to produce the todo file after recording the state files,
leading to confused "git status"; this has been corrected.

* ow/rebase-verify-insn-fmt-before-initializing-state:
  rebase: write script before initializing state
A few file descriptors left unclosed upon program completion in a
few test helper programs are now closed.

* hl/test-helper-fd-close:
  test-delta: close output descriptor after use
  test-delta: use strbufs to hold input files
  test-delta: handle errors with die()
  t/helper/test-truncate: close file descriptor after truncation
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@pull pull bot locked and limited conversation to collaborators Aug 4, 2025
@pull pull bot added the ⤵️ pull label Aug 4, 2025
@pull pull bot merged commit e075325 into turkdevops:master Aug 4, 2025
2 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants